Have you ever tried writing captions for news articles, only to feel that they lack something? Did you know that an automatic captioning system could be the solution?

Three researchers from the Australian National University, Lexing Xie, Alasdair Tran, and Alexander Mathews, had the same concern about today’s typical captioning systems. They felt that the captions these systems produce are too generic and uninteresting.

Given this situation, they decided to develop a new caption-generating system that is descriptive and sophisticated without losing sight of the related content.

Content-related Captions: Model of Sophistication and Emotion


What’s interesting about this system is that it automatically captions news images without losing focus on the context. It offers more than the obvious, uninteresting visual details of a picture. The system pushes past the limits of previous systems by making captions more sentimental and romantic. This ties in with the idea that every picture has a story to tell; captions should convey images as something unique and personal to us.

The researchers then built a model in which captions draw on both the image and the article content to provide interesting information. Previous systems usually treat the image as an isolated object when captioning, even though the accompanying article supplies more details about the subject of the image.

Tran and his team developed the first end-to-end system whose captions can include real-world knowledge, such as the names of people and places. Though it appears complex, the advantage of using this system is its simplicity.


Developing Captions: Concepts Behind the System’s Improvement

The following related concepts contributed to the development of the automatic captioning system, which is described in a preprint posted on arXiv:

  • Byte Pair Encoding

Previous captioning systems work with a limited vocabulary, so they struggle with rare words. Byte pair encoding addresses this by breaking words down into frequently occurring subparts, like ‘ing’.
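To make the idea concrete, here is a minimal Python sketch of how byte pair encoding can learn frequent subparts from a word list. It is a generic, character-level illustration and not the researchers' implementation; the function names (get_pair_counts, merge_pair, learn_bpe) and the toy word counts are assumptions made up for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merges, always joining the most frequent pair."""
    # Start with every word split into single characters.
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


# Hypothetical toy corpus: word -> number of occurrences.
toy_counts = {"running": 5, "walking": 4, "talking": 3}
merges, vocab = learn_bpe(toy_counts, num_merges=2)
print(merges)
print(list(vocab))
```

On this toy input, the two learned merges join ‘i’ with ‘n’ and then ‘in’ with ‘g’, so ‘ing’ ends up as a single reusable subpart shared by all three words. A real system would learn many thousands of merges from a large corpus.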

  • Modules

To improve how accurately the model identifies people’s names and reports them in its captions, the researchers added two modules, one for detecting faces and one for detecting objects.

  • Artificial Intelligence Research

Tran mentioned that one of their fundamental goals is to get a machine to think and operate as humans do, and that this research brings them closer to that goal.

  • “TRANSFORM AND TELL”

[Image: demo of the “Transform and Tell” captioning system]

The image above shows a demo of the captioning system devised by the three researchers. Once its full version is released, it could help media specialists write captions for news images more efficiently.

Model’s Applications and Directions for the Future

Despite the model’s current limitations, the researchers are testing its capacity to handle slightly different tasks, including:

  • selecting an image that would suit the article based on its content
  • identifying the best place to put an image
  • applying their model in different domains, like summarizing background knowledge
  • analyzing new arXiv papers
  • suggesting interesting content for scientific news releases

As of now, the model can only draw on the article that the image appears in. However, the researchers hope that future work will make the system and its model practical for a wider range of article-related needs. The system’s goals are well suited to connecting people with their work.

This captioning system will surely be useful for my digital marketing firm, iPresence Digital Marketing, Inc., since our flagship service is content writing.

Would you consider innovating on this captioning system? Let me know your thoughts in the comments.
