Automatic Generation of Comics with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

In recent years, computer science has advanced rapidly, and each new development has reduced the amount of manual work that everyday tasks require. Producing original content by hand is time-consuming, whereas computational techniques can now generate text, images, and other valuable data in far less time.

This article looks at one recent development in the field: a framework that automatically generates comic books.


According to recent reports, researchers at Dalian University of Technology and City University of Hong Kong in China have created a fully automatic system designed to generate manga comic books. The framework produces comic books from videos without any human intervention.

Overall pipeline of the researchers' system. (a) Keyframe Extraction and Stylization. (b) Automatic Multi-Page Layout Framework (the red, purple, and green dotted boxes mean different groups). (c) Balloon Generation and Placement. Credit: Yang et al.

The system first extracts informative keyframes from a subtitled input video by analyzing the subtitles, then stylizes the keyframes into comic-style images. The images are distributed across multiple pages using a multi-page layout framework developed by the researchers, which also creates visually gripping layouts based on the characteristics of the images.

Instead of using a single balloon type, as most previous works do, the researchers propose an emotion-aware balloon generation method. It analyzes the emotions conveyed in the video's audio and subtitles and uses them to vary the balloon shapes and the word sizes inside the balloons, which makes for a more expressive reading experience.

The balloons generated by the system are placed next to their corresponding speakers. To do this, the system detects every speaker in the video and aligns each speech balloon with the speaker it belongs to.
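The placement idea can be illustrated with a short Python sketch. It is not the researchers' implementation: the function `place_balloon`, its parameters, and the rule of choosing the side with more free space are assumptions made for this example, and the speaker's bounding box is assumed to come from a separate speaker-detection step.

```python
def place_balloon(speaker_box, balloon_size, panel_size, margin=4):
    """Return the top-left corner of a balloon placed beside the speaker.

    speaker_box is (x, y, width, height); sizes are (width, height).
    All names here are hypothetical and chosen for illustration only.
    """
    sx, sy, sw, sh = speaker_box
    bw, bh = balloon_size
    pw, ph = panel_size

    # Prefer the side of the speaker that has more free horizontal space.
    space_right = pw - (sx + sw)
    if space_right >= sx:
        x = sx + sw + margin
    else:
        x = sx - bw - margin

    # Centre the balloon vertically on the top edge of the speaker box.
    y = sy - bh // 2

    # Clamp the position so the balloon stays inside the panel.
    x = min(max(x, 0), pw - bw)
    y = min(max(y, 0), ph - bh)
    return x, y
```

A real system would also have to avoid covering faces and other balloons; this sketch only keeps the balloon close to its speaker and inside the panel.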

The methodology of the framework:

The main idea of the framework is a system that is completely automatic and requires no manual input. The system has three modules: the first performs keyframe selection and stylization, the second builds the multi-page layout, and the third generates the speech balloons and places them.

Pipeline of Keyframe Selection.
Credit: Yang et al.

Let us discuss each of these modules in detail.

  • Keyframe Extraction and Stylization Module.

The system takes as input a video with subtitles, which contain the dialogue along with its start and end times. First, one frame is sampled from every 0.5 seconds of the source video, and informative keyframes are then selected from these samples.

Keyframe selection: The system uses time information to select keyframes (Fig. 2). In the first step, the source video is segmented into shots based on the start and end times of each subtitle. Keyframes are then selected according to factors such as the type of frame and the similarity between frames.
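The sampling and segmentation steps described above can be sketched in Python. This is a minimal illustration rather than the researchers' code: the `Subtitle` type, the function names, the choice to treat each subtitle interval as one shot, and the cosine-similarity threshold are all assumptions made for the example, and each frame is assumed to be summarized by a small feature vector such as a color histogram.

```python
import math
from dataclasses import dataclass


@dataclass
class Subtitle:
    text: str
    start: float  # seconds
    end: float    # seconds


def sample_times(duration, step=0.5):
    """Timestamps of candidate frames, one every `step` seconds."""
    count = int(duration / step) + 1
    return [round(i * step, 3) for i in range(count)]


def group_by_subtitle(times, subtitles):
    """Assign each sampled timestamp to the subtitle (shot) that covers it."""
    shots = {i: [] for i in range(len(subtitles))}
    for t in times:
        for i, sub in enumerate(subtitles):
            if sub.start <= t <= sub.end:
                shots[i].append(t)
                break
    return shots


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def drop_near_duplicates(features, threshold=0.95):
    """Keep indices of frames that differ enough from the last kept frame."""
    kept = []
    for i, feat in enumerate(features):
        if not kept or cosine_similarity(features[kept[-1]], feat) < threshold:
            kept.append(i)
    return kept
```

Timestamps that fall between two subtitles are simply left out of every shot here, which is one possible reading of subtitle-based segmentation.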


The pipeline of Stylization. First, we convert the source image to a black-and-white image via the extended DoG [26] and then get the quantized image by color quantization. Finally, we combine these two images to get a colored stylization. Credit: Yang et al.

Here, the source image is converted to a black-and-white image using the Difference-of-Gaussians (DoG) approach, which yields the edge lines. To retain color, 128-level color quantization is applied to the source image. Finally, the DoG image and the quantized image are combined to produce a colored comic-style image.
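The three stylization steps can be sketched in plain Python on a small image stored as nested lists. This is a simplified stand-in, not the researchers' method: it uses a basic thresholded Difference-of-Gaussians instead of the extended DoG, it interprets "128-level" as 128 levels per color channel, and the function names, sigma values, and threshold are assumptions.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with a radius of about 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(gray, sigma):
    """Separable Gaussian blur on a 2D list; borders are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(gray), len(gray[0])
    offsets = range(-radius, radius + 1)

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(kernel[k + radius] * gray[y][clamp(x + k, w)] for k in offsets)
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[k + radius] * rows[clamp(y + k, h)][x] for k in offsets)
             for x in range(w)] for y in range(h)]


def dog_edges(gray, sigma=1.0, k=1.6, threshold=10.0):
    """True where the difference of two Gaussian blurs is large."""
    fine, coarse = blur(gray, sigma), blur(gray, k * sigma)
    return [[abs(f - c) > threshold for f, c in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(fine, coarse)]


def quantize(value, levels=128):
    """Snap a 0-255 channel value to the centre of one of `levels` bins."""
    step = 256 / levels
    return int(value // step * step + step / 2)


def stylize(rgb, levels=128):
    """Draw black DoG edges on top of the color-quantized image."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]
    edges = dog_edges(gray)
    return [[(0, 0, 0) if edges[y][x] else tuple(quantize(c, levels) for c in px)
             for x, px in enumerate(row)] for y, row in enumerate(rgb)]
```

On an image that is dark on the left and bright on the right, the pixels next to the boundary come out black while the flat regions keep their quantized color. A practical implementation would use an image library rather than nested lists.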

  • Multi-page Layout Module.

With the help of a multi-page layout framework, the panels are spread across multiple pages in rich layouts. 

The four key factors that drive multi-page layout generation are:

  1. Region of interest (ROI) of keyframes
  2. Importance rank of keyframes
  3. Semantic relation between keyframes
  4. Number of panels on the page

Panel layout: Once these four factors have been calculated, the researchers feed them into an existing panel-layout method, which offers different layout styles drawn from various manga series.
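To make the role of the four factors more concrete, here is a hedged Python sketch of how they might be bundled per keyframe and used to split keyframes into pages. The `Keyframe` type, the `paginate` function, and the greedy rule are hypothetical and are not taken from the paper; the existing panel-layout method that the researchers rely on is not reproduced here.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Keyframe:
    name: str
    roi: tuple        # (x, y, width, height) of the region of interest
    importance: int   # rank, 1 = most important
    group: int        # id of the semantic group the frame belongs to


def paginate(keyframes, max_panels=6):
    """Greedily fill pages, keeping a semantic group on one page when it fits."""
    pages, current = [], []
    # Consecutive keyframes with the same group id form one run.
    for _, run in groupby(keyframes, key=lambda frame: frame.group):
        run = list(run)
        # Start a new page if the whole run does not fit on the current one.
        if current and len(current) + len(run) > max_panels:
            pages.append(current)
            current = []
        # A run longer than a page is split across pages.
        for frame in run:
            if len(current) == max_panels:
                pages.append(current)
                current = []
            current.append(frame)
    if current:
        pages.append(current)
    return pages
```

In this sketch only the semantic group and the panel count affect pagination; the ROI and importance rank are carried along so that a panel-layout step could use them, for example to crop panels or give important keyframes larger panels.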

  • Text Balloon Generation and Placement Module.

Balloon shape selection: The system uses three common balloon shapes. Its emotion-aware balloon generation method produces speech balloons of different shapes based on the video's audio and subtitles.

Process of extracting the region of interest (ROI). First, different heatmaps are obtained from the original keyframes by CAM processing. Then, we get a gray-scale image of the heatmap by stacking these heatmaps and gray processing. Finally, we get the ROI from the gray-scale image of the heatmap.

Emotion analysis is applied to the subtitles of each video segment and to the corresponding audio. The detected emotion first determines the word size in a balloon, and the shape of the word balloon is then selected.
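The selection logic can be sketched as follows. This is an illustration under stated assumptions, not the researchers' method: the emotion labels, the three shape names, the rule for fusing the subtitle and audio emotions, and the font-size formula are all hypothetical, since the article does not specify them.

```python
from dataclasses import dataclass

# Hypothetical mapping from emotion label to one of three balloon shapes.
SHAPE_BY_EMOTION = {"neutral": "oval", "angry": "spiky", "sad": "cloud"}


@dataclass
class Emotion:
    label: str        # hypothetical labels: "neutral", "angry", "sad"
    intensity: float  # 0.0 (weak) to 1.0 (strong)


def fuse(subtitle, audio):
    """Pick whichever modality reports the stronger emotion."""
    return subtitle if subtitle.intensity >= audio.intensity else audio


def word_size(emotion, base=12, extra=8):
    """Stronger emotions get larger words."""
    return base + round(extra * emotion.intensity)


def balloon_for(subtitle, audio):
    """Return (shape, word size) for one video segment."""
    emotion = fuse(subtitle, audio)
    size = word_size(emotion)                             # word size first
    shape = SHAPE_BY_EMOTION.get(emotion.label, "oval")   # then the shape
    return shape, size
```

For example, a calm subtitle paired with strongly angry audio would yield a spiky balloon with enlarged text under these assumptions.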

Limitations of the system:

  1. Keyframe selection is not accurate enough. The selected keyframes may be too similar to one another, which makes it harder to generate comics.
  2. Generating comics from videos without subtitles takes a large amount of time.


The researchers at Dalian University of Technology and City University of Hong Kong in China have developed a system that can generate high-quality comics without human intervention. Its key components are the multi-page layout framework, which spreads the images across several pages, and the emotion-aware balloon generation method, which creates a wide range of balloon shapes and font sizes based on the emotions in the audio and the subtitles. The system is capable of producing more interesting, expressive, and engaging comics. Its usefulness is supported by experimental results, although it still has a few limitations.



