Object-driven Text-to-Image Synthesis via Adversarial Training

 

The notes of Object-driven Text-to-Image Synthesis via Adversarial Training

Overview

This paper proposed an obj-GAN which has an object-driven attentive image generator and a new Fast R-CNN based object-wise discriminator. The object-driven generator is able to pay attention to the most relevant words and semantic layout to synthesize salient objects. The object-wise discriminator is to ensure the synthesized object matches the text description and the pre-generated layout.

Compared to the previous work, which do not specifically model the relations between objects, and thus struggle to generate complex scenes except for simple datasets, such as birds and flowers, this paper first constructs a semantic layout from the text and then synthesze the image by a deconvolutional generator.