Imagen 2
Our most advanced text-to-image technology
Improved image-caption understanding
Text-to-image models learn to generate images that match a user’s prompt from the image-caption pairs in their training datasets, but the detail and accuracy of those pairings can vary widely. To help create higher-quality, more accurate images that better align to a user’s prompt, we added further descriptions to the image captions in Imagen 2’s training dataset. These enhanced image-caption pairings expose Imagen 2 to different captioning styles, helping it generalize to a broad range of user prompts and better understand the relationship between images and words — increasing its understanding of context and nuance.
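The caption-enrichment idea above can be sketched as a simple data-preparation step. This is a minimal illustration, not Imagen 2's actual pipeline: the `TrainingPair` type and `enrich_captions` helper are hypothetical, and it assumes each extra description simply becomes an additional (image, text) training pair.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingPair:
    image_path: str
    caption: str

def enrich_captions(image_path: str, original_caption: str,
                    extra_descriptions: List[str]) -> List[TrainingPair]:
    """Pair one image with several caption variants so the model
    sees multiple captioning styles for the same visual content."""
    pairs = [TrainingPair(image_path, original_caption)]
    for desc in extra_descriptions:
        # Each added description becomes an extra (image, text) pair.
        pairs.append(TrainingPair(image_path, desc))
    return pairs

pairs = enrich_captions(
    "beach.jpg",
    "a beach",
    ["a wide sandy beach at sunset, gentle waves, warm golden light"],
)
```

In this toy form, a terse caption and a richer description both train against the same image, which is one way a model could learn to interpret both sparse and detailed prompts.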
More realistic image generation
Imagen 2’s dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and minimizing distracting visual artifacts. We trained a specialized image aesthetics model based on human preferences for qualities like good lighting, framing, exposure, sharpness, and more. Each image was given an aesthetics score, which helped condition Imagen 2 to give more weight to images in its training dataset that align with qualities humans prefer. This technique improves Imagen 2’s ability to generate higher-quality images.
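One simple way to turn per-image aesthetics scores into training emphasis, as described above, is to convert scores into sampling or loss weights. This is a hedged sketch, not Imagen 2's actual conditioning mechanism: the softmax-style weighting and the `temperature` knob are assumptions for illustration.

```python
import math
from typing import List

def aesthetic_weights(scores: List[float],
                      temperature: float = 1.0) -> List[float]:
    """Convert per-image aesthetics scores into normalized weights:
    higher-scoring images receive proportionally more weight."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Three images scored by a (hypothetical) aesthetics model:
weights = aesthetic_weights([0.2, 0.5, 0.9])
```

Weighting the dataset this way nudges training toward images with qualities humans prefer — good lighting, framing, exposure, sharpness — without discarding lower-scored examples outright.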
Fluid style conditioning
Imagen 2’s diffusion-based techniques provide a high degree of flexibility, making it easier to control and adjust the style of an image. By providing reference style images in combination with a text prompt, we can condition Imagen 2 to generate new imagery that follows the same style.
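Conceptually, style conditioning combines two signals — a text-prompt embedding and a reference-style-image embedding — into one conditioning input for the diffusion model. The sketch below is a toy illustration under that assumption; the concatenation scheme and `style_strength` parameter are hypothetical, not Imagen 2's published architecture.

```python
from typing import List

def condition_vector(text_embedding: List[float],
                     style_embedding: List[float],
                     style_strength: float = 0.5) -> List[float]:
    """Blend a text-prompt embedding with a reference-style-image
    embedding into a single conditioning signal."""
    scaled_style = [style_strength * s for s in style_embedding]
    # Concatenate so the denoiser can attend to both signals;
    # style_strength controls how strongly the reference style pulls.
    return text_embedding + scaled_style

cond = condition_vector([0.1, 0.2], [1.0, -1.0], style_strength=0.5)
```

Varying `style_strength` at inference time is one plausible way such a scheme would let users dial the influence of the reference images up or down.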
Editing capabilities
Imagen 2 also enables image editing capabilities like ‘inpainting’ and ‘outpainting’. By providing a reference image and an image mask, users can generate new content directly into the original image with a technique called inpainting, or extend the original image beyond its borders with outpainting. These capabilities are available in Google Cloud’s Vertex AI, together with an expanded list of aspect ratio options: 16:9, 9:16, 4:3 and 3:4.
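The mask-based editing described above can be illustrated with a minimal compositing step: wherever the mask is set, the newly generated content replaces the original pixels; elsewhere the original is preserved. This is a generic sketch of the inpainting idea on grayscale pixel grids, not the Vertex AI API or Imagen 2's internals.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel values in [0, 1]

def inpaint_composite(original: Image, generated: Image,
                      mask: Image) -> Image:
    """Inpainting-style composite: mask == 1 takes the generated
    content, mask == 0 keeps the original pixels."""
    result = []
    for o_row, g_row, m_row in zip(original, generated, mask):
        result.append([m * g + (1 - m) * o
                       for o, g, m in zip(o_row, g_row, m_row)])
    return result

orig = [[0.0, 0.0], [0.0, 0.0]]
gen  = [[1.0, 1.0], [1.0, 1.0]]
mask = [[0.0, 1.0], [0.0, 1.0]]  # edit only the right column
out = inpaint_composite(orig, gen, mask)
```

Outpainting follows the same pattern with the mask covering the padded region beyond the original image's borders.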
Responsible by design
To help mitigate the potential risks and challenges of our text-to-image generative technology, we set robust guardrails in place, from design and development to deployment in our products.

Imagen 2 is integrated with SynthID, our cutting-edge toolkit for watermarking and identifying AI-generated content, enabling allowlisted Google Cloud customers to add an imperceptible digital watermark directly into the pixels of the image, without compromising image quality. The watermark remains detectable by SynthID even after modifications like filters, cropping, or saving with lossy compression schemes.

Before we release capabilities to users, we conduct robust safety testing to minimize the risk of harm. From the outset, we invested in training data safety for Imagen 2, and added technical guardrails to limit problematic outputs like violent, offensive, or sexually explicit content. We apply safety checks to training data, input prompts, and system-generated outputs at generation time. For example, we apply comprehensive safety filters to avoid generating potentially problematic content, such as images of named individuals. As we expand Imagen 2’s capabilities and launches, we continuously evaluate them for safety.