ChatGPT's new Images 2.0 model is surprisingly good at generating text
OpenAI's latest release, the ChatGPT Images 2.0 model, not only excels at generating images but has also shown surprising proficiency at rendering text within them, a task that has long tripped up AI image generators.
The Evolution of AI Image Generation
Historically, AI image generators faced challenges in producing coherent text within images. Just two years ago, models struggled to create even simple menus without inventing nonsensical dishes. For instance, a Mexican restaurant menu generated by earlier models included items like “enchuita,” “churiros,” and “burrto.” However, the new ChatGPT Images 2.0 model has drastically improved this aspect, producing menus that could seamlessly fit into a restaurant setting without raising eyebrows.
Comparison with Previous Models
To illustrate the advancement, consider the output of DALL-E 3, OpenAI's earlier image generator from the era before ChatGPT could generate images itself. Its results were often quirky and lacked the finesse required for realistic applications. In contrast, the Images 2.0 model produces text that looks polished and professional, significantly narrowing the gap between human- and AI-generated content.
Understanding the Technology Behind Images 2.0
AI image generation has traditionally relied on diffusion models, which reconstruct images from noise. Asmelash Teka Hadgu, founder and CEO of Lesan AI, explained that these models often overlook small text elements because they optimize for broader patterns in the image. Researchers have since explored alternative methods, such as autoregressive models, which generate an image piece by piece, predicting each next element from what came before, much as large language models (LLMs) predict the next word.
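The contrast between the two paradigms can be sketched in toy code. This is an illustration only: real models replace the placeholder update rules below with trained neural networks, and both function names here are invented for the example.

```python
import random

def diffusion_generate(steps=4, size=4):
    """Diffusion-style toy: start from pure noise and refine the WHOLE
    image at every step. Because each step acts globally, fine details
    such as small text can get averaged away."""
    image = [random.random() for _ in range(size)]  # pure noise
    for _ in range(steps):
        # placeholder "denoiser": nudge every pixel at once
        image = [0.5 * x for x in image]
    return image

def autoregressive_generate(size=4):
    """Autoregressive-style toy: emit the image one token at a time,
    each conditioned on everything generated so far, the way an LLM
    predicts the next word from the preceding text."""
    tokens = []
    for _ in range(size):
        # placeholder "predictor": next token depends on the prefix
        tokens.append(len(tokens))
    return tokens
```

The key design difference is where the model's attention goes: the diffusion loop updates the whole canvas globally at each step, while the autoregressive loop commits to one local element at a time, which helps explain why the latter can be more faithful to small text.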
While OpenAI has not disclosed the specific model powering ChatGPT Images 2.0, they have highlighted its “thinking capabilities.” These capabilities allow the model to:
- Search the web for information.
- Create multiple images from a single prompt.
- Double-check its creations for accuracy.
Such features enable the model to generate marketing assets in various sizes and even multi-paneled comic strips, showcasing its versatility.
Enhanced Language Capabilities
Another notable improvement in Images 2.0 is its rendering of non-Latin scripts. The model can now handle text in languages such as Japanese, Korean, Hindi, and Bengali, which broadens its usability and makes it a valuable tool for a more diverse audience.
Performance and Output Quality
OpenAI claims that Images 2.0 brings an unprecedented level of specificity and fidelity to image creation. The model can conceptualize complex images and execute them with precision, maintaining requested details and rendering intricate elements that often challenge previous models. This includes:
- Small text.
- Iconography.
- UI elements.
- Dense compositions.
- Subtle stylistic constraints.
All of this is achievable at resolutions of up to 2K, which is a significant upgrade over earlier models.
Access and User Experience
Starting Tuesday, all ChatGPT and Codex users will have access to Images 2.0. However, paid users will benefit from the ability to generate more advanced outputs. OpenAI is also introducing the gpt-image-2 API, with pricing based on the quality and resolution of the outputs, making it accessible for various applications.
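OpenAI's disclosed details stop at the model name gpt-image-2 and pricing tied to quality and resolution. Purely as a sketch, a request to the new API might carry a payload like the one below; the parameter names and endpoint shape are assumptions modeled on OpenAI's existing Images API, not confirmed details of gpt-image-2.

```python
import json

# Hypothetical request payload. Only the model name comes from OpenAI's
# announcement; "size" and "quality" are assumed parameters, included
# because pricing reportedly varies with resolution and output quality.
payload = {
    "model": "gpt-image-2",
    "prompt": "A Mexican restaurant menu with legible dish names",
    "size": "1024x1024",   # assumed parameter
    "quality": "high",     # assumed parameter
}

print(json.dumps(payload, indent=2))
```

In practice, a developer would POST such a payload to the image-generation endpoint with an API key; the exact fields should be checked against OpenAI's API reference once gpt-image-2 documentation is published.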
Conclusion
The ChatGPT Images 2.0 model represents a significant step forward in the capabilities of AI, particularly in generating coherent text alongside images. As the technology continues to advance, the potential applications for such models are vast, ranging from marketing to creative storytelling. The improvements in language handling and image fidelity make it a promising tool for professionals and casual users alike.
Note: The information in this article is based on the latest updates from OpenAI and reflects the state of AI technology as of April 2026.