A donkey on the back of a horse, or a dog driving a car: these things may not be feasible in real life, but you can most probably get a picture of them. Can you Imagen?
No, I have not misspelled “imagine”; that’s what Google’s new AI is called. You can feed the AI any text you want, and the sky is the limit. It then processes that text to give you eerily accurate pictures of what you described. The Google Brain team’s Imagen generates an image from what you have typed, based on an understanding of language it built up by training on a large amount of data.
Imagen combines large transformer language models, which handle the text understanding, with diffusion models, which create the high-quality images. Google says Imagen features ‘an unprecedented degree of photorealism and a deep level of language understanding’, and claims that it surpasses its competitors. Google has displayed quite a few of these AI-generated pictures on the Imagen website. There is a transparent sculpture of a duck made out of glass, which matches its prompt so well that it looks as if the text was written for the picture and not vice versa. There’s even a teddy bear swimming at the Olympics. The photorealism and accuracy of these pictures are what’s so astounding. The company also talks about a key discovery it made: that ‘generic large language models are surprisingly effective at encoding text for image synthesis’.
There has been some skepticism towards this product, and rightfully so: when a company comes up with a new AI model, it tends to showcase only a handful of its best results. A lot of the time, such images are either too blurry or they have messed up the concept we wanted, a problem also faced by other systems such as OpenAI’s DALL-E. Google has also come out with its own text-to-image benchmark, called DrawBench. When Imagen was compared with other popular AI methods such as VQ-GAN+CLIP, Latent Diffusion Models, and more, human raters preferred Imagen over the other methods in terms of both sample quality and image-text alignment.
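To make the idea of human raters preferring one model concrete, here is a minimal sketch of how side-by-side votes could be tallied into preference rates. It is only an illustration: the function name `preference_rates`, the vote labels, and the votes themselves are made up for this example and are not taken from DrawBench or from Google's results.

```python
from collections import Counter


def preference_rates(votes):
    """Return the fraction of votes each option received."""
    counts = Counter(votes)
    total = sum(counts.values())
    # With no votes, counts is empty and this returns an empty dict.
    return {option: count / total for option, count in counts.items()}


# Hypothetical votes for one prompt: each rater sees two images side by side
# and picks the one they prefer (or reports no preference).
example_votes = ["model_a", "model_b", "model_a", "model_a", "no_preference"]
rates = preference_rates(example_votes)
```

In this made-up example, three of the five raters chose `model_a`. A real benchmark would repeat the tally separately for each question asked of the raters, such as sample quality and image-text alignment, and across many prompts.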
We will not be able to test out Imagen for ourselves, and for a valid reason. There are several ethical hurdles that come with text-to-image research, broadly speaking. Imagen does not search the internet each time you type something; it learned to create images from large datasets scraped from the web. Because the internet is filled with biases and preferences, that is what the machine learns and can potentially show. The outcomes can be sexist and racist in one way or another, and Google has also talked about how the machine picks up on preferences toward lighter skin tones and Western gender stereotypes. All this combined can make it very convenient for people to misuse the service, and Google doesn’t want that.
So for now, Google is going to take its time developing the AI further until it is reliable enough to launch. This is surely an exciting step toward a future where we can probably use these services to goof around and do other things, but for now, we have to make do with the amazing pictures Imagen has already released on its website.