Less than a week after Meta announced its text-to-video generator Make-A-Video, Google announced Imagen Video, its own AI-powered text-to-video generator. Google and Meta’s generators aren’t the only text-to-video tools in development, a sign of how quickly AI is advancing in this area.
Google’s Imagen Video can generate 5.3-second videos at 1280×768 resolution and 24 frames per second. According to the Imagen Video research paper, the model was trained on a combination of an “internal dataset consisting of 14 million video-text pairs and 60 million image-text pairs, and the publicly available LAION-400M image-text dataset.” Google also claims that the model produces videos with high fidelity, offers a high degree of controllability, and shows world knowledge. The researchers also found that the model can understand 3D objects and can produce videos in the style of famous artists such as Vincent van Gogh. Imagen Video arrived just five months after Google unveiled Imagen, its text-to-image model.
According to Google’s research team, Imagen Video works in stages. The model first takes a text description and generates a 16-frame video at 3 frames per second and 24×48 pixel resolution. This low-resolution clip is then upscaled to produce the final video.
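The numbers above fit together with simple arithmetic. The short Python sketch below uses only the figures quoted in this article (16 base frames, 3 fps, a 24 fps final video) and assumes the upscaling keeps the clip’s duration unchanged. Under that assumption it derives the clip length and how many frames the final video needs; the 128-frame total and the 8× temporal factor are computed here, not quoted from Google.

```python
from fractions import Fraction

# Figures quoted in the article
BASE_FRAMES = 16   # frames produced by the base model
BASE_FPS = 3       # frame rate of the base clip
FINAL_FPS = 24     # frame rate of the final, upscaled video

# Duration = frames / fps; Fraction keeps the result exact (16/3 s)
duration = Fraction(BASE_FRAMES, BASE_FPS)

# Assumption: upscaling preserves duration, so the final clip needs
# FINAL_FPS * duration frames in total
final_frames = FINAL_FPS * duration

# How many times more frames the upscaling stages must produce
temporal_factor = final_frames / BASE_FRAMES

print(f"duration ≈ {float(duration):.2f} s")
print(f"final frames = {final_frames}, temporal upsampling = {temporal_factor}x")
```

A base clip of 16 frames at 3 fps lasts 16/3 ≈ 5.33 seconds, matching the 5.3-second output Google reports. At 24 fps, that same duration requires 128 frames, so the upscaling stages must add frames as well as pixels.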
The model is still in the research phase, and the results Google has shown look promising. Note, however, that all of these examples were handpicked by Google, since the tool is not open to the public.
Google has also voiced concern about the “problematic data” used to train this text-to-video generator. Although the company has tried to filter out harmful content, it worries that the tool could be used “to generate fake, hateful, explicit, or harmful content.” Most text-to-image and text-to-video generators have faced this issue, or will face it once they are released to the public. Text-to-image generators are already being used to create pornographic images, and users have identified societal biases that these generators have picked up.
Competition among these AI models is heating up. Even so, for all the rapid progress, several issues still need to be resolved before tools like these can be released publicly. Google said, “We have decided not to release the Imagen Video model or its source code until these concerns are mitigated.”