Blink once and technology has bolted ahead. That’s how fast the world of artificial intelligence is developing. Not long ago, people were marveling at AI text-to-image generators and the images they produce. Now Meta has joined the race with Make-A-Video, a text-to-video generator.
As with text-to-image generators like DALL-E and Stable Diffusion, users simply enter a text prompt and the AI model generates a video that depicts the description. Although these videos appear a little distorted and blurry, they show the remarkable pace at which these AI offerings are evolving. Meta said in a blog post, “Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content.”
The videos are five seconds long and have no audio. For now, Meta is not giving anyone access to its text-to-video generator; it has only published a paper about it. The tool currently outputs 16 frames of video at a resolution of 64×64 pixels, which a separate AI model then upscales to 768×768 pixels.
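To get a feel for how much work that second model is doing, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (16 frames, 64×64, 768×768); the function name and the dictionary keys are made up for illustration and have nothing to do with Meta's actual code.

```python
# Figures quoted in the article. The helper below is purely illustrative
# and is not part of Make-A-Video.
FRAMES = 16
BASE_SIDE = 64        # width and height of each generated frame, in pixels
UPSCALED_SIDE = 768   # width and height after the separate upscaling model


def upscale_stats(frames, base_side, upscaled_side):
    """Return simple pixel-count arithmetic for a square-frame clip."""
    base_pixels = frames * base_side * base_side
    upscaled_pixels = frames * upscaled_side * upscaled_side
    return {
        "side_factor": upscaled_side // base_side,
        "pixel_factor": upscaled_pixels // base_pixels,
        "base_pixels": base_pixels,
        "upscaled_pixels": upscaled_pixels,
    }


stats = upscale_stats(FRAMES, BASE_SIDE, UPSCALED_SIDE)
print(stats)
```

In other words, each side of the frame grows 12 times, so the upscaler has to fill in 144 times as many pixels as the base model originally produced.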
Meta said that it is being “thoughtful about how we build new generative AI systems like this”. One likely reason is that, much as text-to-image generators learn from huge collections of images, this model has been trained on content from HD-VILA-100M and WebVid-10M, which contain millions of video clips. A single ill-intentioned text prompt could let someone create a video that causes harm to society, such as pornographic snippets. The tool may also have picked up biases that prevail in our society, so its output could create further problems.
“It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time,” said Mark Zuckerberg, the CEO of Meta, in a Facebook post. In the research paper released by Meta, the researchers point out some of the model’s limitations, among them generating videos that contain more than one scene and generating videos longer than five seconds.
We will only learn the full potential, and the harms, of this tool once it is available to everyone. The company also said it is sharing this generative AI research with the community so that the work can improve and evolve. Someday we may create entire movies from text descriptions alone, but there is still a long way to go.
Featured Image Credits: Meta