Facebook’s parent company Meta unveils Make-A-Video, an AI content creator like DALL-E with the same flaws

Over the summer, the art world was rocked when an AI-created “painting” won a state art competition. Although image generators have been around for decades, they’ve mostly been treated as a novelty, a fun quirk of the internet for people to play around with. But the AI art that won the competition crossed a previously unseen Rubicon, and in doing so it made something very clear to the entire world: the AI art wars have begun.

Between bots like Midjourney and DALL-E – not to mention Google’s Imagen – developers and engineers are stepping up their efforts to create the best and most sophisticated text-to-image generators.

Now Meta (Facebook’s parent company) has thrown its hat into the ring. But instead of a text-to-image bot that turns a prompt into a still image, it has developed one that can turn your words into videos.

The company announced in a blog post last week that it has developed a text-to-video generator called Make-A-Video. The AI-driven system appears to be Meta’s answer to DALL-E and Imagen, only instead of producing a still image, it spits out full-blown videos.

Prompt: A dog in a superhero outfit with a red cape flies through the sky.

Meta

“Generative AI research is driving creative expression by giving people tools to create new content quickly and easily,” Meta wrote in its announcement. “With just a few words or lines of text, Make-A-Video can bring the imagination to life and create unique videos full of vibrant colors, characters and landscapes.”

In addition to text input, the bot “can also create videos from images, or take existing videos and create new ones similar to them,” the company added. In other words, you can feed in an existing image or video and get back a new video based on it.

While it’s not the first advanced text-to-video generator (researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence unveiled a similar one called CogVideo in May), Make-A-Video is notable for how it was trained, which addresses a major hurdle for bots that make videos.

That’s because a typical text-to-image generator is trained on huge datasets of billions of images paired with alt-text descriptions. These text-image pairs allow the AI to learn what image to produce for a given text prompt.

Prompt: An artist paints on a canvas up close.

Meta

However, in a paper published by Meta’s AI researchers on September 29, the authors wrote that video generators don’t have this advantage. Such programs can only draw on datasets of a few million captioned videos (this is how CogVideo was trained).

To get around this and leverage existing image datasets, the Meta researchers used text-to-image models to train their bot to recognize the connection “between text and the visual world.” Then they trained Make-A-Video on video datasets to teach it realistic motion.

So if you type something like “a horse drinking water,” it could use its image training to figure out what a horse drinking water looks like. Then it would draw on its video training to understand that large four-legged creatures near water troughs typically lap up the water with their mouths, and then make a video of it doing so.

Prompt: Horse drinks water.

Meta

The results are amazing, albeit with some clear caveats. While the generator can occasionally create realistic and stunning videos, they’re not perfect. The subjects of some clips look just as malformed and repulsive as the output of the most rudimentary text-to-image generators. The videos are also short, less than five seconds, so they’re not feature-length films yet, and there is no sound. Still, it’s clear that Make-A-Video is a huge, transformative leap forward for AI and AI-based art.

However, the same datasets that made these videos possible will likely bring along a familiar thorn in the side of so many AI bots before it: bias. After all, these generators are only as good as the data used to train them. If you train one on biased data, you will get biased results.

There’s also no end to the real-world examples of the damage that biased AI has caused – from racist chatbots to mortgage loan algorithms turning down black and brown applicants to sophisticated image generators like DALL-E mimicking our biased tendencies.

Prompt: A ballerina performs a beautiful and difficult dance on the roof of a very tall skyscraper.

Meta

Meta’s Make-A-Video is no exception. Although the paper’s authors note that they attempted to filter out NSFW content and toxic language from their datasets, they acknowledge that “our models learned about and likely exaggerated social biases, including harmful ones.” However, they add that all of their data is publicly available to add a “layer of transparency” to the models.

So the biases we saw in text-to-image generators persist; only now they show up in full videos. In the age of deepfake videos and rampant misinformation, that’s a frightening prospect. Although Make-A-Video still only produces rudimentary videos, it’s easy to imagine the technology being refined to the point where its videos become indistinguishable from reality.

Still, what Meta is doing shouldn’t surprise anyone who has watched the rise of bot-generated images in recent years, and it’s no coincidence that these companies are investing more and more in these systems. With the explosive growth in research and development of these models, we’ve seen text-to-image generators morph from malformed, squint-and-you-see-it images into full-blown works of art that leap past the realm of the uncanny straight into hyperrealism.

Prompt: A young couple walking in a heavy rain.

Meta

This tide of newer, more sophisticated bot-made artwork is fueling a rapidly intensifying arms race in AI image generation. It’s no coincidence that the company formerly known as Facebook announced its bot on the same day that Google announced DreamFusion, its text-to-3D generator.

While many are outraged by AI art winning competitions and being used for things like book covers, this is just the beginning. If Make-A-Video lives up to its promise, we’re not far from seeing feature-length movies created entirely by AI, complete with sound, characters, and a story. That’s no exaggeration; it’s inevitable. This genie can’t be put back in the bottle.

We have come a long way from telling stories and casting shadow figures around a fire in a cave. Now it’s only a matter of time before the fire we’ve created starts casting its own shadows on the cave walls.

https://www.thedailybeast.com/facebooks-parent-company-meta-unveils-make-a-video-an-ai-content-creator-like-dall-e-with-the-same-flaws?source=articles&via=rss

Hung

Hung is an Interreviewed U.S. News Reporter based in London. His focus is on U.S. politics and the environment. He has covered climate change extensively, as well as healthcare and crime. Hung joined Interreviewed in 2023 from the Daily Express and previously worked for Chemist and Druggist and the Jewish Chronicle. He is a graduate of Cambridge University. Languages: English. He can be reached by email at hung@interreviewed.com.
