The race for high-quality AI-generated videos is heating up.
On Monday, Runway, a company developing generative AI tools for film and image content creators, unveiled Gen-3 Alpha. The company’s latest AI model generates video clips from text descriptions and still images. Runway claims the model offers a “major” improvement in generation speed and fidelity over Runway’s previous flagship video model, Gen-2, as well as fine-grained controls over the structure, style and movement of the videos it creates.
Gen-3 will be available in the coming days to Runway subscribers, including enterprise customers and creators in Runway’s Creative Partner Program.
“Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures and emotions,” Runway wrote in a blog post. “It was designed to interpret a wide range of cinematic styles and terminologies (and enable) imaginative transitions and precise framing of scene elements.”
Gen-3 Alpha has its limitations, including the fact that its clips max out at 10 seconds in length. However, Runway co-founder Anastasis Germanidis promises that Gen-3 is just the first – and smallest – of several upcoming video generation models in a family of next-generation models trained on improved infrastructure.
“The model can struggle with complex interactions between characters and objects, and generations don’t always follow the laws of physics precisely,” Germanidis told TechCrunch this morning in an interview. “This initial rollout will support high-resolution 5- and 10-second generations, with generation times significantly faster than Gen-2. A 5-second clip takes 45 seconds to generate, and a 10-second clip takes 90 seconds to generate.”
Gen-3 Alpha, like all video generation models, was trained on a vast number of example videos – and images – so that it could “learn” the patterns in those examples and generate new clips. Where does the training data come from? Runway wouldn’t say. Few generative AI vendors volunteer such information these days, in part because they view training data as a competitive advantage and therefore keep it, and details related to it, close to the chest.
“We have an internal research team that oversees all of our training, and we use selected internal datasets to train our models,” Germanidis said. He left it at that.
Training data details are also a potential source of intellectual property lawsuits if the vendor trained on public data, including copyrighted data from the web – and thus another disincentive to reveal much. Several cases making their way through the courts reject vendors’ fair use defenses of training data, arguing that generative AI tools reproduce artists’ styles without the artists’ permission and let users generate new works resembling the artists’ originals, for which the artists receive no payment.
Runway addressed the copyright issue somewhat, saying it consulted with artists in developing the model. (Which artists? Not clear.) This mirrors what Germanidis told me during a fireside chat at TechCrunch’s Disrupt conference in 2023:
“We are working closely with artists to determine what the best approaches are to solve this problem,” he said. “We are exploring various data partnerships so we can continue to grow…and build the next generation of models.”
Runway also says it plans to release Gen-3 with a new set of safeguards, including a moderation system to block attempts to generate videos from copyrighted images and content that isn’t in compliance with Runway’s terms of service. A provenance system – compatible with the C2PA standard, backed by Microsoft, Adobe, OpenAI and others – is also in the works to identify videos as coming from Gen-3.
“Our new, improved internal visual and text moderation system uses automatic monitoring to filter inappropriate or harmful content,” Germanidis said. “C2PA authentication verifies the provenance and authenticity of media created with all Gen-3 models. As model capabilities and the ability to generate high-fidelity content increase, we will continue to invest significantly in our alignment and security efforts.”
Runway also revealed that it has partnered with “leading entertainment and media organizations” to create custom versions of Gen-3 that allow for more “stylistically controlled” and consistent characters, targeting “specific artistic and narrative requirements.” The company adds: “This means that generated characters, backgrounds and assets can maintain a consistent appearance and behavior across different scenes.”
A major unresolved issue with video generation models is control – that is, ensuring that a model generates a consistent video aligned with a creator’s artistic intentions. As my colleague Devin Coldewey recently wrote, matters that are simple in traditional filmmaking, like choosing a color for a character’s clothing, require workarounds with generative models because each shot is created independently of the others. Sometimes even workarounds aren’t enough, leaving considerable manual work for editors.
Runway has raised more than $236.5 million from investors including Google (with whom it has cloud computing credits) and Nvidia, as well as venture capital firms including Amplify Partners, Felicis and Coatue. The company has aligned itself closely with the creative industry as its investments in generative AI technology grow. Runway operates Runway Studios, an entertainment division that serves as a production partner for corporate clientele, and hosts the AI Film Festival, one of the premier events dedicated to showcasing films produced wholly – or in part – by AI.
But the competition is getting fiercer.
Generative AI startup Luma last week announced Dream Machine, a video generator that went viral for its ability to animate memes. And just a few months ago, Adobe revealed that it was developing its own video generation model based on content from its Adobe Stock media library.
Elsewhere, there are heavyweights like OpenAI’s Sora, which remains tightly gated but which OpenAI has seeded to marketing agencies and to independent and Hollywood filmmakers. (OpenAI CTO Mira Murati was in attendance at the 2024 Cannes Film Festival.) This year’s Tribeca Festival – which also has a partnership with Runway to curate films made using AI tools – featured short films produced with Sora by directors who received early access.
Google has also given its video generation model, Veo, to select creators, including Donald Glover (aka Childish Gambino) and his creative agency Gilga, with the aim of integrating Veo into products such as YouTube Shorts.
Regardless of the various collaborations, one thing is becoming clear: Generative AI video tools threaten to disrupt the film and television industry as we know it.
Filmmaker Tyler Perry recently said he put a planned $800 million expansion of his production studio on hold after seeing what Sora could do. Joe Russo, the director of Marvel’s flagship films like “Avengers: Endgame,” predicts that within a year, AI will be able to create a full-fledged film.
A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, found that 75% of film production companies that adopted AI reduced, consolidated or eliminated jobs after incorporating the technology. The study also estimates that by 2026, more than 100,000 U.S. entertainment jobs will be disrupted by generative AI.
It will take very strong labor protections to ensure that video generation tools do not follow in the footsteps of other generative AI technologies and lead to a sharp decline in demand for creative work.