Artificial Intelligence has gone mainstream, thanks to assistants like ChatGPT and tools like Midjourney, but Nvidia is up to something very cool too, something that could take our GIF experience to a whole new level. The company’s Toronto AI Lab has developed what it calls “Latent Diffusion Models” (LDMs), which can generate videos without requiring massive amounts of computing power. These models build on top of text-to-image generators, in this case Stable Diffusion, adding a temporal dimension to the latent-space diffusion model so that the generated frames stay consistent over time. The technology can already produce usable results from simple prompts such as “a stormtrooper vacuuming on the beach” or “a teddy bear playing the electric guitar, high definition, 4K”.
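For readers curious about what “adding a temporal dimension” actually means, here is a minimal, hypothetical PyTorch sketch of the idea: a per-frame image denoiser (standing in for a pretrained latent diffusion U-Net) handles each frame’s latent on its own, while an interleaved temporal attention layer lets the frames exchange information so the clip stays coherent. All class names and shapes below are illustrative assumptions, not Nvidia’s actual code.

```python
# Illustrative sketch only -- not Nvidia's actual code. A per-frame image
# denoiser is wrapped with a temporal attention layer that mixes
# information across frames of the same clip.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Self-attention over the time axis of a batch of frame latents."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, frames, channels, height, width)
        b, t, c, h, w = z.shape
        # Treat each spatial position as its own sequence over time.
        seq = z.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        attended, _ = self.attn(self.norm(seq), self.norm(seq), self.norm(seq))
        seq = seq + attended  # residual connection
        return seq.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)

class VideoDenoiser(nn.Module):
    """Wraps an image (per-frame) denoiser with temporal mixing."""
    def __init__(self, spatial_denoiser: nn.Module, channels: int):
        super().__init__()
        self.spatial = spatial_denoiser  # e.g. a pretrained image U-Net
        self.temporal = TemporalAttention(channels)

    def forward(self, z: torch.Tensor, timestep: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = z.shape
        # Spatial pass: denoise every frame's latent independently.
        frames = self.spatial(z.reshape(b * t, c, h, w),
                              timestep.repeat_interleave(t))
        # Temporal pass: let the frames attend to one another.
        return self.temporal(frames.reshape(b, t, c, h, w))

# Toy stand-in for a pretrained image U-Net (a real one also uses `timestep`).
class ToyUNet(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, timestep: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

model = VideoDenoiser(ToyUNet(8), channels=8)
latents = torch.randn(2, 16, 8, 32, 32)    # 2 clips, 16 frames, 8-ch 32x32 latents
timesteps = torch.randint(0, 1000, (2,))   # one diffusion timestep per clip
out = model(latents, timesteps)            # same shape as the input latents
```

In Nvidia’s published approach, the pretrained image layers are largely reused while the newly inserted temporal layers are trained on video data, which is part of why the method keeps computing requirements comparatively low.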
While text-to-video tech like Nvidia’s demo is currently best suited to creating thumbnails and GIFs, the rapid progress Nvidia has shown on generating longer scenes suggests we won’t have to wait long for full-length text-to-video clips. Nvidia is not the only company showcasing text-to-video generators, either; Google’s Phenaki has already made its debut, demonstrating 20-second clips generated from longer prompts, as well as a two-minute clip.
Runway, the company that helped create Stable Diffusion, has also revealed its Gen-2 AI video model. Meanwhile, Adobe Firefly’s recent demos show how much easier AI will make video editing: users simply type in the time of day or season they want to see in their footage, and Adobe’s AI does the rest.
While full text-to-video generation is still in a “nebulous” state (it often produces warped or dreamy results), recent advancements suggest that the improvements needed to make the tech suitable for longer videos are just around the corner. We are excited to see what the future holds! What about you?