Nvidia’s text-to-video tech will take your GIF game to the next level

Now that ChatGPT and Midjourney are pretty much mainstream, the next big AI race is text-to-video generators – and Nvidia has just shown off some impressive demos of the tech that could soon take your GIFs to a new level.

A new research paper and micro-site from Nvidia's Toronto AI Lab, called “High-Resolution Video Synthesis with Latent Diffusion Models”, gives us a taste of the incredible video creation tools that are about to join the ever-growing list of the best AI art generators.

Latent Diffusion Models (or LDMs) are a type of generative AI that works on a compressed “latent” representation rather than on raw pixels, which is how they can generate videos without needing massive computing power. Nvidia says its tech does this by building on the work of text-to-image generators, in this case Stable Diffusion, and adding a “temporal dimension to the latent space diffusion model”.
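To get a rough sense of why working in latent space is cheaper, here is a back-of-the-envelope sketch in Python. It assumes the figures commonly cited for Stable Diffusion v1 (an autoencoder that shrinks each side of the image by a factor of 8 and stores 4 latent channels). Those numbers and the function names are illustrative assumptions, not details taken from Nvidia's paper.

```python
# Rough size comparison between pixel space and latent space.
# ASSUMPTION: downsampling factor of 8 and 4 latent channels, as commonly
# cited for Stable Diffusion v1; these are not figures from Nvidia's paper.

def pixel_values(height, width, channels=3):
    """Number of values in one RGB frame."""
    return height * width * channels

def latent_values(height, width, factor=8, latent_channels=4):
    """Number of values in the compressed latent for the same frame."""
    return (height // factor) * (width // factor) * latent_channels

frame = pixel_values(512, 512)
latent = latent_values(512, 512)
reduction = frame / latent

print(f"pixel values: {frame}, latent values: {latent}, reduction: {reduction:.0f}x")
```

Under those assumptions the model handles about 48 times fewer values per frame, which hints at why adding a time axis in latent space is more manageable than doing so on full-size pixels.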

(Image credit: Nvidia)

In other words, its generative AI can make still images move in a realistic way and upscale them using super-resolution techniques. This means it can produce short, 4.7-second videos with a resolution of 1280×2048, or longer ones, such as driving videos, at the lower resolution of 512×1024.
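For a sense of scale, those numbers can be turned into a quick calculation. The Python sketch below assumes playback at 24 frames per second, which is an assumption for illustration (this article does not state a frame rate), and the variable names are made up for the example.

```python
# ASSUMPTION: 24 fps playback; the frame rate is not stated in this article.
ASSUMED_FPS = 24

def frames_for(duration_s, fps=ASSUMED_FPS):
    """Approximate frame count for a clip of the given length."""
    return round(duration_s * fps)

def pixels_per_frame(height, width):
    """Pixel count of a single frame."""
    return height * width

clip_frames = frames_for(4.7)
high_res = pixels_per_frame(1280, 2048)
low_res = pixels_per_frame(512, 1024)
ratio = high_res / low_res

print(f"frames in a 4.7 s clip: {clip_frames}")
print(f"pixels per frame: {high_res} vs {low_res} ({ratio:.0f}x)")
```

If the 24 fps assumption holds, a 4.7-second clip is roughly 113 frames, and each 1280×2048 frame carries five times as many pixels as a 512×1024 one.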

Our immediate thought on seeing the early demos (like the ones above and below) is how much this could boost our GIF game. Okay, there are bigger ramifications, like the democratization of video creation and the prospect of automated film adaptations, but at this stage text-to-GIF seems to be the most exciting use case.

A teddy bear playing the electric guitar

(Image credit: Nvidia)

Simple prompts like ‘a storm trooper vacuuming on the beach’ and a ‘teddy bear is playing the electric guitar, high definition, 4K’ produce some pretty usable results, even if there are naturally artifacts and morphing with some of the creations.
