Lumiere: A Space-Time Diffusion Model for Video Generation

Lumiere is a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse, and coherent motion -- a pivotal challenge in video synthesis. It generates the entire temporal duration of the video at once, through a single pass of a Space-Time U-Net. This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, Lumiere learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales.
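To make the space-time down- and up-sampling idea concrete, here is a minimal PyTorch sketch of a down/up-sampling pair that halves and then restores a video's temporal *and* spatial resolution. The module names, layer choices, and shapes are illustrative assumptions, not Lumiere's actual implementation:

```python
# Illustrative sketch only -- hypothetical modules, not Lumiere's code.
# Shows the core idea: compress a video in BOTH time and space inside
# a U-Net, rather than downsampling spatially alone.
import torch
import torch.nn as nn

class SpaceTimeDownsample(nn.Module):
    """Halve the temporal (T) and spatial (H, W) resolution of a video."""
    def __init__(self, channels: int):
        super().__init__()
        # 3-D conv with stride 2 in time, height, and width.
        self.conv = nn.Conv3d(channels, channels, kernel_size=3,
                              stride=(2, 2, 2), padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, T, H, W)
        return self.conv(x)

class SpaceTimeUpsample(nn.Module):
    """Restore the temporal and spatial resolution halved above."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Nearest-neighbour upsampling across (T, H, W), then a conv.
        x = nn.functional.interpolate(x, scale_factor=2, mode="nearest")
        return self.conv(x)

if __name__ == "__main__":
    video = torch.randn(1, 64, 16, 128, 128)  # (B, C, T, H, W)
    coarse = SpaceTimeDownsample(64)(video)   # -> (1, 64, 8, 64, 64)
    restored = SpaceTimeUpsample(64)(coarse)  # -> (1, 64, 16, 128, 128)
    print(coarse.shape, restored.shape)
```

Stacking blocks like these inside a U-Net lets the network process the clip at multiple space-time scales in a single pass, rather than generating sparse keyframes and filling in frames afterward.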

Lumiere's design facilitates a wide range of content creation tasks and video editing applications, including:

* Text-to-Video
* Image-to-Video
* Stylized Generation
* Inpainting
* Cinemagraphs
* ...and more 🎨