Skip to the content Skip to the Navigation

Irfan Essa

  • Home
  • Blog
  • Publications
  • Team
  • Videos
  • Teaching
  • FAQ
  • Contact
Blog
  1. HOME
  2. Blog
  3. ICML

ICML

July 22, 2024 / Last updated : July 25, 2024 irfan ICML

Award-winning paper in ICML 2024 on “VideoPoet: A large language model for zero-shot video generation.”

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs — including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model’s state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet’s ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/

Recent Posts

Wizard of Oz at the Las Vegas Sphere, using Google AI
June 13, 2025
CVPR 2025 paper on “Cropper: Vision-Language Model for Image Cropping through In-Context Learning”
June 13, 2025
CVPR 2025 paper on “Calibrated Multi-Preference Optimization for Aligning Diffusion Models”
June 13, 2025
Award-winning paper in ICML 2024 on “VideoPoet: A large language model for zero-shot video generation.”
July 22, 2024
ACM SIGGRAPH Seminal Graphics Papers, Volume 2. Published as part of SIGGRAPH 50th Anniversary Meeting in 2023
August 9, 2023
Paper in UIST 2023 on “Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access”
April 23, 2023
Award-winning paper in ICLR 2023 on “Emergence of Maps in the Memories of Blind Navigation Agents”
March 22, 2023
Paper in ICLR 2023 on “Discrete Predictor-Corrector Diffusion Models for Image Synthesis”
March 10, 2023
Some recent publications for 2023
March 10, 2023
Publications in 2022
December 31, 2022

Tags

ACM (20) Activity Recognition (52) Affective Computing (9) Aging-in-place (5) AI (20) Audio Analysis (9) Awards (15) Aware Home (15) Behavioral Imaging (11) Best Paper Award (11) Computational Journalism (36) Computational Photography (62) Computational Video (71) Computer Animation (10) Computer Graphics (9) Computer Vision (117) CVPR (30) DVFX (9) ECCV (5) Events (7) Faces (12) Funding (7) Generative Media (5) Gesture (6) Google (24) HCI (8) Health (7) ICCV (8) IEEE (30) Machine Learning (39) Medical (10) ML@GT (5) News (17) NSF (16) PhD Thesis (12) Presentations (28) Robotics (10) SIGGRAPH (7) Sports Visualization (6) Teaching (21) Ubiquitous Computing (5) Video Segmentation (7) Video Stabilization (14) WACV (8) Wearable Computing (9)

More about this Website

  • About
    • Tags & Categories
    • Archives
    • Copyright
    • Privacy Policy

Meta

  • Log in
  • Entries feed
  • Comments feed
  • WordPress.org

Copyright © Irfan Essa All Rights Reserved.

Powered by WordPress with Lightning Theme & VK All in One Expansion Unit

MENU
  • Home
  • Blog
  • Publications
  • Team
  • Videos
  • Teaching
  • FAQ
  • Contact
PAGE TOP