June 13, 2025 / Last updated : June 16, 2025 irfan Google

Wizard of Oz at the Las Vegas Sphere, using Google AI

I am honored to be part of a Google team that has worked with MSG Sphere, Magnopus, and Warner Bros. to bring the 1939 classic film The Wizard of Oz (Wikipedia, IMDb) to the world’s largest screen in an experiential format while honoring the original content. For more details on this work’s technical and creative aspects, check […]

June 13, 2025 / Last updated : June 13, 2025 irfan CVPR

CVPR 2025 paper on “Cropper: Vision-Language Model for Image Cropping through In-Context Learning”

June 13, 2025 / Last updated : June 13, 2025 irfan CVPR

CVPR 2025 paper on “Calibrated Multi-Preference Optimization for Aligning Diffusion Models”

July 22, 2024 / Last updated : July 25, 2024 irfan ICML

Award-winning paper in ICML 2024 on “VideoPoet: A large language model for zero-shot video generation.”

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs — including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model’s state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet’s ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/