Irfan Essa

Blog

June 13, 2025 / Last updated : June 13, 2025 irfan Google

Wizard of Oz at the Las Vegas Sphere, using Google AI

I am honored to be part of a Google team that has worked with MSG Sphere, Magnopus, and Warner Bros. to bring the 1939 classic film The Wizard of Oz (Wikipedia, IMDb) to the world’s largest screen in an experiential format while honoring the original content. For more details on the technical and creative aspects of this […]

June 13, 2025 / Last updated : June 13, 2025 irfan CVPR

CVPR 2025 paper on “Cropper: Vision-Language Model for Image Cropping through In-Context Learning”

June 13, 2025 / Last updated : June 13, 2025 irfan CVPR

CVPR 2025 paper on “Calibrated Multi-Preference Optimization for Aligning Diffusion Models”

July 22, 2024 / Last updated : July 25, 2024 irfan ICML

Award-winning paper in ICML 2024 on “VideoPoet: A large language model for zero-shot video generation.”

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs — including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model’s state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet’s ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/

August 9, 2023 / Last updated : December 13, 2024 irfan SIGGRAPH

ACM SIGGRAPH Seminal Graphics Papers, Volume 2. Published as part of SIGGRAPH 50th Anniversary Meeting in 2023

ACM SIGGRAPH has published Seminal Graphics Papers: Pushing the Boundaries, Volume 2, as part of its 50th-year celebration. They are making all these amazing papers available online for free access. These are all amazing papers I have read for my research and included in my teachings. Proud that 2 of my papers with amazing collaborators […]

April 23, 2023 / Last updated : August 9, 2023 irfan UIST

Paper in UIST 2023 on “Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access”

Presentation slides commonly use visual patterns for structural navigation, such as titles, dividers, and build slides. However, screen readers do not capture this intent, which makes it time-consuming and less accessible for blind and visually impaired (BVI) users to consume slides with repeated content linearly. We present Slide Gestalt, an automatic approach that identifies the hierarchical structure in a slide deck. Slide Gestalt computes the visual and textual correspondences between slides to generate hierarchical groupings. Using our UI, readers can navigate the slide deck interactively, from the higher-level section overview to the lower-level description of a slide group or individual elements. We derived slide consumption and authoring practices from interviews with BVI readers and sighted creators and an analysis of 100 decks. We evaluated our pipeline with 50 real-world slide decks and a large dataset. Feedback from eight BVI participants showed that Slide Gestalt helped them navigate a slide deck by anchoring content more efficiently, compared to using accessible slides.

March 22, 2023 / Last updated : July 24, 2024 irfan ICLR

Award-winning paper in ICLR 2023 on “Emergence of Maps in the Memories of Blind Navigation Agents”

Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines, specifically artificial intelligence (AI) navigation agents, also build implicit (or ‘mental’) maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. …

March 10, 2023 / Last updated : March 25, 2023 irfan ICLR

Paper in ICLR 2023 on “Discrete Predictor-Corrector Diffusion Models for Image Synthesis”

We introduce Discrete Predictor-Corrector diffusion models (DPC), extending predictor-corrector samplers in Gaussian diffusion models to the discrete case. Predictor-corrector samplers are a class of samplers for diffusion models, which improve on ancestral samplers by correcting the sampling distribution of intermediate diffusion states using MCMC methods. …

March 10, 2023 / Last updated : March 14, 2023 irfan Publications

Some recent publications for 2023

Here is a list of some recent works accepted for publication that I am honored to be part of. These will be appearing in CHI, ICLR, and CVPR. Excited to share these new efforts.

December 31, 2022 / Last updated : March 14, 2023 irfan Publications

Publications in 2022

Here is a list of all papers from 2022. Kudos to all my collaborators. Well done!

Copyright © Irfan Essa All Rights Reserved.
