August 1, 2025 / Last updated: August 15, 2025 irfan ACL

ACL 2025 paper (Awarded the “Best Social Impact Award”) on “AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset”

The Largest Study on LLMs in African Healthcare
We created a novel benchmark dataset of 25,000 question-answer pairs, spanning 32 clinical specialties and contributed by more than 1,000 African clinicians across 15 countries, to rigorously evaluate more than 20 LLMs on African healthcare.

June 13, 2025 / Last updated: June 16, 2025 irfan Google

Wizard of Oz at the Las Vegas Sphere, using Google AI

I am honored to be part of a Google team that has worked with MSG Sphere, Magnopus, and Warner Bros. to bring the 1939 classic film The Wizard of Oz (Wikipedia, IMDB) to the world’s largest screen in an experiential format while honoring the original content. For more details on this work’s technical and creative aspects, check […]

June 13, 2025 / Last updated: June 13, 2025 irfan CVPR

CVPR 2025 paper on “Cropper: Vision-Language Model for Image Cropping through In-Context Learning”

June 13, 2025 / Last updated: June 13, 2025 irfan CVPR

CVPR 2025 paper on “Calibrated Multi-Preference Optimization for Aligning Diffusion Models”

July 22, 2024 / Last updated: July 25, 2024 irfan ICML

Award-winning paper in ICML 2024 on “VideoPoet: A large language model for zero-shot video generation”

We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs — including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model’s state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet’s ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
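A minimal Python sketch of the decoder-only idea described above, under loudly stated assumptions: each modality is assumed to be already tokenized into discrete ids, mapped into one shared vocabulary via hypothetical offsets, and generated one token at a time under a causal mask. `toy_next_token` is a placeholder for a trained transformer; none of these names, ranges, or sizes come from VideoPoet itself.

```python
# Hypothetical layout: each modality's discrete tokens occupy a disjoint id range
# in one shared vocabulary (visual ids 1000-1999, audio ids from 2000 upward).
VOCAB_OFFSETS = {"text": 0, "visual": 1000, "audio": 2000}


def to_shared_ids(modality, local_ids):
    """Map a modality's local token ids into the shared vocabulary."""
    return [VOCAB_OFFSETS[modality] + i for i in local_ids]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def generate(next_token, prefix, num_new):
    """Autoregressive decoding: each new token is predicted from all earlier tokens."""
    seq = list(prefix)
    for _ in range(num_new):
        seq.append(next_token(seq))
    return seq


def toy_next_token(seq):
    """Placeholder for a trained decoder-only transformer: emits the next visual id."""
    lo, hi = VOCAB_OFFSETS["visual"], VOCAB_OFFSETS["audio"]
    last_visual = max((t for t in seq if lo <= t < hi), default=lo - 1)
    return last_visual + 1


# Text prompt followed by one visual token, then three generated visual tokens.
prompt = to_shared_ids("text", [5, 17]) + to_shared_ids("visual", [0])
print(generate(toy_next_token, prompt, 3))  # [5, 17, 1000, 1001, 1002, 1003]
print(causal_mask(3)[1])                    # [True, True, False]
```

In this sketch, conditioning on text, images, or audio reduces to choosing which tokens go into the prefix, which is why a single sequence model can in principle be steered toward different generation tasks.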

August 9, 2023 / Last updated: December 13, 2024 irfan SIGGRAPH

ACM SIGGRAPH Seminal Graphics Papers, Volume 2. Published as part of the SIGGRAPH 50th Anniversary Meeting in 2023

ACM SIGGRAPH has published Seminal Graphics Papers: Pushing the Boundaries, Volume 2, as part of its 50th-year celebration, and is making all of these papers freely available online. These are all excellent papers that I have read for my research and included in my teaching. Proud that 2 of my papers with amazing collaborators […]

April 23, 2023 / Last updated: August 9, 2023 irfan UIST

Paper in UIST 2023 on “Slide Gestalt: Automatic Structure Extraction in Slide Decks for Non-Visual Access”

Presentation slides commonly use visual patterns for structural navigation, such as titles, dividers, and build slides. However, screen readers do not capture such intention, making it time-consuming and less accessible for blind and visually impaired (BVI) users to linearly consume slides with repeated content. We present Slide Gestalt, an automatic approach that identifies the hierarchical structure in a slide deck. Slide Gestalt computes the visual and textual correspondences between slides to generate hierarchical groupings. With our UI, readers can interactively navigate the slide deck from the higher-level section overview down to the lower-level description of a slide group or individual elements. We derived slide consumption and authoring practices from interviews with BVI readers and sighted creators and from an analysis of 100 decks. We applied our pipeline to 50 real-world slide decks and a large dataset. Feedback from eight BVI participants showed that Slide Gestalt helped them navigate a slide deck by anchoring content more efficiently, compared with using accessible slides.
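As a rough illustration of grouping slides by correspondence, the Python sketch below merges consecutive slides whose words overlap strongly, which is one simple way build slides could be detected. It is a simplified stand-in, not Slide Gestalt's pipeline: it uses only textual Jaccard similarity (no visual features), produces a flat rather than hierarchical grouping, and the `threshold` value and helper names are assumptions.

```python
import re


def words(text):
    """Lowercased alphanumeric word set of a slide's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty slides count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_slides(slides, threshold=0.5):
    """Group consecutive slides whose text overlaps strongly (e.g. build slides)."""
    groups = []
    for idx, text in enumerate(slides):
        # Compare with the previous slide, which is the last member of the open group.
        if groups and jaccard(words(slides[groups[-1][-1]]), words(text)) >= threshold:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


deck = [
    "Introduction",
    "Results: accuracy",
    "Results: accuracy and latency",  # a build of the previous slide
    "Conclusion",
]
print(group_slides(deck))  # [[0], [1, 2], [3]]
```

A screen-reader interface could then announce each group once and let the reader drill into individual slides only when needed.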

March 22, 2023 / Last updated: July 24, 2024 irfan ICLR

Award-winning paper in ICLR 2023 on “Emergence of Maps in the Memories of Blind Navigation Agents”

Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask whether machines, specifically artificial intelligence (AI) navigation agents, also build implicit (or ‘mental’) maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. …

March 10, 2023 / Last updated: March 25, 2023 irfan ICLR

Paper in ICLR 2023 on “Discrete Predictor-Corrector Diffusion Models for Image Synthesis”

We introduce Discrete Predictor-Corrector diffusion models (DPC), extending predictor-corrector samplers in Gaussian diffusion models to the discrete case. Predictor-corrector samplers are a class of samplers for diffusion models, which improve on ancestral samplers by correcting the sampling distribution of intermediate diffusion states using MCMC methods. …
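The predictor-corrector pattern can be illustrated with a toy discrete example in Python. Assumptions: a single categorical variable with a known toy data distribution `P0`, a uniform-noise forward process with a hypothetical schedule `ALPHA_BAR`, an exact ancestral step as the predictor, and Metropolis-Hastings with uniform proposals as the corrector. This is a generic MCMC corrector chosen for clarity, not the corrector used in DPC, and because the predictor in this toy is already exact, the corrector only shows where the refinement step slots in.

```python
import random

V = 4                                  # toy vocabulary size
P0 = [0.7, 0.1, 0.1, 0.1]              # toy "data" distribution (assumption)
ALPHA_BAR = [1.0, 0.8, 0.5, 0.2, 0.0]  # hypothetical cumulative keep probabilities, t = 0..T


def marginal(t):
    """p_t(x) under uniform-noise corruption: keep x_0 w.p. ALPHA_BAR[t], else uniform."""
    a = ALPHA_BAR[t]
    return [a * p + (1 - a) / V for p in P0]


def predictor(x_t, t, rng):
    """Ancestral step: sample x_{t-1} from p(x_{t-1} | x_t) via Bayes' rule."""
    a = ALPHA_BAR[t] / ALPHA_BAR[t - 1]  # one-step keep probability
    prev = marginal(t - 1)
    weights = [prev[i] * (a * (i == x_t) + (1 - a) / V) for i in range(V)]
    return rng.choices(range(V), weights=weights)[0]


def corrector(x, t, rng, steps=2):
    """Metropolis-Hastings with a symmetric uniform proposal; leaves marginal(t) invariant."""
    target = marginal(t)
    for _ in range(steps):
        proposal = rng.randrange(V)
        if rng.random() < min(1.0, target[proposal] / target[x]):
            x = proposal
    return x


def sample(rng):
    T = len(ALPHA_BAR) - 1
    x = rng.randrange(V)              # x_T is uniform because ALPHA_BAR[T] == 0
    for t in range(T, 0, -1):
        x = predictor(x, t, rng)      # predict x_{t-1}
        x = corrector(x, t - 1, rng)  # refine it toward p_{t-1}
    return x


rng = random.Random(0)
draws = [sample(rng) for _ in range(20000)]
print([round(draws.count(k) / len(draws), 2) for k in range(V)])  # close to P0
```

With a learned denoiser in place of the closed-form posterior, the predictor would only be approximate, and the corrector steps are what pull intermediate states back toward the intended marginal.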

March 10, 2023 / Last updated: March 14, 2023 irfan Publications

Some recent publications for 2023

Here is a list of some recent works accepted for publication that I am honored to be part of. These will appear at CHI, ICLR, and CVPR. I am excited to share these new efforts.