A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is
Publications:
Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa
Text as Neural Operator: Image Manipulation by Text Instruction Proceedings Article
In: ACM International Conference on Multimedia (ACM-MM), ACM Press, 2021.
Tags: computer vision, generative media, google, multimedia
@inproceedings{2021-Zhang-TNOIMTI,
title = {Text as Neural Operator: Image Manipulation by Text Instruction},
author = {Tianhao Zhang and Hung-Yu Tseng and Lu Jiang and Weilong Yang and Honglak Lee and Irfan Essa},
url = {https://dl.acm.org/doi/10.1145/3474085.3475343
https://arxiv.org/abs/2008.04556},
doi = {10.1145/3474085.3475343},
year = {2021},
date = {2021-10-01},
urldate = {2021-10-01},
booktitle = {ACM International Conference on Multimedia (ACM-MM)},
publisher = {ACM Press},
abstract = {In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision community. The input to conditional image generation has evolved from image-only to multimodality. In this paper, we study a setting that allows users to edit an image with multiple objects using complex text instructions to add, remove, or change the objects. The inputs of the task are multimodal including (1) a reference image and (2) an instruction in natural language that describes desired modifications to the image. We propose a GAN-based method to tackle this problem. The key idea is to treat text as neural operators to locally modify the image feature. We show that the proposed model performs favorably against recent strong baselines on three public datasets. Specifically, it generates images of greater fidelity and semantic relevance, and when used as an image query, leads to better retrieval performance.},
keywords = {computer vision, generative media, google, multimedia},
pubstate = {published},
tppubtype = {inproceedings}
}
Huda Alamri, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Jue Wang, Irfan Essa, Dhruv Batra, Devi Parikh, Anoop Cherian, Tim K Marks, Chiori Hori
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 Technical Report
no. arXiv:1806.00525, 2018.
Tags: arXiv, embodied agents, multimedia, vision & language
@techreport{2018-Alamri-AVSDACD,
title = {Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7},
author = {Huda Alamri and Vincent Cartillier and Raphael Gontijo Lopes and Abhishek Das and Jue Wang and Irfan Essa and Dhruv Batra and Devi Parikh and Anoop Cherian and Tim K Marks and Chiori Hori},
url = {https://video-dialog.com/
https://arxiv.org/abs/1806.00525},
doi = {10.48550/arXiv.1806.00525},
year = {2018},
date = {2018-06-01},
urldate = {2018-06-01},
journal = {arXiv},
number = {arXiv:1806.00525},
abstract = {Scene-aware dialog systems will be able to have conversations with users about the objects and events around them. Progress on such systems can be made by integrating state-of-the-art technologies from multiple research areas, including end-to-end dialog systems, visual dialog, and video description. We introduce the Audio Visual Scene Aware Dialog (AVSD) challenge and dataset. In this challenge, which is one track of the 7th Dialog System Technology Challenges (DSTC7) workshop, the task is to build a system that generates responses in a dialog about an input video.},
howpublished = {arXiv:1806.00525},
keywords = {arXiv, embodied agents, multimedia, vision & language},
pubstate = {published},
tppubtype = {techreport}
}
Kihwan Kim, Irfan Essa, Gregory Abowd
Interactive Mosaic Generation for Video Navigation Proceedings Article
In: ACM International Conference on Multimedia (ACM-MM), Santa Barbara, CA, USA, 2006.
Tags: ACMMM, computational video, multimedia
@inproceedings{2006-Kim-IMGVN,
title = {Interactive Mosaic Generation for Video Navigation},
author = {Kihwan Kim and Irfan Essa and Gregory Abowd},
url = {https://doi.org/10.1145/1180639.1180776},
doi = {10.1145/1180639.1180776},
year = {2006},
date = {2006-10-01},
urldate = {2006-10-01},
booktitle = {ACM International Conference on Multimedia (ACM-MM)},
address = {Santa Barbara, CA, USA},
abstract = {Navigation through large multimedia collections that include videos and images still remains cumbersome. In this paper, we introduce a novel method to visualize and navigate through the collection by creating a mosaic image that visually represents the compilation. This image is generated by a labeling-based layout algorithm using various sizes of sample tile images from the collection. Each tile represents both the photographs and video files representing scenes selected by matching algorithms. This generated mosaic image provides a new way for thematic video and visually summarizes the videos. Users can generate these mosaics with some predefined themes and layouts, or base it on the results of their queries. Our approach supports automatic generation of these layouts by using meta-information such as color, time-line and existence of faces or manually generated annotated information from existing systems (e.g., the Family Video Archive).},
keywords = {ACMMM, computational video, multimedia},
pubstate = {published},
tppubtype = {inproceedings}
}
Y. Angelov, Umakishore Ramachandran, Ken Mackenzie, James Rehg, Irfan Essa
Experiences with optimizing two stream-based applications for cluster execution. Journal Article
In: Journal of Parallel and Distributed Computing, vol. 65, no. 6, pp. 678-691, 2005.
Tags: audio-video fusion, intelligent environments, multimedia
@article{2005-Angelov-EWOSACE,
title = {Experiences with optimizing two stream-based applications for cluster execution.},
author = {Y. Angelov and Umakishore Ramachandran and Ken Mackenzie and James Rehg and Irfan Essa},
year = {2005},
date = {2005-01-01},
urldate = {2005-01-01},
journal = {Journal of Parallel and Distributed Computing},
volume = {65},
number = {6},
pages = {678-691},
keywords = {audio-video fusion, intelligent environments, multimedia},
pubstate = {published},
tppubtype = {article}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, Bibsonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced using the teachPress plugin for WordPress.