A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is https://orcid.org/0000-0002-6236-2969.

Publications:
Peggy Chi, Zheng Sun, Katrina Panovich, Irfan Essa
Automatic Video Creation From a Web Page Proceedings Article
In: Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (UIST), pp. 279–292, ACM, 2020.
Tags: computational video, google, human-computer interaction, UIST, video editing
@inproceedings{2020-Chi-AVCFP,
title = {Automatic Video Creation From a Web Page},
author = {Peggy Chi and Zheng Sun and Katrina Panovich and Irfan Essa},
url = {https://dl.acm.org/doi/abs/10.1145/3379337.3415814
https://research.google/pubs/pub49618/
https://ai.googleblog.com/2020/10/experimenting-with-automatic-video.html
https://www.youtube.com/watch?v=3yFYc-Wet8k},
doi = {10.1145/3379337.3415814},
year = {2020},
date = {2020-10-01},
urldate = {2020-10-01},
booktitle = {Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology},
pages = {279--292},
organization = {ACM},
abstract = {Creating marketing videos from scratch can be challenging, especially when designing for multiple platforms with different viewing criteria. We present URL2Video, an automatic approach that converts a web page into a short video given temporal and visual constraints. URL2Video captures quality materials and design styles extracted from a web page, including fonts, colors, and layouts. Using constraint programming, URL2Video's design engine organizes the visual assets into a sequence of shots and renders to a video with user-specified aspect ratio and duration. Creators can review the video composition, modify constraints, and generate video variation through a user interface. We learned the design process from designers and compared our automatically generated results with their creation through interviews and an online survey. The evaluation shows that URL2Video effectively extracted design elements from a web page and supported designers by bootstrapping the video creation process.},
keywords = {computational video, google, human-computer interaction, UIST, video editing},
pubstate = {published},
tppubtype = {inproceedings}
}
Hsin-Ying Lee, Lu Jiang, Irfan Essa, Madison Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang
Neural Design Network: Graphic Layout Generation with Constraints Proceedings Article
In: Proceedings of European Conference on Computer Vision (ECCV), 2020.
Tags: computer vision, content creation, ECCV, generative media, google
@inproceedings{2020-Lee-NDNGLGWC,
title = {Neural Design Network: Graphic Layout Generation with Constraints},
author = {Hsin-Ying Lee and Lu Jiang and Irfan Essa and Madison Le and Haifeng Gong and Ming-Hsuan Yang and Weilong Yang},
url = {https://arxiv.org/abs/1912.09421
https://rdcu.be/c7sqw},
doi = {10.1007/978-3-030-58580-8_29},
year = {2020},
date = {2020-08-01},
urldate = {2020-08-01},
booktitle = {Proceedings of European Conference on Computer Vision (ECCV)},
keywords = {computer vision, content creation, ECCV, generative media, google},
pubstate = {published},
tppubtype = {inproceedings}
}
Caroline Pantofaru, Vinay Bettadapura, Krishna Bharat, Irfan Essa
Systems and methods for directing content generation using a first-person point-of-view device Patent
2020.
Tags: computer vision, google, patents
@patent{2020-Pantofaru-SMDCGUFPD,
title = {Systems and methods for directing content generation using a first-person point-of-view device},
author = {Caroline Pantofaru and Vinay Bettadapura and Krishna Bharat and Irfan Essa},
url = {https://patents.google.com/patent/US10721439},
year = {2020},
date = {2020-07-21},
urldate = {2020-07-01},
publisher = {(US Patent # 10721439)},
abstract = {A method for personalizing a content item using captured footage is disclosed. The method includes receiving a first video feed from a first camera, wherein the first camera is designated as a source camera for capturing an event during a first time duration. The method also includes receiving data from a second camera, and determining, based on the received data from the second camera, that an action was performed using the second camera, the action being indicative of a region of interest (ROI) of the user of the second camera occurring within a second time duration. The method further includes designating the second camera as the source camera for capturing the event during the second time duration.
},
howpublished = {US Patent # 10721439},
keywords = {computer vision, google, patents},
pubstate = {published},
tppubtype = {patent}
}
Peggy Chi, Irfan Essa
Interactive Visual Description of a Web Page for Smart Speakers Proceedings Article
In: Proceedings of ACM CHI Workshop, CUI@CHI: Mapping Grand Challenges for the Conversational User Interface Community, Honolulu, Hawaii, USA, 2020.
Tags: accessibility, CHI, google, human-computer interaction
@inproceedings{2020-Chi-IVDPSS,
title = {Interactive Visual Description of a Web Page for Smart Speakers},
author = {Peggy Chi and Irfan Essa},
url = {https://research.google/pubs/pub49441/
http://www.speechinteraction.org/CHI2020/programme.html},
year = {2020},
date = {2020-05-01},
urldate = {2020-05-01},
booktitle = {Proceedings of ACM CHI Workshop, CUI@CHI: Mapping Grand Challenges for the Conversational User Interface Community},
address = {Honolulu, Hawaii, USA},
abstract = {Smart speakers are becoming ubiquitous for accessing lightweight information using speech. While these devices are powerful for question answering and service operations using voice commands, it is challenging to navigate content of rich formats–including web pages–that are consumed by mainstream computing devices. We conducted a comparative study with 12 participants that suggests and motivates the use of a narrative voice output of a web page as being easier to follow and comprehend than a conventional screen reader. We are developing a tool that automatically narrates web documents based on their visual structures with interactive prompts. We discuss the design challenges for a conversational agent to intelligently select content for a more personalized experience, where we hope to contribute to the CUI workshop and form a discussion for future research.
},
keywords = {accessibility, CHI, google, human-computer interaction},
pubstate = {published},
tppubtype = {inproceedings}
}
Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar
Category learning neural networks Patent
2020.
Tags: google, machine learning, patents
@patent{2020-Hickson-CLNN,
title = {Category learning neural networks},
author = {Steven Hickson and Anelia Angelova and Irfan Essa and Rahul Sukthankar},
url = {https://patents.google.com/patent/US10635979},
year = {2020},
date = {2020-04-28},
urldate = {2020-04-28},
publisher = {(US Patent # 10635979)},
abstract = {Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining a clustering of images into a plurality of semantic categories. In one aspect, a method comprises: training a categorization neural network, comprising, at each of a plurality of iterations: processing an image depicting an object using the categorization neural network to generate (i) a current prediction for whether the image depicts an object or a background region, and (ii) a current embedding of the image; determining a plurality of current cluster centers based on the current values of the categorization neural network parameters, wherein each cluster center represents a respective semantic category; and determining a gradient of an objective function that includes a classification loss and a clustering loss, wherein the clustering loss depends on a similarity between the current embedding of the image and the current cluster centers.
},
howpublished = {US Patent #10635979},
keywords = {google, machine learning, patents},
pubstate = {published},
tppubtype = {patent}
}
Thad Eugene Starner, Irfan Essa, Hayes Solos Raffle, Daniel Aminzade
Object occlusion to initiate a visual search Patent
2019, (US Patent 10,437,882).
Tags: computer vision, google, patents
@patent{2019-Starner-OOIVS,
title = {Object occlusion to initiate a visual search},
author = {Thad Eugene Starner and Irfan Essa and Hayes Solos Raffle and Daniel Aminzade},
url = {https://patents.google.com/patent/US10437882},
year = {2019},
date = {2019-10-01},
urldate = {2019-10-01},
publisher = {(US Patent # 10437882)},
abstract = {Methods, systems, and apparatus, including computer programs encoded on computer storage media, for video segmentation. One of the methods includes receiving a digital video; performing hierarchical graph-based video segmentation on at least one frame of the digital video to generate a boundary representation for the at least one frame; generating a vector representation from the boundary representation for the at least one frame of the digital video, wherein generating the vector representation includes generating a polygon composed of at least three vectors, wherein each vector comprises two vertices connected by a line segment, from a boundary in the boundary representation; linking the vector representation to the at least one frame of the digital video; and storing the vector representation with the at least one frame of the digital video.
},
howpublished = {US Patent # 10437882},
note = {US Patent 10,437,882},
keywords = {computer vision, google, patents},
pubstate = {published},
tppubtype = {patent}
}
Steven Hickson, Karthik Raveendran, Alireza Fathi, Kevin Murphy, Irfan Essa
Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction Proceedings Article
In: IEEE International Conference on Computer Vision (ICCV) Workshop on Geometry Meets Deep Learning, 2019.
Tags: computer vision, google, ICCV
@inproceedings{2019-Hickson-FFLSRSNP,
title = {Floors are Flat: Leveraging Semantics for Real-Time Surface Normal Prediction},
author = {Steven Hickson and Karthik Raveendran and Alireza Fathi and Kevin Murphy and Irfan Essa},
url = {https://arxiv.org/abs/1906.06792
https://openaccess.thecvf.com/content_ICCVW_2019/papers/GMDL/Hickson_Floors_are_Flat_Leveraging_Semantics_for_Real-Time_Surface_Normal_Prediction_ICCVW_2019_paper.pdf},
doi = {10.1109/ICCVW.2019.00501},
year = {2019},
date = {2019-10-01},
urldate = {2019-10-01},
booktitle = {IEEE International Conference on Computer Vision (ICCV) Workshop on Geometry Meets Deep Learning},
abstract = {We propose 4 insights that help to significantly improve the performance of deep learning models that predict surface normals and semantic labels from a single RGB image. These insights are: (1) denoise the "ground truth" surface normals in the training set to ensure consistency with the semantic labels; (2) concurrently train on a mix of real and synthetic data, instead of pretraining on synthetic and fine-tuning on real; (3) jointly predict normals and semantics using a shared model, but only backpropagate errors on pixels that have valid training labels; (4) slim down the model and use grayscale instead of color inputs. Despite the simplicity of these steps, we demonstrate consistently improved state of the art results on several datasets, using a model that runs at 12 fps on a standard mobile phone.
},
howpublished = {arXiv preprint arXiv:1906.06792},
keywords = {computer vision, google, ICCV},
pubstate = {published},
tppubtype = {inproceedings}
}
Irfan Essa, Vivek Kwatra, Matthias Grundmann
Vector representation for video segmentation Patent
2018, (US Patent Application 14/587,420).
Tags: computer vision, google, patents
@patent{2018-Essa-VRVS,
title = {Vector representation for video segmentation},
author = {Irfan Essa and Vivek Kwatra and Matthias Grundmann},
url = {https://patents.google.com/patent/US20180350131},
year = {2018},
date = {2018-12-06},
urldate = {2018-12-01},
publisher = {(US Patent Application # 14/587,420)},
howpublished = {US Patent # US20180350131A1},
note = {US Patent Application 14/587,420},
keywords = {computer vision, google, patents},
pubstate = {published},
tppubtype = {patent}
}
Caroline Pantofaru, Vinay Bettadapura, Krishna Bharat, Irfan Essa
Systems and methods for directing content generation using a first-person point-of-view device Patent
2018, (US Patent 10,110,850).
Tags: computer vision, google, patents
@patent{2018-Pantofaru-SMDCGUFPD,
title = {Systems and methods for directing content generation using a first-person point-of-view device},
author = {Caroline Pantofaru and Vinay Bettadapura and Krishna Bharat and Irfan Essa},
url = {https://patents.google.com/patent/US10110850},
year = {2018},
date = {2018-10-23},
urldate = {2018-10-01},
publisher = {(US Patent #10110850)},
abstract = {A method for localizing the attention of a user of a first-person point-of-view (FPPOV) device is disclosed. The method includes receiving data from an FPPOV device, the data being indicative of a first region-of-interest (ROI) of an event for a first time duration and a second ROI of the event for a second time duration. The method further includes determining that a first camera from a plurality of cameras best captures the first ROI during the first time duration, and determining that a second camera from the plurality of cameras best captures the second ROI during the second time duration.
},
howpublished = {US Patent # US10110850B1},
note = {US Patent 10,110,850},
keywords = {computer vision, google, patents},
pubstate = {published},
tppubtype = {patent}
}
Matthias Grundmann, Vivek Kwatra, Irfan Essa
Cascaded camera motion estimation, rolling shutter detection, and camera shake detection for video stabilization Patent
2018, (US Patent 9,888,180).
Tags: computer vision, google, patents
@patent{2018-Grundmann-CCMERSDCSDVS,
title = {Cascaded camera motion estimation, rolling shutter detection, and camera shake detection for video stabilization},
author = {Matthias Grundmann and Vivek Kwatra and Irfan Essa},
url = {https://patents.google.com/patent/US9888180},
year = {2018},
date = {2018-02-06},
urldate = {2018-02-01},
publisher = {(US Patent #9888180)},
howpublished = {US Patent # US9888180},
note = {US Patent 9,888,180},
keywords = {computer vision, google, patents},
pubstate = {published},
tppubtype = {patent}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, BibSonomy, CiteULike, Mendeley.
Copyright/About
Please see the Copyright Statement that may apply to the content listed here.
This list of publications is produced by using the teachPress plugin for WordPress.