A searchable list of some of my publications is below. You can also access my publications from the following sites.
My ORCID is https://orcid.org/0000-0002-6236-2969

Publications:
Erik Wijmans, Manolis Savva, Irfan Essa, Stefan Lee, Ari S. Morcos, Dhruv Batra
Emergence of Maps in the Memories of Blind Navigation Agents Best Paper Proceedings Article
In: Proceedings of International Conference on Learning Representations (ICLR), 2023.
Abstract | Links | BibTeX | Tags: awards, best paper award, computer vision, google, ICLR, machine learning, robotics
@inproceedings{2023-Wijmans-EMMBNA,
title = {Emergence of Maps in the Memories of Blind Navigation Agents},
author = {Erik Wijmans and Manolis Savva and Irfan Essa and Stefan Lee and Ari S. Morcos and Dhruv Batra},
url = {https://arxiv.org/abs/2301.13261
https://wijmans.xyz/publication/eom/
https://openreview.net/forum?id=lTt4KjHSsyl
https://blog.iclr.cc/2023/03/21/announcing-the-iclr-2023-outstanding-paper-award-recipients/},
doi = {10.48550/ARXIV.2301.13261},
year = {2023},
date = {2023-05-01},
urldate = {2023-05-01},
booktitle = {Proceedings of International Conference on Learning Representations (ICLR)},
abstract = {Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents -- with sensing limited to only egomotion and no other sensing of any kind -- to perform PointGoal navigation ('go to Δx, Δy') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.},
keywords = {awards, best paper award, computer vision, google, ICLR, machine learning, robotics},
pubstate = {published},
tppubtype = {inproceedings}
}
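The architecture described in this abstract is deliberately navigation-agnostic. As a rough illustration of a 'blind' agent built only from fully-connected and recurrent components, with the PointGoal vector and previous action as its sole inputs, here is a minimal PyTorch sketch; the layer sizes and input encoding are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn

class BlindPointGoalAgent(nn.Module):
    # Sketch of a map-free agent: no cameras, no depth, no map structure.
    # Inputs are the goal as (Δx, Δy) and a one-hot of the previous action;
    # all episode memory lives in the recurrent state.
    def __init__(self, hidden_size: int = 512, num_actions: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(2 + num_actions, hidden_size),
            nn.ReLU(),
        )
        self.rnn = nn.GRU(hidden_size, hidden_size)
        self.policy = nn.Linear(hidden_size, num_actions)  # actor head
        self.value = nn.Linear(hidden_size, 1)             # critic head

    def forward(self, goal, prev_action, hidden):
        x = self.encoder(torch.cat([goal, prev_action], dim=-1))
        out, hidden = self.rnn(x.unsqueeze(0), hidden)
        out = out.squeeze(0)
        return self.policy(out), self.value(out), hidden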
Erik Wijmans, Irfan Essa, Dhruv Batra
How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget Proceedings Article
In: International Conference on Autonomous Agents and Multi-Agent Systems, 2022.
Abstract | Links | BibTeX | Tags: computer vision, embodied agents, navigation
@inproceedings{2022-Wijmans-TPNASCB,
title = {How to Train PointGoal Navigation Agents on a (Sample and Compute) Budget},
author = {Erik Wijmans and Irfan Essa and Dhruv Batra},
url = {https://arxiv.org/abs/2012.06117
https://ifaamas.org/Proceedings/aamas2022/pdfs/p1762.pdf},
doi = {10.48550/arXiv.2012.06117},
year = {2022},
date = {2022-12-01},
urldate = {2020-12-01},
booktitle = {International Conference on Autonomous Agents and Multi-Agent Systems},
journal = {arXiv},
number = {arXiv:2012.06117},
abstract = {PointGoal navigation has seen significant recent interest and progress, spurred on by the Habitat platform and associated challenge. In this paper, we study PointGoal navigation under both a sample budget (75 million frames) and a compute budget (1 GPU for 1 day). We conduct an extensive set of experiments, cumulatively totaling over 50,000 GPU-hours, that let us identify and discuss a number of ostensibly minor but significant design choices -- the advantage estimation procedure (a key component in training), visual encoder architecture, and a seemingly minor hyper-parameter change. Overall, these design choices lead to considerable and consistent improvements over the baselines present in Savva et al. Under a sample budget, performance for RGB-D agents improves 8 SPL on Gibson (14% relative improvement) and 20 SPL on Matterport3D (38% relative improvement). Under a compute budget, performance for RGB-D agents improves by 19 SPL on Gibson (32% relative improvement) and 35 SPL on Matterport3D (220% relative improvement). We hope our findings and recommendations will serve to make the community's experiments more efficient.},
keywords = {computer vision, embodied agents, navigation},
pubstate = {published},
tppubtype = {inproceedings}
}
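The abstract singles out the advantage estimation procedure as a key training component. For context, here is a minimal sketch of generalized advantage estimation (GAE), the standard estimator in PPO-style training; the particular variant and hyper-parameter values the paper recommends are not reproduced here.

import torch

def gae(rewards, values, dones, gamma: float = 0.99, lam: float = 0.95):
    # Generalized Advantage Estimation over a rollout of length T.
    # rewards, dones: tensors of shape [T]; values: shape [T + 1], with a
    # bootstrap value for the final state appended.
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
    return advantages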
Erik Wijmans, Irfan Essa, Dhruv Batra
VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement Proceedings Article
In: Oh, Alice H., Agarwal, Alekh, Belgrave, Danielle, Cho, Kyunghyun (Ed.): Advances in Neural Information Processing Systems (NeurIPS), 2022.
Abstract | Links | BibTeX | Tags: machine learning, NeurIPS, reinforcement learning, robotics
@inproceedings{2022-Wijmans-SOLENER,
title = {VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement},
author = {Erik Wijmans and Irfan Essa and Dhruv Batra},
editor = {Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
url = {https://arxiv.org/abs/2210.05064
https://openreview.net/forum?id=VrJWseIN98},
doi = {10.48550/ARXIV.2210.05064},
year = {2022},
date = {2022-12-01},
urldate = {2022-12-01},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
abstract = {We present Variable Experience Rollout (VER), a technique for efficiently scaling batched on-policy reinforcement learning in heterogeneous environments (where different environments take vastly different times to generate rollouts) to many GPUs residing on, potentially, many machines. VER combines the strengths of and blurs the line between synchronous and asynchronous on-policy RL methods (SyncOnRL and AsyncOnRL, respectively). Specifically, it learns from on-policy experience (like SyncOnRL) and has no synchronization points (like AsyncOnRL), enabling high throughput.
We find that VER leads to significant and consistent speed-ups across a broad range of embodied navigation and mobile manipulation tasks in photorealistic 3D simulation environments. Specifically, for PointGoal navigation and ObjectGoal navigation in Habitat 1.0, VER is 60-100% faster (1.6-2x speedup) than DD-PPO, the current state of the art for distributed SyncOnRL, with similar sample efficiency. For mobile manipulation tasks (open fridge/cabinet, pick/place objects) in Habitat 2.0, VER is 150% faster (2.5x speedup) on 1 GPU and 170% faster (2.7x speedup) on 8 GPUs than DD-PPO. Compared to SampleFactory (the current state-of-the-art AsyncOnRL), VER matches its speed on 1 GPU, and is 70% faster (1.7x speedup) on 8 GPUs with better sample efficiency.
We leverage these speed-ups to train chained skills for GeometricGoal rearrangement tasks in the Home Assistant Benchmark (HAB). We find a surprising emergence of navigation in skills that do not ostensibly require any navigation. Specifically, the Pick skill involves a robot picking an object from a table. During training the robot was always spawned close to the table and never needed to navigate. However, we find that if base movement is part of the action space, the robot learns to navigate and then pick an object in new environments with 50% success, demonstrating surprisingly high out-of-distribution generalization.},
keywords = {machine learning, NeurIPS, reinforcement learning, robotics},
pubstate = {published},
tppubtype = {inproceedings}
}
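The scheduling idea at the heart of VER can be illustrated with a toy example: instead of waiting at a barrier for every environment to finish a fixed-length rollout (synchronous) or learning from stale experience (asynchronous), collect whichever steps are ready until a total batch budget is met, so fast environments contribute more experience than slow ones. The sketch below is a simplified single-process illustration of that idea under a toy setup of my own, not the paper's system.

import random

class ToyEnv:
    # Toy stand-in for an environment whose steps take variable time;
    # 'speed' is the probability that a step is ready on a given poll.
    def __init__(self, speed: float):
        self.speed = speed

    def poll(self):
        # Return one transition if this env finished a step, else None.
        return {"obs": random.random()} if random.random() < self.speed else None

def collect_variable_rollout(envs, batch_size: int):
    # Gather a fixed total number of steps with no per-env barrier:
    # fast environments naturally contribute more transitions.
    batch = []
    while len(batch) < batch_size:
        for env in envs:
            transition = env.poll()
            if transition is not None:
                batch.append(transition)
    return batch[:batch_size]

envs = [ToyEnv(0.9), ToyEnv(0.1)]  # one fast env, one slow env
batch = collect_variable_rollout(envs, batch_size=64)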
Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra
Decentralized Distributed PPO: Solving PointGoal Navigation Proceedings Article
In: Proceedings of International Conference on Learning Representations (ICLR), 2020.
Abstract | Links | BibTeX | Tags: embodied agents, ICLR, navigation, systems for ML
@inproceedings{2020-Wijmans-DDSPN,
title = {Decentralized Distributed PPO: Solving PointGoal Navigation},
author = {Erik Wijmans and Abhishek Kadian and Ari Morcos and Stefan Lee and Irfan Essa and Devi Parikh and Manolis Savva and Dhruv Batra},
url = {https://arxiv.org/abs/1911.00357
https://paperswithcode.com/paper/decentralized-distributed-ppo-solving},
year = {2020},
date = {2020-04-01},
urldate = {2020-04-01},
booktitle = {Proceedings of International Conference on Learning Representations (ICLR)},
abstract = {We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling -- achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the task -- near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks -- the analog of ImageNet pre-training + task-specific fine-tuning for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code are publicly available).},
keywords = {embodied agents, ICLR, navigation, systems for ML},
pubstate = {published},
tppubtype = {inproceedings}
}
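DD-PPO's decentralized, synchronous structure maps directly onto standard collective operations. Below is a minimal sketch of one training step under the pattern the abstract describes: every worker simulates and learns locally, and the only synchronization is an all-reduce of gradients, so there is no parameter server. Here collect_rollout and ppo_loss are placeholders for the usual PPO machinery, and the paper's straggler-preemption mechanism is omitted.

import torch
import torch.distributed as dist

def train_step(policy, optimizer, collect_rollout, ppo_loss):
    # Assumes dist.init_process_group(...) has already been called and
    # each worker holds an identical copy of the policy.
    rollout = collect_rollout(policy)   # experience collected locally
    loss = ppo_loss(policy, rollout)
    optimizer.zero_grad()
    loss.backward()
    # Decentralized synchronization: average gradients across workers.
    for param in policy.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= dist.get_world_size()
    optimizer.step()                    # identical update on every worker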
Erik Wijmans, Julian Straub, Dhruv Batra, Irfan Essa, Judy Hoffman, Ari Morcos
Analyzing Visual Representations in Embodied Navigation Tasks Technical Report
no. arXiv:2003.05993, 2020.
Abstract | Links | BibTeX | Tags: arXiv, embodied agents, navigation
@techreport{2020-Wijmans-AVRENT,
title = {Analyzing Visual Representations in Embodied Navigation Tasks},
author = {Erik Wijmans and Julian Straub and Dhruv Batra and Irfan Essa and Judy Hoffman and Ari Morcos},
url = {https://arxiv.org/abs/2003.05993
https://arxiv.org/pdf/2003.05993},
doi = {10.48550/arXiv.2003.05993},
year = {2020},
date = {2020-03-01},
urldate = {2020-03-01},
journal = {arXiv},
number = {arXiv:2003.05993},
abstract = {Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are over-specialized to the target task. In this work, we present a methodology to study the underlying potential causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks.
We then leverage our proposed methodology to examine the task dependence of visual representations learned on related but distinct embodied navigation tasks. Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures. We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.},
howpublished = {arXiv:2003.05993},
keywords = {arXiv, embodied agents, navigation},
pubstate = {published},
tppubtype = {techreport}
}
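For reference, PWCCA summarizes two sets of activations as a weighted sum of CCA correlations, where each canonical direction is weighted by how much of the first representation it accounts for. Here is a simplified NumPy sketch of the measure from Morcos et al. (2018); the reference implementation adds preprocessing and numerical safeguards omitted here.

import numpy as np

def pwcca(X, Y):
    # X, Y: activation matrices of shape [num_examples, num_neurons],
    # recorded on the same examples under two different networks/tasks.
    # (Assumes num_examples >= num_neurons.)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormalize each view; CCA correlations are then the singular
    # values of Qx^T Qy.
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    U, rho, _ = np.linalg.svd(Qx.T @ Qy)
    k = rho.shape[0]
    # Canonical directions of X expressed in example space.
    Hx = Qx @ U[:, :k]
    # Weight each direction by how strongly X projects onto it.
    alpha = np.abs(Hx.T @ X).sum(axis=1)
    alpha = alpha / alpha.sum()
    return float((alpha * rho).sum())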
Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
Embodied Question Answering in Photorealistic Environments With Point Cloud Perception Proceedings Article
In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Links | BibTeX | Tags: computer vision, CVPR, vision & language
@inproceedings{2019-Wijmans-EQAPEWPCP,
title = {Embodied Question Answering in Photorealistic Environments With Point Cloud Perception},
author = {Erik Wijmans and Samyak Datta and Oleksandr Maksymets and Abhishek Das and Georgia Gkioxari and Stefan Lee and Irfan Essa and Devi Parikh and Dhruv Batra},
doi = {10.1109/CVPR.2019.00682},
year = {2019},
date = {2019-06-01},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
keywords = {computer vision, CVPR, vision & language},
pubstate = {published},
tppubtype = {inproceedings}
}
Other Publication Sites
A few more sites that aggregate research publications: Academia.edu, BibSonomy, CiteULike, Mendeley.
Copyright/About
[Please see the Copyright Statement that may apply to the content listed here.]
This list of publications is produced by using the teachPress plugin for WordPress.