Paper in ICLR 2020 on “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”

Abstract

We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever ‘stale’), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling – achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 Billion steps of experience (the equivalent of 80 years of human experience) – over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs.
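The three properties named above (distributed, decentralized, synchronous) can be illustrated with a toy sketch. This is not the authors' code and it is not PPO: it assumes a one-parameter least-squares problem, and the helper names (`local_gradient`, `all_reduce_mean`, `train`) are hypothetical. Each simulated worker computes a gradient on its own data shard, the gradients are averaged by a stand-in for an all-reduce rather than by a parameter server, and every worker applies the same update in lockstep.

```python
from typing import List, Tuple

Shard = Tuple[List[float], List[float]]


def local_gradient(w: float, xs: List[float], ys: List[float]) -> float:
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Stand-in for an all-reduce: every worker receives the same average.

    There is no central server holding the parameters; in a real system
    this would be a collective operation among the workers themselves.
    """
    mean = sum(values) / len(values)
    return [mean for _ in values]


def train(shards: List[Shard], steps: int = 200, lr: float = 0.1) -> List[float]:
    # Every worker keeps its own copy of the parameter, initialised identically.
    weights = [0.0 for _ in shards]
    for _ in range(steps):
        # 1. Each worker computes a gradient on its local experience only.
        grads = [local_gradient(w, xs, ys) for w, (xs, ys) in zip(weights, shards)]
        # 2. Synchronous step: all workers wait for the averaged gradient,
        #    so no update is ever computed from stale parameters.
        synced = all_reduce_mean(grads)
        # 3. Each worker applies the identical update to its own copy.
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


# Three simulated workers, each holding samples of y = 3x (illustrative data).
shards = [
    ([1.0, 2.0], [3.0, 6.0]),
    ([0.5, 1.5], [1.5, 4.5]),
    ([2.0, 3.0], [6.0, 9.0]),
]
final_weights = train(shards)
```

Because every worker starts from the same value and applies the same averaged gradient, the copies in `final_weights` never diverge, which is what makes the synchronous scheme simple to reason about.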

This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially ‘solves’ the task – near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks – the analog of ‘ImageNet pre-training + task-specific fine-tuning’ for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models + code will be publicly available).
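The figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers stated above; the reading of "80 years" as continuous wall-clock seconds and the 30.44-day average month are assumptions made here, not statements from the paper.

```python
# Sanity-check arithmetic on the numbers quoted in the abstract.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: continuous, non-stop experience
DAYS_PER_MONTH = 30.44                 # assumption: average calendar month

# Near-linear scaling: 107x speedup on 128 GPUs.
speedup, gpus = 107, 128
scaling_efficiency = speedup / gpus    # about 0.84 of ideal linear scaling

# 2.5 billion steps described as 80 years of human experience.
total_steps = 2.5e9
human_years = 80
steps_per_second = total_steps / (human_years * SECONDS_PER_YEAR)  # roughly 1

# Under 3 days of wall-clock time on 64 GPUs.
wall_days, train_gpus = 3, 64
gpu_days = wall_days * train_gpus      # upper bound on GPU-days used
gpu_months = gpu_days / DAYS_PER_MONTH # a little over 6 months

# 90% of peak performance is reached at 100 million steps.
early_fraction = 100e6 / total_steps   # share of the full training budget

print(f"scaling efficiency: {scaling_efficiency:.1%}")
print(f"implied steps per second of experience: {steps_per_second:.2f}")
print(f"GPU time: {gpu_days} GPU-days, about {gpu_months:.1f} months")
print(f"steps needed for 90% of peak: {early_fraction:.0%} of the total")
```

Under these assumptions the numbers are mutually consistent: 64 GPUs for 3 days bounds the run at 192 GPU-days, which matches "over 6 months of GPU-time", and the 100 million steps that yield 90% of peak performance are only 4% of the full 2.5 billion.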

Citation

  • E. Wijmans, A. Kadian, A. Morcos, S. Lee, I. Essa, D. Parikh, M. Savva, and D. Batra (2020), “Decentralized Distributed PPO: Solving PointGoal Navigation,” in Proceedings of International Conference on Learning Representations (ICLR), 2020.

    @InProceedings{2020-Wijmans-DDSPN,
      archiveprefix = {arXiv},
      arxiv         = {https://arxiv.org/abs/1911.00357},
      author        = {Erik Wijmans and Abhishek Kadian and Ari Morcos and
                       Stefan Lee and Irfan Essa and Devi Parikh and
                       Manolis Savva and Dhruv Batra},
      booktitle     = {{Proceedings of International Conference on
                       Learning Representations (ICLR)}},
      eprint        = {1911.00357},
      month         = {April},
      pdf           = {https://arxiv.org/pdf/1911.00357},
      primaryclass  = {cs.CV},
      title         = {Decentralized Distributed PPO: Solving PointGoal
                       Navigation},
      year          = {2020}
    }
