ICLR 2020 paper: “DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames”
We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever ‘stale’), making it conceptually simple and easy to implement.
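To make the "decentralized and synchronous" description concrete, here is a minimal single-process sketch of that update pattern. It assumes that workers share learning by averaging gradients with an all-reduce, which is a common way to realize a server-free synchronous scheme; the text above does not spell out the mechanism. The function names (`all_reduce_mean`, `synchronous_step`) and the gradient values are hypothetical and purely illustrative. This is not the authors' implementation. A real multi-machine version would use a collective communication library rather than a Python loop.

```python
from typing import List

Vector = List[float]


def all_reduce_mean(local_grads: List[Vector]) -> List[Vector]:
    """Simulate an all-reduce (mean) among peer workers.

    There is no central server: each worker reads every peer's gradient
    and computes the average itself, so all workers end up with the same result.
    """
    n = len(local_grads)
    dim = len(local_grads[0])
    averaged = []
    for _rank in range(n):
        # Worker `_rank` averages the gradients contributed by all peers.
        averaged.append([sum(g[i] for g in local_grads) / n for i in range(dim)])
    return averaged


def synchronous_step(params: List[Vector], local_grads: List[Vector], lr: float) -> List[Vector]:
    """One synchronous update (hypothetical helper).

    No worker updates until every worker has contributed a gradient computed
    from the current parameters, so no update is based on stale parameters.
    """
    if len(params) != len(local_grads):
        raise ValueError("need exactly one gradient per worker")
    averaged = all_reduce_mean(local_grads)
    # Every worker applies the identical averaged gradient to its own replica.
    return [
        [p - lr * g for p, g in zip(params[rank], averaged[rank])]
        for rank in range(len(params))
    ]


if __name__ == "__main__":
    # Toy example: 3 workers with identical parameters and made-up gradients.
    params = [[1.0, 2.0] for _ in range(3)]
    grads = [[0.5, 1.0], [1.5, 2.0], [1.0, 0.0]]
    print(synchronous_step(params, grads, lr=0.5))  # [[0.5, 1.5], [0.5, 1.5], [0.5, 1.5]]
```

Because every worker applies the same averaged gradient, the replicas stay identical after each step without any central parameter server holding the "true" copy.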