Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask whether machines — specifically, artificial intelligence (AI) navigation agents — also build implicit (or ‘mental’) maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural networks achieving strong performance, and (b) strengthen the evidence for mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether biological or artificial. …
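One way to test such a claim is to train a read-out probe on the agent's recurrent state and measure how well it predicts an occupancy grid of the environment. The sketch below is purely illustrative and is not the paper's protocol: the MapProbe class, tensor shapes, and synthetic data are all assumptions.

```python
# Hypothetical sketch: probing an agent's recurrent state for an implicit map.
# All names and data here are illustrative stand-ins, not the paper's method.
import torch
import torch.nn as nn

class MapProbe(nn.Module):
    """Linear probe: decode a coarse occupancy grid from the agent's hidden state."""
    def __init__(self, hidden_dim: int, grid_cells: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_dim, grid_cells)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.decoder(h)  # logits over grid cells (occupied vs. free)

# Synthetic stand-ins for recorded agent states and ground-truth maps.
hidden_states = torch.randn(256, 512)                        # (steps, hidden_dim)
occupancy_targets = torch.randint(0, 2, (256, 32 * 32)).float()

probe = MapProbe(hidden_dim=512, grid_cells=32 * 32)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(100):  # if the probe fits well, the state encodes map-like structure
    opt.zero_grad()
    loss = loss_fn(probe(hidden_states), occupancy_targets)
    loss.backward()
    opt.step()
```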
Paper in NeurIPS 2022 on “VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement”
We present Variable Experience Rollout (VER), a technique for efficiently scaling batched on-policy reinforcement learning in heterogeneous environments (where different environments take vastly different times to generate rollouts) to many GPUs, potentially spread across many machines. VER combines the strengths of, and blurs the line between, synchronous and asynchronous on-policy RL methods (SyncOnRL and AsyncOnRL, respectively). Specifically, it learns from on-policy experience (like SyncOnRL) and has no synchronization points (like AsyncOnRL), enabling high throughput.
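The core idea can be pictured with a toy collection loop: instead of every environment contributing a fixed-length rollout per update (which forces fast workers to wait for the slowest), each batch is filled with however many steps each environment happens to produce. The sketch below is a schematic approximation, not the paper's implementation; the per-environment speeds and constants are invented.

```python
# Illustrative sketch of variable-length rollouts: fast environments contribute
# more steps per batch, so no worker blocks on the slowest environment.
import random

NUM_ENVS = 8
BATCH_STEPS = 128  # total steps per learning update, summed over environments

def step_env(env_id):
    """Stand-in for env.step(); real environments vary widely in step cost."""
    return {"env": env_id, "obs": random.random()}

def collect_variable_rollouts():
    batch, steps_per_env = [], [0] * NUM_ENVS
    while len(batch) < BATCH_STEPS:
        # In a real system the next transition comes from whichever environment
        # finishes first; here we emulate that with a speed-weighted choice.
        env_id = random.choices(range(NUM_ENVS),
                                weights=[i + 1 for i in range(NUM_ENVS)])[0]
        batch.append(step_env(env_id))
        steps_per_env[env_id] += 1
    return batch, steps_per_env  # all experience is on-policy for this update

batch, counts = collect_variable_rollouts()
print(counts)  # faster (higher-weight) environments contribute more steps
```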
Paper in ICRA 2022 on “Graph-based Cluttered Scene Generation and Interactive Exploration using Deep Reinforcement Learning”
We introduce a novel method to teach a robotic agent to interactively explore cluttered yet structured scenes, such as kitchen pantries and grocery shelves, by leveraging the physical plausibility of the scene. We propose a novel learning framework to train an effective scene exploration policy that discovers hidden objects with minimal interactions. First, we define a novel scene grammar to represent structured clutter. Then we train a Graph Neural Network (GNN) based Scene Generation agent using deep reinforcement learning (deep RL) to manipulate this scene grammar and create a diverse set of stable scenes, each containing multiple hidden objects. Given such cluttered scenes, we then train a Scene Exploration agent, using deep RL, to uncover hidden objects by interactively rearranging the scene. We show that our learned agents hide and discover significantly more objects than the baselines, and we present quantitative results that demonstrate the generalization capabilities of our agents. We also demonstrate sim-to-real transfer by successfully deploying the learned policy on a real UR10 robot to explore real-world cluttered scenes.
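To make the "scene grammar" idea concrete, here is a minimal sketch of a grammar whose production rules expand into object arrangements containing possibly hidden items. The rule names, expansion logic, and hiding heuristic are invented for illustration and are not the grammar from the paper.

```python
# Minimal sketch of a scene grammar for structured clutter; the rules and
# the occlusion heuristic are invented placeholders, not the paper's grammar.
import random

RULES = {
    "shelf": ["row", "row"],                 # a shelf expands into rows
    "row":   ["large", "small", "small"],    # front items can occlude back items
}

def expand(symbol):
    """Recursively expand a grammar symbol into a list of object instances."""
    if symbol not in RULES:                  # terminal: a concrete object
        return [{"type": symbol,
                 "hidden": symbol == "small" and random.random() < 0.5}]
    objects = []
    for child in RULES[symbol]:
        objects.extend(expand(child))
    return objects

scene = expand("shelf")
hidden = [o for o in scene if o["hidden"]]
print(f"{len(scene)} objects, {len(hidden)} hidden")
```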
Abstract: Most approaches to indoor RGBD semantic labeling focus on using pixels or superpixels to train a classifier. In this paper, we implement a higher-level segmentation using a hierarchy of superpixels to obtain a better segmentation for training our classifier. By focusing on meaningful segments that conform more directly to objects, […]
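A hierarchy of superpixels can be approximated with off-the-shelf tools: oversegment the image, then merge adjacent regions by similarity. The sketch below uses scikit-image (assuming version 0.20 or later for skimage.graph) as a stand-in; the paper's exact hierarchy construction and its use of RGBD features may differ.

```python
# Sketch of hierarchical superpixel segmentation with scikit-image (>= 0.20);
# the paper's actual hierarchy construction may differ.
from skimage import data, segmentation, graph

image = data.astronaut()  # stand-in for the color channels of an RGBD frame
superpixels = segmentation.slic(image, n_segments=400, compactness=10,
                                start_label=1)

# Build a region adjacency graph and merge similar neighbors, yielding larger
# segments that conform more directly to whole objects.
rag = graph.rag_mean_color(image, superpixels)
segments = graph.cut_threshold(superpixels, rag, thresh=29)

print(superpixels.max(), "superpixels merged into",
      segments.max() + 1, "segments")
```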
Abstract: We present an algorithm for finding temporally consistent occlusion boundaries in videos to support the segmentation of dynamic scenes. We learn occlusion boundaries in a pairwise Markov random field (MRF) framework. We first estimate the probability of a spatiotemporal edge being an occlusion boundary by using appearance, flow, and geometric features. Next, we […]
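The pairwise MRF formulation amounts to minimizing a sum of unary costs (from the per-edge occlusion classifier) and pairwise smoothness costs between adjacent edges. The toy example below enumerates the labelings of two neighboring edges to show how smoothness can pull an uncertain edge toward its neighbor; the probabilities and Potts weight are synthetic placeholders.

```python
# Toy pairwise MRF energy over two candidate boundary edges; the feature
# probabilities and the smoothness weight are synthetic placeholders.
import numpy as np

def unary(prob_occlusion):
    """Cost of labeling an edge as non-occlusion (0) vs. occlusion (1)."""
    eps = 1e-6
    return np.array([-np.log(1 - prob_occlusion + eps),
                     -np.log(prob_occlusion + eps)])

def pairwise(label_a, label_b, weight=0.5):
    """Potts smoothness: temporally/spatially adjacent edges prefer equal labels."""
    return weight * (label_a != label_b)

# Two adjacent candidate edges with classifier probabilities 0.9 and 0.4.
probs = [0.9, 0.4]
best = min(((la, lb) for la in (0, 1) for lb in (0, 1)),
           key=lambda l: unary(probs[0])[l[0]] + unary(probs[1])[l[1]]
                         + pairwise(*l))
print("MAP labels:", best)  # smoothness flips the uncertain edge to (1, 1)
```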
Title: Temporally Consistent Semantic Segmentation in Videos. S. Hussain Raza, Ph.D. Candidate in ECE (https://sites.google.com/site/shussainraza5/). Committee: Prof. Irfan Essa (advisor), School of Interactive Computing; Prof. David Anderson (co-advisor), School of Electrical and Computer Engineering; Prof. Frank Dellaert, School of Interactive Computing; Prof. Anthony Yezzi, School of Electrical and Computer Engineering; Prof. Chris Barnes, School […]
Abstract: We demonstrate the automatic transfer of an assembly task from human to robot. This work extends efforts showing the utility of linguistic models in verifiable robot control policies by performing real visual analysis of human demonstrations to automatically extract a policy for the task. This method tokenizes each human demonstration into a […]
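Tokenization into a linguistic model can be pictured as mapping detected events to grammar terminals and validating the token string against the task grammar. The event names, token alphabet, and regular-expression grammar below are invented for illustration; the actual formalism in this line of work is richer than a regular expression.

```python
# Illustrative sketch: tokenizing a demonstration into discrete actions and
# checking it against a toy assembly grammar. All token names are invented.
import re

# Toy grammar for an assembly task: grasp, then one or more (move, insert) pairs.
ASSEMBLY = re.compile(r"G(MI)+")  # G = grasp, M = move, I = insert

def tokenize(events):
    """Map detected hand/object events from visual analysis to grammar tokens."""
    table = {"hand_close": "G", "object_translate": "M", "parts_mate": "I"}
    return "".join(table[e] for e in events if e in table)

demo = ["hand_close", "object_translate", "parts_mate",
        "object_translate", "parts_mate"]
tokens = tokenize(demo)
print(tokens, "valid policy" if ASSEMBLY.fullmatch(tokens) else "rejected")
```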
EAGER: Linguistic Task Transfer for Humans and Cyber Systems (Mike Stilman, Irfan Essa) NSF/RI. This project investigates formal languages as a general methodology for task transfer between distinct cyber-physical systems, such as humans and robots. It aims to expand the science of cyber-physical systems by developing Motion Grammars that will enable task transfer between distinct […]
II-New: Motion Grammar Laboratory (Stilman, Essa, Egerstedt, Christensen, Ueda) Division of Computer and Network Systems Instrumentation Grant. An anthropomorphic robot arm and a human-capture system enable the autonomous performance of assembly tasks with significant uncertainty in problem specifications and environments. This line of work is investigated through sequences of manipulation actions in which the guarantee of completion […]
I am co-organizing the First IEEE Workshop on Computer Vision for Humanoids in conjunction […] The goal of this workshop is to bring together experts from the fields of computer vision and robotics who are working on humanoid robots with vision as one of the primary modalities. Topics of interest include, but are not limited to: […]