2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8967784

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

Abstract: A robot can now grasp an object more effectively than ever before, but once it has the object, what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an o…

Cited by 6 publications (4 citation statements)
References 33 publications
“…In essence, the design steps of a neural network architecture that might otherwise be done by an engineer or graduate student by hand are instead automated and optimized as part of a well defined search space of reasonable layers, connections, outputs, and hyperparameters. In fact, architecture search can itself be defined in terms of hyperparameters [12] or as a graph search problem [27,19,2,24]. Furthermore, once a search space is defined various tools can be brought to bear on the problem including Bayesian optimization [16], other neural networks [1], reinforcement learning, evolution [21,20], or a wide variety of optimization frameworks.…”
Section: Related Work
confidence: 99%
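
The statement above frames architecture design choices as hyperparameters inside a well-defined search space. As a purely illustrative sketch (not code from the cited works), the snippet below encodes a few such choices as hyperparameters and runs the simplest possible optimizer, random search, over them; the names SEARCH_SPACE, sample_architecture, and score are hypothetical, and score is a stand-in for actually training and evaluating each candidate network.

```python
import random

# Each architectural decision is treated as just another hyperparameter.
SEARCH_SPACE = {
    "num_blocks":    [2, 3, 4],
    "block_type":    ["conv3x3", "conv1x1", "separable_conv"],
    "growth_rate":   [16, 32, 48],
    "connection":    ["dense", "residual", "plain"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def sample_architecture(space):
    """Draw one candidate configuration from the search space."""
    return {name: random.choice(options) for name, options in space.items()}

def score(config):
    """Placeholder: a real search would train the candidate and return validation accuracy."""
    return random.random()

# Random search is the simplest optimizer over this space; Bayesian optimization,
# evolutionary methods, or an RL-based controller would plug in at exactly this point.
best = max((sample_architecture(SEARCH_SPACE) for _ in range(20)), key=score)
print("best candidate:", best)
```
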
“…Finally, neural architecture search forms the basis for our hyperparameter choices [23], [24]. Neural networks are imperfect arbitrary function approximators, so a better choice of algorithm is an effective approach to improving deep learning based robotic manipulation algorithms, as we have detailed in past work [25].…”
Section: Related Work
confidence: 99%
“…[bn, relu, conv1x1, bn, relu, conv1x1], where a 1x1 convolution is equivalent to a dense layer at each pixel. These parameters are based on the final dense block structure optimized for accuracy via HyperTree Architecture Search [25] in our prior work. We note that efficiency was not considered in the HyperTree metric and as a result this pixelwise dense block accounts for over 50% of the computation in EVT, so it is a good target for future efficiency gains.…”
Section: Action After Successful Grasp: Place (x, y, θ)
confidence: 99%
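
For readers who want to see the quoted layer sequence concretely, here is a minimal PyTorch sketch of a [bn, relu, conv1x1, bn, relu, conv1x1] block, where each 1x1 convolution acts as a fully connected layer applied independently at every pixel. The class name and channel sizes below are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PixelwiseDenseBlock(nn.Module):
    """bn -> relu -> conv1x1 -> bn -> relu -> conv1x1, applied at every pixel."""

    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, hidden_channels, kernel_size=1),   # dense layer at each pixel
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_channels, out_channels, kernel_size=1),  # dense layer at each pixel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Example: a backbone feature map of shape (batch, channels, height, width).
features = torch.randn(1, 64, 32, 32)
out = PixelwiseDenseBlock(64, 128, 64)(features)  # -> shape (1, 64, 32, 32)
```

Because the 1x1 convolutions are applied at every spatial location, a block like this can dominate runtime on large feature maps, which is consistent with the efficiency concern raised in the statement above.
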
“…Motivated by a desire to enable better human-robot collaboration and finer-grained behavioral analyses, researchers in computer vision and robotics have recently begun to approach the challenging problem of assembly action recognition [1], [2], [3], [4], [5]. In assembly activity recognition, a perception system must recognize both the assembly actions and the configuration of a structure (e.g.…”
Section: Introduction
confidence: 99%