Improving Filling Level Classification with Adversarial Training

Modas, Apostolos; Xompero, Alessio; Sánchez-Matilla, Ricardo; Frossard, Pascal; Cavallaro, Andrea

doi:10.1109/icip42928.2021.9506112

Cited by 15 publications (28 citation statements)

References 21 publications (30 reference statements)


“…Object mass estimation requires the reasoning on different physical properties, especially when the object is a container. Existing perception algorithms use uni-modal or multi-modal data, such as audio, images, and videos, to classify the content type and level as well as the container capacity [4]-[7]. Convolutional neural networks can be trained to classify the content level within a range of containers from a single image when hand occlusions are present [7].…”

Section: Related Work (mentioning)

confidence: 99%

“…Existing perception algorithms use uni-modal or multi-modal data, such as audio, images, and videos, to classify the content type and level as well as the container capacity [4]-[7]. Convolutional neural networks can be trained to classify the content level within a range of containers from a single image when hand occlusions are present [7]. While the performance is limited by the uni-modal input, the choice of training strategy, e.g.…”

Section: Related Work (mentioning)

confidence: 99%

“…While the performance is limited by the uni-modal input, the choice of training strategy, e.g. combining adversarial training and transfer learning, can improve the classification accuracy [7]. Independent classification of content type and level can be achieved by using convolutional and recurrent neural networks with only audio as input data [4] or through late fusion of the predictions from both audio and visual features [5].…”

Section: Related Work (mentioning)

confidence: 99%

“…Similar to [7], we devise a convolutional neural network-based classifier that predicts the content type and level for each frame and each view. In addition to the feasible combinations as classes, we also include unknown as an extra class to handle opaque or translucent containers for which the content type and level cannot be estimated.…”

Section: B. Perception (mentioning)

confidence: 99%

“…or estimate the capacity and mass of the container (empty or filled), using audio or visual data [3]-[7]. Reasoning about the human dynamics from visual data can also provide information about the physical properties of a container [8].…”

Section: Introduction (mentioning)

confidence: 99%


Towards safe human-to-robot handovers of unknown containers

Pang, Xompero, Oh et al. 2021

Preprint

Self Cite


Safe human-to-robot handovers of unknown objects require accurate estimation of hand poses and object properties, such as shape, trajectory, and weight. Accurately estimating these properties requires the use of scanned 3D object models or expensive equipment, such as motion capture systems and markers, or both. However, testing handover algorithms with robots may be dangerous for the human and, when the object is an open container with liquids, for the robot. In this paper, we propose a real-to-simulation framework to develop safe human-to-robot handovers with estimations of the physical properties of unknown cups or drinking glasses and estimations of the human hands from videos of a human manipulating the container. We complete the handover in simulation, and we estimate a region that is not occluded by the hand of the human holding the container. We also quantify the safeness of the human and object in simulation. We validate the framework using public recordings of containers manipulated before a handover and show the safeness of the handover when using noisy estimates from a range of perceptual algorithms.


The CORSMAL Benchmark for the Prediction of the Properties of Containers

et al. 2022

Self Cite


The contactless estimation of the weight of a container and the amount of its content manipulated by a person are key pre-requisites for safe human-to-robot handovers. However, opaqueness and transparencies of the container and the content, and variability of materials, shapes, and sizes, make this problem challenging. In this paper, we present a range of methods and an open framework to benchmark acoustic and visual perception for the estimation of the capacity of a container, and the type, mass, and amount of its content. The framework includes a dataset, specific tasks and performance measures. We conduct a fair and in-depth comparative analysis of methods that used this framework and audio-only or vision-only baselines designed from related works. Based on this analysis, we can conclude that audio-only and audio-visual classifiers are suitable for the estimation of the type and amount of the content using different types of convolutional neural networks, combined with either recurrent neural networks or a majority voting strategy, whereas computer vision methods are suitable to determine the capacity of the container using regression and geometric approaches. Classifying the content type and level using only audio achieves a weighted average F1-score up to 81% and 97%, respectively. Estimating the container capacity with vision-only approaches, and the filling mass with audio-visual, multi-stage algorithms, reaches up to 65% weighted average capacity and mass scores. These results show that there is still room for improvement for the design of future methods that will be ranked and compared on the individual leaderboards provided by our open framework.
