2021
DOI: 10.1007/978-3-030-68793-9_31

Top-1 CORSMAL Challenge 2020 Submission: Filling Mass Estimation Using Multi-modal Observations of Human-Robot Handovers

Cited by 12 publications (29 citation statements)
References 26 publications
“…While each property (content type, content level) could be classified independently [10]- [12], the combination of the two predictions can result in a wrong classification, if either is incorrect. We thus define a set of seven classes that combine content types and levels, C = {empty, pasta-half-full, pasta-full, rice-half-full, rice-full, water-half-full, water-full} (see Tab.…”
Section: Proposed Methods (mentioning)
confidence: 99%
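
The excerpt above enumerates the seven joint content-type/level classes. As a rough illustration only, a minimal Python sketch of that joint label set and of why composing two independent predictions is fragile; the helper functions and score dictionary are hypothetical placeholders, not the cited implementation:

```python
# The seven joint content-type/level classes enumerated in the excerpt above.
# combine_independent() and classify_joint() are hypothetical helpers used only
# to contrast the two formulations; they are not the cited implementation.
from itertools import product

TYPES = ["pasta", "rice", "water"]
LEVELS = ["half-full", "full"]

# C = {empty} plus every type/level combination -> 7 classes
CLASSES = ["empty"] + [f"{t}-{lvl}" for t, lvl in product(TYPES, LEVELS)]


def combine_independent(pred_type, pred_level):
    """Compose two independent predictions; an error in either one corrupts
    the combined label (the failure mode the excerpt describes)."""
    if pred_level == "empty":
        return "empty"
    return f"{pred_type}-{pred_level}"


def classify_joint(scores):
    """Joint classification: pick the highest-scoring of the seven classes."""
    return max(CLASSES, key=lambda c: scores.get(c, 0.0))


if __name__ == "__main__":
    print(CLASSES)
    # A wrong content-type prediction alone already breaks the combined label:
    print(combine_independent("rice", "half-full"))   # rice-half-full
    print(combine_independent("pasta", "half-full"))  # wrong type -> wrong class
    print(classify_joint({"rice-half-full": 0.8, "water-full": 0.2}))
```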
“…We compare ACC with 9 alternative approaches, namely ResNet (18 layers) [16], a shallower ResNet variant (14 layers), a ResNet (18 layers) pre-trained on ImageNet (ResNet-18) [17] and fine-tuned on the training split of CCM, VGG (11 layers) [14], Support Vector Machine (SVM) [18], Random Forest [19], K-Nearest Neighbours (kNN) [20], and the top-2 submissions of the 2020 CORSMAL Challenge, namely Because It's Tactile (BIT) [10], and HVRL [11]. SVM, kNN, Random Forest, VGG, and ResNet-based classifiers perform direct classification as a single model.…”
Section: A. Methods Under Comparison (mentioning)
confidence: 99%
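
For the classical baselines named in that comparison (SVM, Random Forest, kNN performing direct classification as a single model), a hedged scikit-learn sketch over a hypothetical feature matrix; the actual features, splits, and hyper-parameters of the cited evaluation are not reproduced here:

```python
# Illustrative comparison of classical classifiers doing direct classification,
# in the spirit of the baselines listed above. X and y are placeholders for
# pre-extracted features and the seven content-type/level labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 64))       # placeholder feature vectors
y = rng.integers(0, 7, size=700)     # placeholder labels for the 7 classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy = {acc:.3f}")
```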
“…combining adversarial training and transfer learning, can improve the classification accuracy [7]. Independent classification of content type and level can be achieved by using convolutional and recurrent neural networks with only audio as input data [4] or through late fusion of the predictions from both audio and visual features [5]. Alternatively, multiple multi-layer perceptrons can be trained with audio data and conditioned on the container category estimated from a majority voting of the object detection across the frames of multi-view sequences [6].…”
Section: Related Work (mentioning)
confidence: 99%
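
A compact sketch of the two ideas mentioned in that excerpt: late fusion of audio and visual class posteriors, and majority voting of per-frame container detections across multi-view sequences. The array shapes and the equal-weight averaging rule are illustrative assumptions, not the cited architectures:

```python
# Late fusion of per-modality class probabilities and majority voting over
# per-frame container detections. The equal-weight averaging rule is an
# assumption for illustration, not the cited models.
import numpy as np
from collections import Counter


def late_fusion(p_audio, p_video):
    """Average audio and visual class posteriors, then take the arg-max."""
    fused = 0.5 * p_audio + 0.5 * p_video
    return int(np.argmax(fused))


def majority_vote(per_frame_categories):
    """Container category as the most frequent detection across frames/views."""
    return Counter(per_frame_categories).most_common(1)[0][0]


if __name__ == "__main__":
    p_a = np.array([0.1, 0.7, 0.2])   # e.g. posteriors from an audio network
    p_v = np.array([0.2, 0.3, 0.5])   # e.g. posteriors from a visual network
    print(late_fusion(p_a, p_v))      # fused prediction: class 1
    print(majority_vote(["cup", "glass", "cup", "cup"]))  # -> "cup"
```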
“…Alternatively, multiple multi-layer perceptrons can be trained with audio data and conditioned on the container category estimated from a majority voting of the object detection across the frames of multi-view sequences [6]. Container capacity can be estimated as an approximation of a reconstructed shape [4], [5], [33]. An iterative approach minimises a 3D primitive to the real object shape by constraining to the object segmentation mask from two views of a wide-baseline stereo camera, using both RGB, depth, and infrared images [5].…”
Section: Related Work (mentioning)
confidence: 99%
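
To make the primitive-fitting idea concrete, a toy sketch that fits a cylinder's radius and height so its rendered silhouette matches binary segmentation masks from two views, then reports the cylinder volume as a capacity estimate. The orthographic projection, fixed pixel-to-metre scale, synthetic masks, and Nelder-Mead optimiser are deliberate simplifications, not the optimisation used in the cited work:

```python
# Toy version of fitting a 3D primitive (a cylinder) to object segmentation
# masks and reading off capacity as the primitive's volume. The orthographic
# projection, fixed scale, and synthetic masks are simplifying assumptions.
import numpy as np
from scipy.optimize import minimize

H, W = 120, 120
PX_PER_M = 400.0  # assumed pixels-per-metre scale (hypothetical)


def silhouette(radius_m, height_m):
    """Soft orthographic side-view silhouette of an upright cylinder
    (a rectangle with ~1-pixel anti-aliased edges, keeping the loss smooth)."""
    yy, xx = np.mgrid[0:H, 0:W]
    half_w = radius_m * PX_PER_M        # half-width in pixels
    half_h = height_m * PX_PER_M / 2.0  # half-height in pixels
    inside_x = np.clip(half_w - np.abs(xx - W / 2.0) + 0.5, 0.0, 1.0)
    inside_y = np.clip(half_h - np.abs(yy - H / 2.0) + 0.5, 0.0, 1.0)
    return inside_x * inside_y


def loss(params, observed_masks):
    """Mean squared mismatch between the rendered and observed silhouettes."""
    radius_m, height_m = params
    rendered = silhouette(radius_m, height_m)
    return float(np.mean([(rendered - m) ** 2 for m in observed_masks]))


# Synthetic "observed" masks standing in for two segmented views of a
# cylinder with radius 5 cm and height 12 cm.
observed = [silhouette(0.05, 0.12), silhouette(0.05, 0.12)]

res = minimize(loss, x0=np.array([0.03, 0.08]), args=(observed,),
               method="Nelder-Mead")
r_est, h_est = res.x
capacity_l = np.pi * r_est ** 2 * h_est * 1000.0  # cylinder volume in litres
print(f"radius~{r_est:.3f} m, height~{h_est:.3f} m, capacity~{capacity_l:.2f} L")
```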