2021
DOI: 10.1007/s00521-021-06588-1

Multi-view dual attention network for 3D object recognition

Abstract: Existing view-based 3D object classification and recognition methods ignore the inherent hierarchical correlation and distinguishability of views, which makes it difficult to improve classification accuracy further. To solve this problem, this paper proposes an end-to-end multi-view dual attention network framework for high-precision recognition of 3D objects. On the one hand, query, key, and value feature layers are obtained through a convolution layer. The spatial attention matrix is generate…
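
The abstract breaks off before it explains how the spatial attention matrix is generated and applied. Purely as an illustration, the sketch below shows one common way to build a spatial attention matrix from convolutional query, key, and value layers, in the style of the position attention module of DANet (Fu et al., CVPR 2019). It is not this paper's implementation. The 1×1 projections, the reduced query/key width `Cq`, the softmax over positions, the residual weight `gamma`, and all function names are assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def conv1x1(feat, weight):
    # A 1x1 convolution is a per-pixel linear map: (C_out, C) @ (C, H*W).
    c, h, w = feat.shape
    return (weight @ feat.reshape(c, h * w)).reshape(weight.shape[0], h, w)


def spatial_attention(feat, w_q, w_k, w_v, gamma=1.0):
    """Hypothetical DANet-style position attention on a (C, H, W) feature map."""
    c, h, w = feat.shape
    n = h * w
    q = conv1x1(feat, w_q).reshape(-1, n)  # query layer, (Cq, N)
    k = conv1x1(feat, w_k).reshape(-1, n)  # key layer,   (Cq, N)
    v = conv1x1(feat, w_v).reshape(c, n)   # value layer, (C, N)
    # Spatial attention matrix: attn[i, j] is how much position i draws
    # from position j; each row sums to 1 after the softmax.
    attn = softmax(q.T @ k, axis=-1)       # (N, N)
    # Output at position i is the attention-weighted sum of all value columns.
    out = (v @ attn.T).reshape(c, h, w)
    # Residual connection; gamma would be learned in training, fixed here.
    return gamma * out + feat, attn


# Toy usage with random weights (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
C, H, W, Cq = 8, 4, 5, 2
feat = rng.standard_normal((C, H, W))
w_q = rng.standard_normal((Cq, C))
w_k = rng.standard_normal((Cq, C))
w_v = rng.standard_normal((C, C))
out, attn = spatial_attention(feat, w_q, w_k, w_v, gamma=0.5)
print(out.shape, attn.shape)  # (8, 4, 5) (20, 20)
```

Because every output position mixes information from all H×W positions, the attention matrix grows quadratically with the feature-map size, so such modules are usually applied to fairly small, late-stage feature maps.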

Cited by 19 publications (18 citation statements). References: 39 publications (94 reference statements).

“…All the above methods only model local pose dynamics, ignoring global body translation and inter-individual body interaction. However, learning both local and global pose dynamics and modeling fine-grained human-human interaction are essential for comprehending human behavior in a complex 3D environment [2,39].…”
Section: Single-person Pose Forecasting; mentioning (confidence: 99%)

“…Guo et al [15] present a collaborative prediction task and use a two-branch attention network for the prediction of two interacted persons. Wang et al [39] present a Transformer-based framework to forecast multi-person motion in a scenario with more people. Furthermore, this method produces unrealistic poses since they solely concen…”
Section: Multi-person Pose Forecasting; mentioning (confidence: 99%)

“…An attention mechanism adaptively weighs the keys of different key-value pairs based on their relative importance to a given query to predict the most suitable responses to the query [45]. Depending on the data paradigm of the key, the value, and the query, attention mechanisms are used in a wide variety of tasks, including tasks in natural language understanding [9], text-based image and video retrieval [4], object and action recognition in images and videos [46,40], and visual question answering [57]. In the case of user-specific highlight detection, the key, value, and query need to be based on the video contents, i.e., follow the paradigm of content-based highlight detection [42,37,2] to perform meaningful retrieval of the highlightable clips per user.…”
Section: Introduction; mentioning (confidence: 99%)
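
The definition quoted above (a query weighting key-value pairs by relevance) can be made concrete with a single query vector. This is a minimal sketch, not the citing paper's method. Scaled dot-product scoring, softmax normalization, the toy keys and values, and the function name `attend` are all assumptions chosen for illustration.

```python
import numpy as np


def attend(query, keys, values):
    """Hypothetical single-query attention over key-value pairs."""
    d = query.shape[-1]
    # Relevance of each key to the query (scaled dot product, one assumed choice).
    scores = keys @ query / np.sqrt(d)
    # Softmax turns the scores into non-negative weights that sum to 1.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Response = weighted sum of the values paired with the keys.
    return weights @ values, weights


keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[10.0], [20.0], [30.0]])
query = np.array([3.0, 0.0])  # points in the same direction as the first key
out, w = attend(query, keys, values)
# w[0] is the largest weight, so out is pulled toward the first value (10.0).
```

Because the weights are a softmax, the response always lies within the range spanned by the values. The query only decides how that response is distributed among them.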