2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017
DOI: 10.1109/iros.2017.8206290

Belief tree search for active object recognition

Abstract: Active Object Recognition (AOR) has been approached as an unsupervised learning problem, in which optimal trajectories for object inspection are not known and are to be discovered by reducing label uncertainty measures or training with reinforcement learning. Such approaches have no guarantees of the quality of their solution. In this paper, we treat AOR as a Partially Observable Markov Decision Process (POMDP) and find near-optimal policies on training data using Belief Tree Search (BTS) on the corresponding …
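The abstract's POMDP framing can be made concrete with a short sketch. The Python below is not the paper's implementation: it assumes a discrete set of object labels as the hidden state, a viewpoint-indexed observation model `obs_model[label][view][obs]`, and an expected-entropy objective, and it expands the belief tree exhaustively to a fixed depth. All identifiers (`update_belief`, `belief_tree_search`, the toy observation model) are hypothetical names chosen for illustration.

```python
import itertools
import math

def update_belief(belief, obs, view, obs_model):
    """Bayes update of the label belief after seeing `obs` from `view`."""
    posterior = {label: p * obs_model[label][view].get(obs, 1e-9)
                 for label, p in belief.items()}
    z = sum(posterior.values())
    return {label: p / z for label, p in posterior.items()}

def entropy(belief):
    """Shannon entropy of the label belief (label uncertainty)."""
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def possible_observations(view, obs_model):
    """All observations any label can produce from `view`."""
    return set(itertools.chain.from_iterable(
        m[view].keys() for m in obs_model.values()))

def belief_tree_search(belief, views, obs_model, depth):
    """Pick the view minimizing expected label entropy, by exhaustive
    expansion of the belief tree to the given depth."""
    if depth == 0:
        return None, entropy(belief)
    best_view, best_value = None, float("inf")
    for view in views:
        value = 0.0
        for obs in possible_observations(view, obs_model):
            # Probability of this observation under the current belief.
            p_obs = sum(belief[l] * obs_model[l][view].get(obs, 0.0)
                        for l in belief)
            if p_obs == 0.0:
                continue
            child = update_belief(belief, obs, view, obs_model)
            _, child_value = belief_tree_search(child, views, obs_model,
                                                depth - 1)
            value += p_obs * child_value
        if value < best_value:
            best_view, best_value = view, value
    return best_view, best_value

# Toy usage: two labels, two views, binary observations (made-up numbers).
obs_model = {
    "mug":    {0: {"round": 0.9, "flat": 0.1}, 1: {"round": 0.5, "flat": 0.5}},
    "bottle": {0: {"round": 0.5, "flat": 0.5}, 1: {"round": 0.1, "flat": 0.9}},
}
belief = {"mug": 0.5, "bottle": 0.5}
view, _ = belief_tree_search(belief, views=[0, 1], obs_model=obs_model, depth=1)
```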

Cited by 7 publications (8 citation statements). References 25 publications (56 reference statements).
“…In this subsection, several baselines [10] and state-of-the-art VP approaches [11, 12, 16] are employed for experimental comparison with our continuous VP method, shown as follows: Random policy [10] plans a random action from the action space with uniform probability; Sequential policy [10] moves the agent to the next immediate position in the same direction; DQL policy [11, 12] exploits a deep Q-learning algorithm to learn a discrete VP policy (the discrete action space is …); E-TRPO policy [16] develops a continuous VP method implemented with trust region policy optimization [17] and an extreme learning machine [18].…”
Section: Methods
confidence: 99%
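The first two baselines in this statement are simple enough to pin down in code. The sketch below is an assumption-laden reconstruction, not the citing paper's code: it posits a ring of 12 discrete viewpoints (the actual action space did not survive extraction) and implements the random and sequential view-planning policies as described.

```python
import random

# Hypothetical discrete viewpoint ring; the citing paper's actual action
# space is elided above, so this set is an assumption for illustration.
N_VIEWS = 12  # viewpoints spaced 30 degrees apart around the object

def random_policy(current_view: int) -> int:
    """Random baseline: pick any viewpoint uniformly at random."""
    return random.randrange(N_VIEWS)

def sequential_policy(current_view: int, step: int = 1) -> int:
    """Sequential baseline: move to the next immediate viewpoint in a
    fixed direction, wrapping around the ring."""
    return (current_view + step) % N_VIEWS
```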