A Simple Framework for Open-Vocabulary Segmentation and Detection
Preprint, 2023
DOI: 10.48550/arxiv.2303.08131

Cited by 5 publications (5 citation statements) · References 47 publications (83 reference statements)
“…Zero-shot learning is a way to generalize to unseen labels without having been specifically trained to classify them. We compare SAM with zero-shot segmentation models from the state of the art, OpenSeeD [60,56] and CLIPSeg [37], using the mean Intersection-over-Union (mIoU) metric. The IoU is defined as the area of overlap between the predicted segmentation and the ground truth divided by the area of their union.…”
Section: Segmentation Using SAM (mentioning, confidence: 99%)
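The IoU definition quoted above maps directly onto a few lines of array arithmetic. The sketch below is illustrative only, assuming predictions and ground truth are given as same-shaped boolean NumPy masks; the function names (iou, mean_iou) and the averaging over a list of masks are assumptions, not the evaluation code used by the citing paper.

```python
# Minimal sketch of the IoU / mIoU computation described in the citation above.
# Assumes per-class binary masks as boolean NumPy arrays of identical shape.
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU = |pred ∩ gt| / |pred ∪ gt| for two boolean masks."""
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union > 0 else float("nan")


def mean_iou(preds: list[np.ndarray], gts: list[np.ndarray]) -> float:
    """mIoU: average the per-mask (e.g. per-class) IoU, skipping empty unions."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    scores = [s for s in scores if not np.isnan(s)]
    return float(np.mean(scores)) if scores else 0.0
```

A higher mIoU means the predicted masks overlap the ground-truth masks more closely, which is the basis of the SAM vs. OpenSeeD vs. CLIPSeg comparison in the citing work.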
“…Examples of such neural networks are the classic Mask R-CNN model [34], the fast YOLACT-EDGE model [35], which takes into account a sequence of images, and the most recent YOLOv8 model [36]. Existing transformer models for object segmentation with a fixed set of object classes, such as OneFormer [37], or with an open vocabulary, such as OpenSeeD [38], are not considered due to their limited computational efficiency. Semantic segmentation methods are also not considered, as they do not separate closely located objects of the same class.…”
Section: Neural Network-Based Object Segmentation (mentioning, confidence: 99%)
“…Improved depth estimation of pedestrians and vehicles through Pseudo-LiDAR, a LiDAR-like data representation, is studied in [238,239]. Finally, 3D reconstruction methods have developed greatly, with more learning integrated into the 3D reconstruction pipeline [240-244]: from learned monocular depth maps [245], to learned 3D reconstruction features [246], to 3D matching [247], and the visually pleasing NeRF-based methods [248,249]. Semantic segmentation, tracking, and object detection methods are also becoming less supervised, utilizing learned matching and language-model-based labels [144,250-257]. Combining different visual tasks such as object detection, semantic segmentation, and tracking with 3D modeling has seen success in [144,258].…”
Section: Developments in the Field (mentioning, confidence: 99%)