End-to-End Face Detection and Cast Grouping in Movies Using Erdös-Rényi Clustering

Jin, SouYoung; Su, Hang; Stauffer, Chris; Learned-Miller, Erik

doi:10.1109/iccv.2017.564

Cited by 37 publications (29 citation statements); References 38 publications

“…To link multiple object detections across video frames into temporally consistent tracklets, we use the algorithm from Jin et al (Sec. 3 of [26]) with the MD-Net tracker [38]. Now, given a tracklet that consistently follows an object through a video sequence, when the object detector did not fire (i.e.…”

Section: Automatic Labeling of the Target Domain (mentioning)

confidence: 99%

Automatic Adaptation of Object Detectors to New Domains Using Self-Training

RoyChowdhury, Chakrabarty, Singh, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

139

This work addresses the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on the target data by using high-confidence detections from the existing detector, augmented with hard (misclassified) examples acquired by exploiting temporal cues using a tracker. These automatically-obtained labels are then used for re-training the original model. A modified knowledge distillation loss is proposed, and we investigate several ways of assigning soft-labels to the training examples from the target domain. Our approach is empirically evaluated on challenging face and pedestrian detection tasks: a face detector trained on WIDER-Face, which consists of high-quality images crawled from the web, is adapted to a large-scale surveillance data set; a pedestrian detector trained on clear, daytime images from the BDD-100K driving data set is adapted to all other scenarios such as rainy, foggy, nighttime. Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, the advantage of using soft-labels via distillation loss versus hard-labels, and show promising performance as a simple method for unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters. Code and models are available at
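The abstract above describes self-training on the detector's own high-confidence detections, augmented with tracker-recovered hard examples, and a distillation loss over soft labels. A minimal sketch of that idea follows, assuming a per-box binary (object vs. background) probability. The score threshold, the soft-label rule, the mixing weight `lam`, and all function names are illustrative assumptions; the loss is a generic distillation-style mix, not the paper's modified loss.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def select_pseudo_labels(detections: List[Tuple[Box, float]],
                         threshold: float = 0.8) -> List[Tuple[Box, float]]:
    """Keep detections whose score is at least `threshold` (an assumed value)."""
    return [(box, score) for box, score in detections if score >= threshold]

def soft_label(teacher_score: float, from_tracker: bool,
               tracker_label: float = 1.0) -> float:
    """Hypothetical rule: a high-confidence detection keeps the original
    detector's score as its soft label; a tracker-recovered hard positive
    gets the fixed value `tracker_label`."""
    return tracker_label if from_tracker else teacher_score

def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted probability p and a target in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def distillation_loss(student_p: float, hard_label: float,
                      soft_target: float, lam: float = 0.5) -> float:
    """Weighted mix of a hard-label term and a soft-label (teacher) term."""
    return lam * bce(student_p, hard_label) + (1.0 - lam) * bce(student_p, soft_target)

# Example: only the 0.95 detection survives the threshold; its soft label is the teacher's score.
dets = [((0, 0, 10, 10), 0.95), ((5, 5, 20, 20), 0.40)]
kept = select_pseudo_labels(dets)
target = soft_label(kept[0][1], from_tracker=False)
loss = distillation_loss(student_p=0.7, hard_label=1.0, soft_target=target)
```

The soft term pulls the student toward the teacher's confidence rather than a hard 0/1 target, which is the contrast the abstract evaluates (soft labels via distillation versus hard labels).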

“…Verification losses. Next, we analyze LDML, contrastive, and triplet losses (Table 5, rows 10-18). While these losses are often used to perform clustering, they are not designed for it [47].…”

Section: #Ch (mentioning)

confidence: 99%

Video Face Clustering With Unknown Number of Clusters

Tapaswi, Law, Fidler 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Understanding videos such as TV series and movies requires analyzing who the characters are and what they are doing. We address the challenging problem of clustering face tracks based on their identity. Different from previous work in this area, we choose to operate in a realistic and difficult setting where: (i) the number of characters is not known a priori; and (ii) face tracks belonging to minor or background characters are not discarded. To this end, we propose Ball Cluster Learning (BCL), a supervised approach to carve the embedding space into balls of equal size, one for each cluster. The learned ball radius is easily translated to a stopping criterion for iterative merging algorithms. This gives BCL the ability to estimate the number of clusters as well as their assignment, achieving promising results on commonly used datasets. We also present a thorough discussion of how existing metric learning literature can be adapted for this task.

Footnote 1: We consider three types of characters based on their roles: primary or recurring characters have major roles in several episodes; secondary or minor characters are named and play an important role in some episodes; and background or unknown (Unk) characters are unnamed and uncredited.
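The abstract above states that the learned ball radius becomes a stopping criterion for iterative merging, which is what lets the number of clusters be estimated rather than fixed. A minimal sketch of such a merging loop, assuming centroid-linkage agglomerative clustering and a distance `threshold` that stands in for whatever value the learned radius would yield. The exact mapping from radius to threshold is not given here, and `merge_until_threshold` is a hypothetical name.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

def centroid(points: List[Vector]) -> List[float]:
    """Mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]

def merge_until_threshold(embeddings: List[Vector], threshold: float) -> List[List[int]]:
    """Greedy agglomerative clustering: repeatedly merge the two clusters whose
    centroids are closest, stopping once that distance exceeds `threshold`.
    The number of clusters is not given in advance; it falls out of the stop rule."""
    clusters = [[i] for i in range(len(embeddings))]  # start with singletons
    while len(clusters) > 1:
        cents = [centroid([embeddings[i] for i in c]) for c in clusters]
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:  # assumed stopping criterion derived from the ball radius
            break
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    return clusters

# Example: two well-separated pairs of face-track embeddings.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
groups = merge_until_threshold(points, threshold=1.0)
```

This naive loop recomputes every pairwise centroid distance on each merge, so it is roughly cubic in the number of tracks; that is fine for illustration but not for large collections.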

“…3. We generate tracklets using the method from [26] and show results incorporating hard positives on pedestrian and face detection in the experiments section. The manually calculated purity over 300 randomly sampled frames was 94.46% for faces and 83.13% for pedestrians.…”

Section: Extension to Hard Positive Mining (mentioning)

confidence: 99%

Unsupervised Hard Example Mining from Videos for Improved Object Detection

Jin, RoyChowdhury, Jiang, et al. 2018

Lecture Notes in Computer Science

Self Cite

Important gains have recently been obtained in object detection by using training objectives that focus on hard negative examples, i.e., negative examples that are currently rated as positive or ambiguous by the detector. These examples can strongly influence parameters when the network is trained to correct them. Unfortunately, they are often sparse in the training data, and are expensive to obtain. In this work, we show how large numbers of hard negatives can be obtained automatically by analyzing the output of a trained detector on video sequences. In particular, detections that are isolated in time, i.e., that have no associated preceding or following detections, are likely to be hard negatives. We describe simple procedures for mining large numbers of such hard negatives (and also hard positives) from unlabeled video data. Our experiments show that retraining detectors on these automatically obtained examples often significantly improves performance. We present experiments on multiple architectures and multiple data sets, including face detection, pedestrian detection and other object categories.
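The key heuristic in the abstract above is that a detection with no associated preceding or following detection is likely a hard negative. A minimal sketch, assuming "associated" means IoU overlap with any detection within a +/- `window` frame neighbourhood. This is a simpler stand-in for tracklet-based association, and the IoU threshold, window size, and function names are illustrative assumptions, not the paper's exact procedure.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def isolated_detections(dets: Dict[int, List[Box]], window: int = 1,
                        min_iou: float = 0.3) -> List[Tuple[int, Box]]:
    """Return (frame, box) pairs with no overlapping detection in any other frame
    within +/- `window`. Such temporally isolated detections are candidate hard
    negatives. Assumes small inter-frame motion, so IoU is a usable association test."""
    isolated = []
    for t, boxes in dets.items():
        for box in boxes:
            has_neighbour = any(
                iou(box, other) >= min_iou
                for dt in range(-window, window + 1) if dt != 0
                for other in dets.get(t + dt, [])
            )
            if not has_neighbour:
                isolated.append((t, box))
    return isolated

# Example: a face moving slowly across frames 0-2, plus a one-frame false alarm in frame 1.
detections = {
    0: [(0, 0, 10, 10)],
    1: [(1, 1, 11, 11), (50, 50, 60, 60)],
    2: [(2, 2, 12, 12)],
}
result = isolated_detections(detections)
```

Widening `window` flags fewer detections, since a true object missed in the adjacent frame can still be matched a frame or two further away.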
