Learning Object Detectors With Semi-Annotated Weak Labels

Zhang, Dingwen; Han, Junwei; Guo, Guangyu; Zhao, Long

doi:10.1109/tcsvt.2018.2884173

Cited by 23 publications

(5 citation statements)

References 41 publications

“…Thus the proposed method outperforms the generic few-shot models. Based on our analysis, the TOAN framework can also be extended to other weakly-supervised tasks, such as objection detection [70], [71], localization [72], [73], and segmentation [74], etc., where only the image-level supervision is available. The intra-class and inter-class variances in these tasks can be modeled by the proposed target-oriented matching mechanism and global pair-wise bilinear pooling operation, respectively.…”

Section: Generic Few-shot Learning (mentioning)

confidence: 99%

TOAN: Target-Oriented Alignment Network for Fine-Grained Image Categorization With Few Labeled Samples

Huang, Zhang, et al. 2022

IEEE Trans. Circuits Syst. Video Technol.

In this paper, we study the fine-grained categorization problem under the few-shot setting, i.e., each fine-grained class only contains a few labeled examples, termed Fine-Grained Few-Shot classification (FGFS). The core predicament in FGFS is the high intra-class variance yet low inter-class fluctuations in the dataset. In traditional fine-grained classification, the high intra-class variance can be somewhat relieved by conducting the supervised training on the abundant labeled samples. However, with few labeled examples, it is hard for the FGFS model to learn a robust class representation with the significantly higher intra-class variance. Moreover, the inter- and intra-class variance are closely related. The significant intra-class variance in FGFS often aggravates the low inter-class variance issue. To address the above challenges, we propose a Target-Oriented Alignment Network (TOAN) to tackle the FGFS problem from both intra- and inter-class perspective. To reduce the intra-class variance, we propose a target-oriented matching mechanism to reformulate the spatial features of each support image to match the query ones in the embedding space. To enhance the inter-class discrimination, we devise discriminative fine-grained features by integrating local compositional concept representations with the global second-order pooling. We conducted extensive experiments on four public datasets for fine-grained categorization, and the results show the proposed TOAN obtains the state-of-the-art.

“…Our first limitation is that the targeted domain for reviewed papers in this survey is comparatively narrow, i.e., surveillance BVD. A broader version can be the complete video analytics domain, covering action and activity recognition [60], video summarization, [61], video retrieval [7], healthcare [62], objects detection and tracking [63,64], etc. The specific focus of our research is to conduct an in-depth review of fuzzy methods applied to the generic video analysis domain, towards deriving a proper taxonomy of the applied fuzzy techniques.…”

Section: Representative Surveys In Fuzzy Logic and Our Survey (mentioning)

confidence: 99%

Fuzzy Logic in Surveillance Big Video Data Analysis

et al. 2021

CCTV cameras installed for continuous surveillance generate enormous amounts of data daily, forging the term Big Video Data (BVD). The active practice of BVD includes intelligent surveillance and activity recognition, among other challenging tasks. To efficiently address these tasks, the computer vision research community has provided monitoring systems, activity recognition methods, and many other computationally complex solutions for the purposeful usage of BVD. Unfortunately, the limited capabilities of these methods, higher computational complexity, and stringent installation requirements hinder their practical implementation in real-world scenarios, which still demand human operators sitting in front of cameras to monitor activities or make actionable decisions based on BVD. The usage of human-like logic, known as fuzzy logic, has been employed emerging for various data science applications such as control systems, image processing, decision making, routing, and advanced safety-critical systems. This is due to its ability to handle various sources of real-world domain and data uncertainties, generating easily adaptable and explainable data-based models. Fuzzy logic can be effectively used for surveillance as a complementary for huge-sized artificial intelligence models and tiresome training procedures. In this article, we draw researchers’ attention toward the usage of fuzzy logic for surveillance in the context of BVD. We carry out a comprehensive literature survey of methods for vision sensory data analytics that resort to fuzzy logic concepts. Our overview highlights the advantages, downsides, and challenges in existing video analysis methods based on fuzzy logic for surveillance applications. We enumerate and discuss the datasets used by these methods, and finally provide an outlook toward future research directions derived from our critical assessment of the efforts invested so far in this exciting field.

“…In order to accurately locate different keypoints of the targets, most deep models take multi-scale or high-resolution information into account [21], [26], [28], whilst looking at contextual information. In general, contextual information is referred to as regions surrounding the targets, and it has been proved effective in pose estimation [26], [34], object detection [37], [38], co-saliency detection [39]. However, these models ignore the difference between different keypoints to some extent, which are significant to mouse pose estimation due to relatively weak spatial correlation caused by highly deformable mouse body.…”

Section: A Structured Context Mixer (mentioning)

confidence: 99%

Structured Context Enhancement Network for Mouse Pose Estimation

Zhou, Jiang, Liu, et al. 2020

Preprint

Automated analysis of mouse behaviours is crucial for many applications in neuroscience. However, quantifying mouse behaviours from videos or images remains a challenging problem, where pose estimation plays an important role in describing mouse behaviours. Although deep learning based methods have made promising advances in mouse or other animal pose estimation, they cannot properly handle complicated scenarios (e.g., occlusions, invisible keypoints, and abnormal poses). Particularly, since mouse body is highly deformable, it is a big challenge to accurately locate different keypoints on the mouse body. In this paper, we propose a novel hourglass network based model, namely Graphical Model based Structured Context Enhancement Network (GM-SCENet) where two effective modules, i.e., Structured Context Mixer (SCM) and Cascaded Multi-Level Supervision module (CMLS) are designed. The SCM can adaptively learn and enhance the proposed structured context information of each mouse part by a novel graphical model with close consideration on the difference between body parts. Then, the CMLS module is designed to jointly train the proposed SCM and the hourglass network by generating multi-level information, which increases the robustness of the whole network. Based on the multi-level predictions from the SCM and the CMLS module, we also propose an inference method to enhance the localization results. Finally, we evaluate our proposed approach against several baselines on our Parkinson's Disease Mouse Behaviour (PDMB) and the standard DeepLabCut Mouse Pose datasets, where the results show that our method can achieve better or competitive performance against the other state-of-the-art approaches.
