We introduce and tackle the problem of zero-shot object detection (ZSD), which aims to detect object classes that are not observed during training. We work with a challenging set of object classes, not restricting ourselves to similar and/or fine-grained categories as in prior work on zero-shot classification. We present a principled approach, first adapting visual-semantic embeddings for ZSD. We then discuss the problems associated with selecting a background class and motivate two background-aware approaches for learning robust detectors: one uses a fixed background class, and the other is based on iterative latent assignments. We also outline the challenge posed by a limited number of training classes and propose a solution based on densely sampling the semantic label space using auxiliary data with a large number of categories. We propose novel splits of two standard detection datasets, MSCOCO and Visual Genome, and present extensive empirical results in both the traditional and generalized zero-shot settings to highlight the benefits of the proposed methods. We provide useful insights into the algorithm and conclude by posing some open questions to encourage further research.
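A minimal Python sketch of the kind of visual-semantic scoring such a setup relies on: a learned projection W maps a region's visual feature into the word-embedding space, where it is scored against class word vectors by cosine similarity, optionally including a fixed background embedding in the spirit of the static background-aware variant. All names, shapes, and the projection itself are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of visual-semantic scoring for zero-shot detection.
# Assumes a learned projection W mapping region features into the
# word-embedding space; all names and shapes here are illustrative.
import numpy as np

def zsd_scores(region_feat, W, class_embs, bg_emb=None):
    """Score one region proposal against (unseen) class embeddings.

    region_feat : (d_vis,) CNN feature of a region proposal
    W           : (d_sem, d_vis) learned visual-to-semantic projection
    class_embs  : (n_cls, d_sem) L2-normalized word vectors of test classes
    bg_emb      : optional (d_sem,) fixed background embedding, as in a
                  fixed-background variant
    """
    z = W @ region_feat                     # project into semantic space
    z = z / (np.linalg.norm(z) + 1e-8)      # cosine similarity via dot product
    embs = class_embs if bg_emb is None else np.vstack([class_embs, bg_emb])
    return embs @ z                         # one score per class (+ background)

# Toy usage: 5 unseen classes, 300-d word vectors, 2048-d visual features.
rng = np.random.default_rng(0)
W = rng.normal(size=(300, 2048))
feat = rng.normal(size=2048)
classes = rng.normal(size=(5, 300))
classes /= np.linalg.norm(classes, axis=1, keepdims=True)
print(zsd_scores(feat, W, classes).round(3))
```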
The Second Emotion Recognition in the Wild Challenge (EmotiW) 2014 consists of an audio-video-based emotion classification challenge that mimics real-world conditions. Traditionally, emotion recognition has been performed on data captured in constrained, lab-controlled environments. While such data was a good starting point, it poorly represents the environment and conditions faced in real-world situations. With the exponential increase in the number of video clips being uploaded online, it is worthwhile to explore the performance of emotion recognition methods that work 'in the wild'. The goal of this Grand Challenge is to carry forward the common platform defined during EmotiW 2013 for evaluating emotion recognition methods in real-world conditions. The database in the 2014 challenge is Acted Facial Expressions in the Wild (AFEW) 4.0, which has been collected from movies depicting close-to-real-world conditions. This paper describes the data partitions, the baseline method, and the experimental protocol.
BACKGROUND: Current pain assessment methods in youth are suboptimal and vulnerable to bias and underrecognition of clinical pain. Facial expressions are a sensitive, specific biomarker of the presence and severity of pain, and computer vision (CV) and machine-learning (ML) techniques enable reliable, valid measurement of pain-related facial expressions from video. We developed and evaluated a CVML approach to measure pain-related facial expressions for automated pain assessment in youth. METHODS: A CVML-based model for assessment of pediatric postoperative pain was developed from videos of 50 neurotypical youth 5 to 18 years old in both endogenous/ongoing and exogenous/transient pain conditions after laparoscopic appendectomy. Model accuracy was assessed against children's self-reported pain ratings and time since surgery, and compared with parents' and nurses' by-proxy estimates of observed pain in youth. RESULTS: Model detection of pain versus no-pain demonstrated good-to-excellent accuracy (area under the receiver operating characteristic curve, 0.84-0.94) in both ongoing and transient pain conditions. Model estimation of pain severity demonstrated moderate-to-strong correlations with self-report (r = 0.65-0.86 within subjects; r = 0.47-0.61 across subjects) for both pain conditions. The model performed equivalently to nurses but not as well as parents in detecting pain versus no-pain conditions, and performed equivalently to parents in estimating pain severity. Nurses were more likely than the model to underestimate youth self-reported pain ratings. Demographic factors did not affect model performance. CONCLUSIONS: CVML pain assessment models derived from automatic facial expression measurements demonstrated good-to-excellent accuracy in binary pain classification, strong correlations with patient self-reported pain ratings, and parent-equivalent estimation of children's pain levels over typical pain trajectories in youth after appendectomy. WHAT'S KNOWN ON THIS SUBJECT: Clinical pain assessment methods in youth are vulnerable to underestimation bias and underrecognition. Facial expressions are sensitive, specific biomarkers of the presence and severity of pain. Computer-vision-based pattern recognition enables measurement of pain-related facial expressions from video. WHAT THIS STUDY ADDS: This study demonstrates initial validity for developing computer vision algorithms for automated pain assessment in children. The system developed and tested in this study could provide standardized, continuous, and valid patient monitoring that is potentially scalable.
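To make the evaluation concrete, here is an illustrative Python sketch of binary pain/no-pain classification scored with ROC AUC, the metric reported above. The features, labels, and logistic-regression model are synthetic stand-ins (the study's actual model operates on facial-expression measurements from video); only the AUC-based evaluation protocol mirrors the study.

```python
# Illustrative evaluation of a binary pain/no-pain classifier of the
# kind described above. Inputs and model are hypothetical stand-ins;
# only the ROC-AUC evaluation mirrors the reported protocol.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, d = 400, 20                      # e.g. 20 facial-expression statistics per clip
X = rng.normal(size=(n, d))
w = rng.normal(size=d)
y = (X @ w + rng.normal(scale=2.0, size=n) > 0).astype(int)  # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]
print(f"ROC AUC: {roc_auc_score(y_te, probs):.3f}")
```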
We propose a novel method for temporally pooling frames in a video for the task of human action recognition. The method is motivated by the observation that only a small number of frames together contain sufficient information to discriminate the action class present in a video from the rest. The proposed method learns to pool such discriminative and informative frames, while discarding the majority of non-informative frames, in a single temporal scan of the video. Our algorithm does so by continuously predicting the discriminative importance of each video frame and subsequently pooling them in a deep learning framework. We show the effectiveness of the proposed pooling method on standard benchmarks, where it consistently improves on baseline pooling methods with both RGB- and optical-flow-based convolutional networks. Further, in combination with complementary video representations, we report results that are competitive with the state of the art on two challenging, publicly available benchmark datasets.
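The following is a minimal PyTorch sketch of importance-weighted temporal pooling in the spirit described above: a small scorer predicts a scalar importance per frame, the scores are normalized over time, and frame features are pooled by the resulting weights. Module names, dimensions, and the softmax normalization are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of importance-weighted temporal pooling, assuming
# per-frame CNN features are already extracted.
import torch
import torch.nn as nn

class ImportancePooling(nn.Module):
    def __init__(self, feat_dim=2048):
        super().__init__()
        # Predicts a scalar discriminative importance for each frame.
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, frames):                     # frames: (T, feat_dim)
        scores = self.scorer(frames).squeeze(-1)   # (T,) raw importance
        weights = torch.softmax(scores, dim=0)     # normalize over time
        return weights @ frames                    # (feat_dim,) pooled descriptor

pool = ImportancePooling(feat_dim=512)
video = torch.randn(64, 512)                       # 64 frames of 512-d features
descriptor = pool(video)
print(descriptor.shape)                            # torch.Size([512])
```

Training end to end lets the scorer learn which frames are discriminative for the recognition loss, while soft weighting keeps the pooling differentiable.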