Abstract3D pose estimation is a key component of many important computer vision tasks such as autonomous navigation and 3D scene understanding. Most state-of-the-art approaches to 3D pose estimation solve this problem as a pose-classification problem in which the pose space is discretized into bins and a CNN classifier is used to predict a pose bin. We argue that the 3D pose space is continuous and propose to solve the pose estimation problem in a CNN regression framework with a suitable representation, data augmentation and loss function that captures the geometry of the pose space. Experiments on PASCAL3D+ show that the proposed 3D pose regression approach achieves competitive performance compared to the state-of-the-art.
Key Points
Question
Are deep learning techniques sufficiently accurate to classify presegmented phases in videos of cataract surgery for subsequent automated skill assessment and feedback?
Findings
In this cross-sectional study including videos from a convenience sample of 100 cataract procedures, modeling time series of labels of instruments in use appeared to yield greater accuracy in classifying phases of cataract operations than modeling cross-sectional data on instrument labels, spatial video image features, spatiotemporal video image features, or spatiotemporal video image features with appended instrument labels.
Meaning
Time series models of instruments in use may serve to automate the identification of phases in cataract surgery, helping to develop efficient and effective surgical skill training tools in ophthalmology.
Abstract-The recognition of actions from video sequences has many applications in health monitoring, assisted living, surveillance, and smart homes. Despite advances in sensing, in particular related to 3D video, the methodologies to process the data are still subject to research. We demonstrate superior results by a system which combines recurrent neural networks with convolutional neural networks in a voting approach. The gated-recurrent-unit-based neural networks are particularly well-suited to distinguish actions based on long-term information from optical tracking data; the 3D-CNNs focus more on detailed, recent information from video data. The resulting features are merged in an SVM which then classifies the movement. In this architecture, our method improves recognition rates of state-of-the-art methods by 14% on standard data sets.
Automated segmentation of fine objects details in a given image is becoming of crucial interest in different imaging fields. In this paper, we propose a new variational level-set model for both global and interactive\selective segmentation tasks, which can deal with intensity inhomogeneity and the presence of noise. The proposed method maintains the same performance on clean and noisy vector-valued images. The model utilizes a combination of locally computed denoising constrained surface and a denoising fidelity term to ensure a fine segmentation of local and global features of a given image. A two-phase level-set formulation has been extended to a multi-phase formulation to successfully segment medical images of the human brain. Comparative experiments with state-of-the-art models show the advantages of the proposed method.
Most image segmentation techniques efficiently segment images with prominent edges, but are less efficient for some images with low frequencies and overlapping regions of homogeneous intensities. A recently proposed selective segmentation model often works well, but not for such challenging images. In this paper, we introduce a new model using the coefficient of variation as a fidelity term, and our test results show it performs much better in these challenging cases.
Abstract-A large number of deaths are caused by Traffic accidents worldwide. The global crisis of road safety can be seen by observing the significant number of deaths and injuries that are caused by road traffic accidents. In many situations the family members or emergency services are not informed in time. This results in delayed emergency service response time, which can lead to an individual's death or cause severe injury. The purpose of this work is to reduce the response time of emergency services in situations like traffic accidents or other emergencies such as fire, theft/robberies and medical emergencies. By utilizing onboard sensors of a smartphone to detect vehicular accidents and report it to the nearest emergency responder available and provide real time location tracking for responders and emergency victims, will drastically increase the chances of survival for emergency victims, and also help save emergency services time and resources.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.