We present a differentiable framework capable of learning a wide variety of compositions of simple policies that we call skills. By recursively composing skills with themselves, we can create hierarchies that display complex behavior. Skill networks are trained to generate skill-state embeddings that are provided as inputs to a trainable composition function, which in turn outputs a policy for the overall task. Our experiments on an environment consisting of multiple collect and evade tasks show that this architecture is able to quickly build complex skills from simpler ones. Furthermore, the learned composition function displays some transfer to unseen combinations of skills, allowing for zero-shot generalizations.
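The data flow described above (skills produce skill-state embeddings, a composition function merges them, and a policy is read out) can be sketched as follows. This is a minimal illustration, not the authors' architecture. The embedding size, the single tanh layer used for composition, the softmax policy head, the skill names, and the random untrained weights are all assumptions introduced here.

```python
import math
import random

EMBED_DIM = 4   # assumed embedding size
STATE_DIM = 3   # assumed state size
N_ACTIONS = 2   # assumed number of discrete actions


def make_matrix(rows, cols, rng):
    """Random weight matrix standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def make_skill(rng):
    """A base skill: maps a state to a skill-state embedding."""
    weights = make_matrix(EMBED_DIM, STATE_DIM, rng)
    return lambda state: [math.tanh(v) for v in matvec(weights, state)]


def make_composition(rng):
    """Composition function: two embeddings in, one embedding of the same size out."""
    weights = make_matrix(EMBED_DIM, 2 * EMBED_DIM, rng)

    def compose(skill_a, skill_b):
        # The result is itself a skill, so it can be composed again (recursion).
        return lambda state: [
            math.tanh(v)
            for v in matvec(weights, skill_a(state) + skill_b(state))  # list concatenation
        ]

    return compose


def make_policy_head(rng):
    """Maps an embedding to a probability distribution over actions (softmax)."""
    weights = make_matrix(N_ACTIONS, EMBED_DIM, rng)

    def policy(embedding):
        logits = matvec(weights, embedding)
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return [e / total for e in exps]

    return policy


rng = random.Random(0)
# Hypothetical skill names, chosen only for illustration.
collect, evade, reach = make_skill(rng), make_skill(rng), make_skill(rng)
compose = make_composition(rng)
policy_head = make_policy_head(rng)

# A two-level hierarchy: (collect composed with evade) composed with reach.
composite = compose(compose(collect, evade), reach)
action_probs = policy_head(composite([0.5, -0.2, 0.1]))
```

Because the composed object has the same interface as a base skill, the same composition function can be applied at every level of the hierarchy, which is the property the recursive construction relies on.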
In Automatic Sign Language Recognition (ASLR), robust hand tracking and detection are key to good recognition accuracy. We introduce a new dataset of depth data from continuously signed American Sign Language (ASL) sentences. We present an analysis showing numerous errors of the Microsoft Kinect Skeleton Tracker (MKST) in cases where the hands are close to the body, close to each other, or when the arms cross. We also propose a method based on domain-driven random forest regression (DDRFR), which predicts real-world 3D hand locations using features generated from depth images. We show that our DDRFR hand detector achieves a >20% improvement over the MKST within a margin of error of 5 cm from the ground truth.
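The evaluation criterion mentioned above, the share of hand-location predictions that fall within 5 cm of the ground truth, can be computed as in the sketch below. The function name, the assumption that coordinates are expressed in metres, and the sample coordinates are all hypothetical and are not taken from the dataset.

```python
import math


def within_margin_rate(predicted, ground_truth, margin_m=0.05):
    """Fraction of frames whose predicted 3D hand position lies within
    margin_m metres (Euclidean distance) of the ground-truth position."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one frame is required")
    hits = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= margin_m
    )
    return hits / len(predicted)


# Made-up (x, y, z) positions in metres, for illustration only.
truth = [(0.0, 0.0, 1.0), (0.1, 0.2, 1.0), (0.3, 0.1, 0.9), (0.0, 0.0, 1.2)]
preds = [(0.01, 0.0, 1.0), (0.1, 0.2, 1.1), (0.3, 0.13, 0.9), (0.0, 0.06, 1.2)]

rate = within_margin_rate(preds, truth)  # two of the four frames are within 5 cm
```

Running the same function on the outputs of two trackers against the same ground truth gives directly comparable rates.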
We address the problem of performing silent speech recognition where vocalized audio is not available (e.g., due to a user's medical condition) or is highly noisy (e.g., during firefighting or combat). We describe our wearable system to capture tongue and jaw movements during silent speech. The system has two components: the Tongue Magnet Interface (TMI), which utilizes the 3-axis magnetometer aboard Google Glass to measure the movement of a small magnet glued to the user's tongue, and the Outer Ear Interface (OEI), which measures the deformation in the ear canal caused by jaw movements using proximity sensors embedded in a set of earmolds. We collected a data set of 1901 utterances of 11 distinct phrases silently mouthed by six able-bodied participants. Recognition relies on hidden Markov model-based techniques to select one of the 11 phrases. We present encouraging results for user-dependent recognition.
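One common way to select a phrase with hidden Markov models is to keep one model per phrase, score the observed sequence under each model with the forward algorithm, and return the best-scoring phrase. The sketch below illustrates that idea under strong simplifying assumptions: the observations are discrete symbols rather than continuous sensor readings, there are only two toy phrase models with hand-picked parameters, and the names `phrase_a` and `phrase_b` are placeholders. It is not a description of the system's actual recognizer.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the scaled forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_likelihood = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_likelihood += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_likelihood


def recognize(obs, models):
    """Return the name of the phrase model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Two toy left-to-right models over the symbols {0, 1}; all values are illustrative.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    # name: (start probabilities, transition matrix, emission matrix)
    "phrase_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.2, 0.8]]),
    "phrase_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.8, 0.2]]),
}

best = recognize([0, 0, 1, 1], models)
```

With 11 phrases, the `models` dictionary would simply hold 11 entries, and in a user-dependent setting each entry would be trained on data from the same participant who is being recognized.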