In this study, we present extensive attention-based networks with data augmentation methods for the INTERSPEECH 2019 ComParE Challenge, specifically three Sub-challenges: Styrian Dialect Recognition, Continuous Sleepiness Regression, and Baby Sound Classification. For the Styrian Dialect Sub-challenge, the dialects are classified into Northern Styrian (NorthernS), Urban Styrian (UrbanS), and Eastern Styrian (EasternS). Our proposed model achieves an unweighted average recall (UAR) of 49.5% on the test set, which is 2.5% higher than the baseline. The Continuous Sleepiness Sub-challenge is defined as a regression task with scores ranging from 1 (extremely alert) to 9 (very sleepy). Our proposed architecture achieves a Spearman correlation of 0.369 on the test set, which surpasses the baseline model by 0.026. For the Baby Sound Sub-challenge, the infant sounds are classified into canonical babbling, non-canonical babbling, crying, laughing, and junk/other, and our proposed augmentation framework achieves a UAR of 62.39% on the test set, which outperforms the baseline by about 3.7%. Overall, our analyses demonstrate that fusing attention network models with a conventional support vector machine benefits test-set robustness, and that the recognition rates of these paralinguistic attributes generally improve when data augmentation is performed.
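The UAR figures quoted above are means of per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only; the function name and the toy labels are hypothetical and are not challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class has equal weight."""
    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example (hypothetical labels, not challenge data).
y_true = ["NorthernS"] * 4 + ["UrbanS", "EasternS"]
y_pred = ["NorthernS"] * 4 + ["NorthernS", "EasternS"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the plain accuracy is 5/6 while the UAR is 2/3, because the single missed UrbanS sample zeroes the recall of an entire class.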
In this study, we present a computational framework for the Self-Assessed Affect Sub-Challenge of the INTERSPEECH 2018 Computational Paralinguistics Challenge. The goal of this sub-challenge is to classify the valence scores given by the speakers themselves into three levels, i.e., low, medium, and high. We explore the fusion of a bi-directional LSTM (BLSTM) with baseline SVM models to improve recognition accuracy. Specifically, we extract frame-level acoustic low-level descriptors (LLDs) as input to the BLSTM with a modified attention mechanism, and separate SVMs are trained on the standard ComParE 16 baseline feature sets with minority-class upsampling. These diverse predictions are then combined using a decision-level score fusion scheme that integrates all of the developed models. Our proposed approach achieves unweighted average recall (UAR) values of 62.94% and 67.04%, which are absolute improvements of 6.24% and 1.04%, respectively, over the best baseline provided by the challenge organizer. We further provide a detailed comparative analysis of the different models.
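Decision-level score fusion combines the per-class scores that each model outputs, instead of merging their features. The abstract does not state the fusion rule, so the sketch below assumes a simple weighted average followed by an argmax; the function name, scores, and weights are all hypothetical.

```python
def fuse_scores(model_scores, weights):
    """Weighted average of per-class scores from several models (one utterance)."""
    n_classes = len(model_scores[0])
    total_weight = sum(weights)
    return [
        sum(w * scores[k] for w, scores in zip(weights, model_scores)) / total_weight
        for k in range(n_classes)
    ]


labels = ["low", "medium", "high"]

# Hypothetical per-class scores for a single utterance.
blstm_scores = [0.2, 0.5, 0.3]  # assumed output of the attention BLSTM
svm_scores = [0.6, 0.3, 0.1]    # assumed output of one baseline SVM

# Assumed weights; in practice they would be tuned on a development set.
fused = fuse_scores([blstm_scores, svm_scores], weights=[0.6, 0.4])
predicted = labels[max(range(len(fused)), key=fused.__getitem__)]
```

Here the two models disagree (the BLSTM favours "medium", the SVM favours "low"), and the fused scores settle on "medium" because the BLSTM carries the larger assumed weight.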
ABSTRACT: A spherical camera can observe the environment with an almost 720-degree field of view in one shot, which is useful for augmented reality, environment documentation, or mobile mapping applications. This paper aims to develop a spherical photogrammetry imaging system for 3D measurement through a backpack mobile mapping system (MMS). The equipment comprises a Ladybug-5 spherical camera, a tactical-grade positioning and orientation system (POS), i.e., SPAN-CPT, and an odometer, among other components. This research aims to directly apply the photogrammetric space intersection technique for 3D mapping from a spherical image stereo-pair. For this purpose, several systematic calibration procedures are required, including lens distortion calibration, relative orientation calibration, boresight calibration for direct georeferencing, and spherical image calibration. Lens distortion is severe in the Ladybug-5 camera's six original images. For mosaicking the spherical image from these six original images, we propose using their relative orientation and correcting their lens distortion at the same time. However, the constructed spherical image still contains systematic error, which reduces the 3D measurement accuracy. For direct georeferencing, we need to establish a ground control field for boresight/lever-arm calibration. We can then apply the calibrated parameters to obtain the exterior orientation parameters (EOPs) of all spherical images. Finally, the 3D positioning accuracy after space intersection is evaluated, including for EOPs obtained by the structure-from-motion method.
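Space intersection from a spherical stereo-pair can be illustrated as intersecting two viewing rays, one from each camera centre. The sketch below is not the paper's implementation: it assumes each observation is already expressed as longitude and latitude in the mapping frame (that is, the rotation from the EOPs has been applied), and it takes the midpoint of the closest approach of the two rays. All function names and coordinates are hypothetical.

```python
import math


def ray_from_angles(lon, lat):
    """Unit direction for a spherical-image observation (angles in radians)."""
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def angles_to(point, center):
    """Longitude and latitude at which `point` is seen from `center`."""
    dx, dy, dz = (p - c for p, c in zip(point, center))
    return math.atan2(dy, dx), math.atan2(dz, math.hypot(dx, dy))


def intersect_rays(p1, d1, p2, d2):
    """Midpoint of the closest approach between rays p1 + s*d1 and p2 + t*d2."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = tuple(x - y for x, y in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no stable intersection")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    q1 = tuple(p + s * k for p, k in zip(p1, d1))  # closest point on ray 1
    q2 = tuple(p + t * k for p, k in zip(p2, d2))  # closest point on ray 2
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))


# Hypothetical camera centres (1 m baseline) and a target used to simulate observations.
center_1, center_2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
target = (2.0, 3.0, 1.0)

ray_1 = ray_from_angles(*angles_to(target, center_1))
ray_2 = ray_from_angles(*angles_to(target, center_2))
point = intersect_rays(center_1, ray_1, center_2, ray_2)
```

With error-free observations the two rays meet exactly and the midpoint reproduces the target. With real data, residual calibration and EOP errors make the rays skew, and the distance between the two closest points gives a rough indication of intersection quality.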