2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178866
Prof-Life-Log: Analysis and classification of activities in daily audio streams

Abstract: A new method to analyze and classify daily activities in personal audio recordings (PARs) is presented. The method employs speech activity detection (SAD) and speaker diarization systems to provide high-level semantic segmentation of the audio file. Subsequently, a number of audio, speech, and lexical features are computed in order to characterize events in daily audio streams. The features are selected to capture the statistical properties of conversations, topics and turn-taking behavior, which creates a c…

Cited by 13 publications (6 citation statements)
References 12 publications
“…The authors of [ 18 ] created a method to analyze and classify daily activities in personal audio recordings (PARs). The method applies speech activity detection (SAD) and speaker diarization, and computes a number of audio, speech, and lexical features [ 18 ]. It uses a TO-Combo-SAD (Threshold Optimized Combo SAD) algorithm for separating speech from noise [ 18 ].…”
Section: Results
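The TO-Combo-SAD algorithm referenced above optimizes a decision threshold over combined speech features; its details are in the cited work. As a rough, hypothetical illustration of how threshold-based speech activity detection works in principle (a minimal log-energy sketch, not the paper's algorithm), consider:

```python
import numpy as np

def energy_sad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Toy threshold-based SAD: label each frame speech/non-speech by
    comparing its log-energy (relative to the loudest frame) against a
    fixed threshold. Real systems such as TO-Combo-SAD combine several
    features and optimize the threshold instead of fixing it."""
    log_energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = np.sum(frame ** 2) + 1e-12        # avoid log(0)
        log_energies.append(10.0 * np.log10(energy))
    log_e = np.array(log_energies)
    # normalize to the loudest frame, then threshold
    return (log_e - log_e.max()) > threshold_db
```

At 16 kHz, the default frame and hop lengths correspond to 25 ms windows with a 10 ms step; both values and the threshold are illustrative choices, not taken from the paper.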
“…The method applies speech activity detection (SAD) and speaker diarization, and computes a number of audio, speech, and lexical features [ 18 ]. It uses a TO-Combo-SAD (Threshold Optimized Combo SAD) algorithm for separating speech from noise [ 18 ]. Principal Component Analysis (PCA) is first applied for dimensionality reduction, and the remaining features are then supplied to a multi-class support vector machine (SVM) with a radial basis function (RBF) kernel for model training and evaluation [ 18 ].…”
Section: Results
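The reduce-then-classify stage described above (PCA for dimensionality reduction, followed by a multi-class RBF-kernel SVM) can be sketched with scikit-learn. The feature vectors here are synthetic Gaussian placeholders for the paper's audio/speech/lexical features, the class labels are invented examples, and the added feature standardization is a common preprocessing assumption not stated in the source:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for per-event feature vectors: three well-separated
# Gaussian blobs, one per hypothetical activity class.
n_per_class, n_feats = 100, 40
class_means = [np.full(n_feats, m) for m in (-2.0, 0.0, 2.0)]
X = np.vstack([mu + rng.standard_normal((n_per_class, n_feats))
               for mu in class_means])
y = np.repeat(np.arange(3), n_per_class)   # e.g. lecture / meeting / commute

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# PCA for dimensionality reduction, then a multi-class RBF-kernel SVM,
# mirroring the pipeline in the citation statement above.
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

scikit-learn's `SVC` handles the multi-class case with a one-vs-one scheme by default; the number of retained components (10) is an illustrative choice.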
“…Automatic word count estimation (WCE) from audio recordings can be used to investigate vocal activity and social interaction as a function of recording time and location, such as in personal life logs derived from wearable sensors (Ziaei et al., 2015; Ziaei et al., 2016). WCE is also a highly useful tool in the scientific study of child language acquisition because it can help answer questions such as how much speech children hear in their daily lives in different contexts (e.g., Bergelson et al., 2018a), and how the language input maps to later developmental outcomes in the same children (Weisleder & Fernald, 2013; Ramírez-Esparza et al., 2014).…”
Section: Introduction
“…For instance, syllable-based speaking rate estimation algorithms (such as [1], [2]) can be used to analyze prosodic patterns of speakers and speaking styles for linguistic research, or used as additional information for training text-to-speech (TTS) synthesis systems. Syllables are also used for automatic estimation of vocal activity and social interaction from long and noisy audio recordings captured by wearable microphones, as in the personal life log application of Ziaei et al. [3], [4]. There is also a need for robust language-independent methods for quantifying the amount of speech in daylong child-centered audio recordings from various language environments [5], [6], as child language researchers use such data to understand language development in children (e.g., [7], [8]).…”
Section: Introduction
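Syllable-based speaking rate estimators like those cited above ([1], [2]) typically locate syllable nuclei as peaks in a smoothed energy or sonority envelope. As a loose, hypothetical illustration of that idea only (a toy energy-peak counter, not the cited algorithms), one might write:

```python
import numpy as np

def count_energy_peaks(signal, sr, frame_ms=25, hop_ms=10, min_gap_ms=150):
    """Toy syllable-nucleus proxy: count local maxima of frame energy
    that exceed half the maximum energy and are separated by at least
    min_gap_ms. Dividing the count by the duration gives a crude
    speaking-rate estimate (nuclei per second)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    env = np.array([np.sum(signal[s:s + frame] ** 2)
                    for s in range(0, len(signal) - frame + 1, hop)])
    thresh = 0.5 * env.max()
    min_gap = int(min_gap_ms / hop_ms)    # minimum spacing, in frames
    peaks, last = 0, -min_gap
    for i in range(1, len(env) - 1):
        if (env[i] > thresh and env[i] >= env[i - 1]
                and env[i] > env[i + 1] and i - last >= min_gap):
            peaks += 1
            last = i
    return peaks
```

All window, threshold, and spacing values here are illustrative; robust estimators in the literature use sub-band sonority, smoothing, and learned parameters rather than raw energy peaks.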