2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2015.7178866
Prof-Life-Log: Analysis and classification of activities in daily audio streams

Abstract: A new method to analyze and classify daily activities in personal audio recordings (PARs) is presented. The method employs speech activity detection (SAD) and speaker diarization systems to provide high-level semantic segmentation of the audio file. Subsequently, a number of audio, speech, and lexical features are computed in order to characterize events in daily audio streams. The features are selected to capture the statistical properties of conversations, topics and turn-taking behavior, which creates a c…

Cited by 13 publications (6 citation statements)
References 12 publications
“…The authors of [ 18 ] created a method to analyze and classify daily activities in personal audio recordings (PARs). The method applies speech activity detection (SAD) and speaker diarization, and computes a number of audio, speech, and lexical features [ 18 ]. It uses a TO-Combo-SAD (Threshold Optimized Combo SAD) algorithm for separating speech from noise [ 18 ].…”
Section: Results
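The TO-Combo-SAD algorithm referenced above optimizes a decision threshold over combined speech features; its details are in the cited work. As a rough, hypothetical illustration of how threshold-based speech activity detection works in principle (a minimal log-energy sketch, not the paper's algorithm), consider:

```python
import numpy as np

def energy_sad(signal, frame_len=400, hop=160, threshold_db=-35.0):
    """Toy threshold-based SAD: label each frame speech/non-speech by
    comparing its log-energy (relative to the loudest frame) against a
    fixed threshold. Real systems such as TO-Combo-SAD combine several
    features and optimize the threshold instead of fixing it."""
    log_energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = np.sum(frame ** 2) + 1e-12        # avoid log(0)
        log_energies.append(10.0 * np.log10(energy))
    log_e = np.array(log_energies)
    # normalize to the loudest frame, then threshold
    return (log_e - log_e.max()) > threshold_db
```

At 16 kHz, the default frame and hop lengths correspond to 25 ms windows with a 10 ms step; both values and the threshold are illustrative choices, not taken from the paper.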
“…The method applies speech activity detection (SAD) and speaker diarization, and computes a number of audio, speech, and lexical features [ 18 ]. It uses a TO-Combo-SAD (Threshold Optimized Combo SAD) algorithm for separating speech from noise [ 18 ]. Principal Component Analysis (PCA) is first applied for dimensionality reduction, and the remaining features are then supplied to a multi-class support vector machine (SVM) with a radial basis function (RBF) kernel for model training and evaluation [ 18 ].…”
Section: Results
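The reduce-then-classify stage described above (PCA for dimensionality reduction, followed by a multi-class RBF-kernel SVM) can be sketched with scikit-learn. The feature vectors here are synthetic Gaussian placeholders for the paper's audio/speech/lexical features, the class labels are invented examples, and the added feature standardization is a common preprocessing assumption not stated in the source:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-in for per-event feature vectors: three well-separated
# Gaussian blobs, one per hypothetical activity class.
n_per_class, n_feats = 100, 40
class_means = [np.full(n_feats, m) for m in (-2.0, 0.0, 2.0)]
X = np.vstack([mu + rng.standard_normal((n_per_class, n_feats))
               for mu in class_means])
y = np.repeat(np.arange(3), n_per_class)   # e.g. lecture / meeting / commute

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# PCA for dimensionality reduction, then a multi-class RBF-kernel SVM,
# mirroring the pipeline in the citation statement above.
clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

scikit-learn's `SVC` handles the multi-class case with a one-vs-one scheme by default; the number of retained components (10) is an illustrative choice.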
“…Automatic word count estimation (WCE) from audio recordings can be used to investigate vocal activity and social interaction as a function of recording time and location, such as in personal life logs derived from wearable sensors (Ziaei et al., 2015; Ziaei et al., 2016). WCE is also a highly useful tool in the scientific study of child language acquisition because it can help answer questions such as how much speech children hear in their daily lives in different contexts (e.g., Bergelson et al., 2018a), and how the language input maps to later developmental outcomes in the same children (Weisleder & Fernald, 2013; Ramírez-Esparza et al., 2014).…”
Section: Introduction
“…For instance, syllable-based speaking rate estimation algorithms (such as [1], [2]) can be used to analyze prosodic patterns of speakers and speaking styles for linguistic research, or used as additional information for training text-to-speech (TTS) synthesis systems. Syllables are also used for automatic estimation of vocal activity and social interaction from long and noisy audio recordings captured by wearable microphones, as in the personal life log application of Ziaei et al. [3], [4]. There is also a need for robust language-independent methods for quantifying the amount of speech in daylong child-centered audio recordings from various language environments [5], [6], as child language researchers use such data to understand language development in children (e.g., [7], [8]).…”
Section: Introduction
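Syllable-based speaking rate estimators like those cited above ([1], [2]) typically locate syllable nuclei as peaks in a smoothed energy or sonority envelope. As a loose, hypothetical illustration of that idea only (a toy energy-peak counter, not the cited algorithms), one might write:

```python
import numpy as np

def count_energy_peaks(signal, sr, frame_ms=25, hop_ms=10, min_gap_ms=150):
    """Toy syllable-nucleus proxy: count local maxima of frame energy
    that exceed half the maximum energy and are separated by at least
    min_gap_ms. Dividing the count by the duration gives a crude
    speaking-rate estimate (nuclei per second)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    env = np.array([np.sum(signal[s:s + frame] ** 2)
                    for s in range(0, len(signal) - frame + 1, hop)])
    thresh = 0.5 * env.max()
    min_gap = int(min_gap_ms / hop_ms)    # minimum spacing, in frames
    peaks, last = 0, -min_gap
    for i in range(1, len(env) - 1):
        if (env[i] > thresh and env[i] >= env[i - 1]
                and env[i] > env[i + 1] and i - last >= min_gap):
            peaks += 1
            last = i
    return peaks
```

All window, threshold, and spacing values here are illustrative; robust estimators in the literature use sub-band sonority, smoothing, and learned parameters rather than raw energy peaks.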