On structured sparsity of phonological posteriors for linguistic parsing

Cerňak, Miloš; Asaei, Afsaneh; Bourlard, Hervé

doi:10.1016/j.specom.2016.08.004

Cited by 11 publications

(16 citation statements)

References 43 publications


“…This result was expected due to the binary nature of phonological posteriors [18,21]. Moreover, if the dewarped posteriors are quantized into binary vectors and Jaccard similarity is used for binary pattern matching [21], similar results as the Spearman similarity measure is achieved. This observation again confirms that the space of phonological posteriors is highly structured and the structures bear more information than the exact posterior values.…”

Section: QbE-STD Results (mentioning)

confidence: 68%
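The binary pattern matching mentioned in the statement above can be illustrated with a minimal sketch. The 0.5 threshold, the five-class frames, and the function names `binarize` and `jaccard` are assumptions made for illustration; the cited work may quantize and compare posteriors differently.

```python
def binarize(posterior, threshold=0.5):
    """Quantize a posterior vector into a binary pattern (1 = class active).

    The 0.5 threshold is an assumption, not a value taken from the cited work.
    """
    return [1 if p >= threshold else 0 for p in posterior]


def jaccard(a, b):
    """Jaccard similarity of two equal-length binary patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero patterns are treated as identical by convention.
    return both / either if either else 1.0


# Two made-up posterior frames over five hypothetical phonological classes.
frame_a = [0.92, 0.05, 0.81, 0.10, 0.03]
frame_b = [0.88, 0.07, 0.40, 0.75, 0.02]

pattern_a = binarize(frame_a)  # [1, 0, 1, 0, 0]
pattern_b = binarize(frame_b)  # [1, 0, 0, 1, 0]
similarity = jaccard(pattern_a, pattern_b)  # 1 shared class out of 3 active
```

Only the set of active classes enters the comparison, which is consistent with the statement that the structures carry more information than the exact posterior values.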

“…The permissible combinations are highly constrained due to articulatory mechanisms governing speech production. Therefore, the probabilities constituting a posterior are confined to a small number of components where the indices of high probabilities determine the unique structure of the vocal machinery in speech production [21]. [1,27].…”

Section: QbE-STD Results (mentioning)

confidence: 99%

“…The posterior space is highly structured and low-dimensional [18,21]. To exploit this property, we propose to use Spearman's rank correlation to measure the similarity of posterior exemplars.…”

Section: Structural Similarity Measure (mentioning)

confidence: 99%
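As a rough illustration of the rank-based similarity named in the statement above, the sketch below computes Spearman's rank correlation between two posterior vectors using only the standard library. The example vectors and the helper name `average_ranks` are hypothetical, and this is not the citing paper's implementation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for a constant vector")
    return cov / (var_x * var_y) ** 0.5


# Made-up posteriors over four hypothetical phonological classes.
query = [0.90, 0.05, 0.70, 0.10]
same_order = [0.60, 0.02, 0.55, 0.20]      # different values, same ranking
reversed_order = [0.05, 0.90, 0.10, 0.70]  # opposite ranking

rho_same = spearman(query, same_order)
rho_reversed = spearman(query, reversed_order)
```

Because only the ordering of the components matters, `query` and `same_order` are scored as a perfect match even though their probability values differ.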

“…We use the open-source DNN based phonological vocoding platform [24] for estimation of the extended Sound Pattern of English (eSPE) phonological posteriors. The motivation for using phonological posteriors is three-fold: (1) Phonological posterior quantization and hashing is found to be effective in search space reduction for accurate classification [14,18,21], (2) Sub-phonetic nature of phonological posteriors facilitates development of flexible and low-resource speech detection and recognition solutions [25], and (3) Phonological posterior are found robust for inter-domain posterior representation where the training and testing acoustic conditions and languages are different [14,18,21].…”

Section: Posterior Representation (mentioning)

confidence: 99%


Phonological Posterior Hashing for Query by Example Spoken Term Detection

Asaei, Ram, Bourlard

2018

Interspeech 2018

Self Cite


State of the art query by example spoken term detection (QbE-STD) systems in zero-resource conditions rely on representation of speech in terms of sequences of class-conditional posterior probabilities estimated by deep neural network (DNN). The posteriors are often used for pattern matching or dynamic time warping (DTW). Exploiting posterior probabilities as speech representation propounds diverse advantages in a classification system. One key property of the posterior representations is that they admit a highly effective hashing strategy that enables indexing a large audio archive in divisions for reducing the search complexity. Moreover, posterior indexing leads to a compressed representation and enables pronunciation dewarping and partial detection with no need for DTW. We exploit these characteristics of the posterior space in the context of redundant hash addressing for query-by-example spoken term detection (QbE-STD). We evaluate the QbE-STD system on AMI corpus and demonstrate that tremendous speedup and superior accuracy is achieved compared to the state-of-the-art pattern matching solution based on DTW. The system has the potential to enable massively large scale spoken query detection.
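The indexing idea in the abstract above can be sketched as a plain inverted index keyed by binarized posteriors. This is a simplified exact-match lookup and not the redundant hash addressing scheme the abstract refers to; the threshold, the three-class frames, and the names `binary_code`, `build_index`, and `lookup` are assumptions made for illustration.

```python
from collections import defaultdict


def binary_code(posterior, threshold=0.5):
    """Quantize a posterior into a hashable binary code (assumed threshold)."""
    return tuple(1 if p >= threshold else 0 for p in posterior)


def build_index(frames, threshold=0.5):
    """Map each binary code to the list of frame positions that produce it."""
    index = defaultdict(list)
    for position, posterior in enumerate(frames):
        index[binary_code(posterior, threshold)].append(position)
    return index


def lookup(index, query_posterior, threshold=0.5):
    """Return archive positions whose code equals the query's code."""
    return index.get(binary_code(query_posterior, threshold), [])


# Made-up archive of posterior frames over three hypothetical classes.
archive = [
    [0.90, 0.10, 0.80],
    [0.20, 0.70, 0.10],
    [0.95, 0.05, 0.60],
    [0.10, 0.90, 0.20],
]
index = build_index(archive)

hits = lookup(index, [0.80, 0.20, 0.70])    # frames sharing code (1, 0, 1)
misses = lookup(index, [0.10, 0.10, 0.10])  # code (0, 0, 0) is not in the archive
```

A lookup touches only the bucket for the query's code, which is the sense in which indexing the archive in divisions reduces the search cost compared with scanning every frame.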


“…In addition, we exploit phonological structures [9] to enable automatic analysis of duration and trajectory without any need for automatic alignment. Prior work on phonological structures demonstrate their relation to articulatory postures [9], thus considering the structure of multiple consecutive segments enables quantification of the dynamic and trajectory of articulatory movements and co-articulation. The studies presented in this paper exploit this structural property of phonological posteriors to obtain speech-based markers of PAoS severity.…”

Section: Introduction (mentioning)

confidence: 99%

PAoS Markers: Trajectory Analysis of Selective Phonological Posteriors for Assessment of Progressive Apraxia of Speech

Asaei, Cerňak, Laganaro

2016

7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT 2016)

Self Cite


Progressive apraxia of Speech (PAoS) is a progressive motor speech disorder associated with neurodegenerative disease causing impairment of phonetic encoding and motor speech planning. Clinical observation and acoustic studies show that duration analysis provides reliable cues for diagnosis of the disease progression and severity of articulatory disruption. The goal of this paper is to develop computational methods for objective evaluation of duration and trajectory of speech articulation. We use phonological posteriors as speech features. Phonological posteriors consist of probabilities of phonological classes estimated for every short segment of the speech signal. PAoS encompasses lengthening of duration which is more pronounced in vowels [1, 2]; we thus hypothesize that a small subset of phonological classes provide stronger evidence for duration and trajectory analysis. These classes are determined through analysis of linear prediction coefficients (LPC). To enable trajectory analysis without phonetic alignment, we exploit phonological structures defined through quantization of phonological posteriors. Duration and trajectory analysis are conducted on blocks of multiple consecutive segments possessing similar phonological structures. Moreover, unique phonological structures are identified for every severity condition.
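The alignment-free duration analysis described in the abstract above can be sketched by grouping consecutive frames that quantize to the same binary structure. The 0.5 threshold, the 10 ms frame shift, and the names `structure` and `structure_blocks` are assumptions made for illustration, and the sketch groups identical structures only, whereas the abstract speaks of similar ones.

```python
from itertools import groupby


def structure(posterior, threshold=0.5):
    """Binary phonological structure of one frame (assumed threshold)."""
    return tuple(1 if p >= threshold else 0 for p in posterior)


def structure_blocks(frames, frame_shift_ms=10.0, threshold=0.5):
    """Group consecutive frames that share one binary structure.

    Returns a list of (structure, start_frame, n_frames, duration_ms).
    The 10 ms frame shift is an assumed value, not taken from the paper.
    """
    blocks = []
    start = 0
    for code, run in groupby(structure(f, threshold) for f in frames):
        n_frames = len(list(run))
        blocks.append((code, start, n_frames, n_frames * frame_shift_ms))
        start += n_frames
    return blocks


# Made-up sequence of posterior frames over two hypothetical classes.
frames = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.9],
    [0.1, 0.8],
]
blocks = structure_blocks(frames)
```

Each block's length gives a duration estimate without any phonetic alignment, and the sequence of block structures gives a coarse trajectory.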


Phonetic subspace features for improved query by example spoken term detection

Ram, Asaei, Bourlard

2018

Speech Communication


I would like to express my very great appreciation to Dr Sébastien Marcel for his valuable and constructive suggestions during the development of this project. I am grateful for the assistance given by Mr. Flavio Tarsetti. I would also like to express my special thanks to my colleagues of the Biometrics Security and Privacy group, from whom I learned a great deal. Finally, I would also like to thank my parents and friends, whose continuous support helped me throughout this project.
