Abstract-This paper proposes a multipitch analyzer called the harmonic temporal structured clustering (HTC) method, which jointly estimates the pitch, intensity, onset, duration, etc., of each underlying source in a multipitch audio signal. HTC decomposes the energy patterns diffused in time-frequency space, i.e., the power spectrum time series, into distinct clusters such that each cluster originates from a single source. The problem is equivalent to approximating the observed power spectrum time series by superimposed HTC source models, whose parameters are associated with the acoustic features we wish to extract. The update equations of HTC are derived explicitly by formulating the HTC source model with a Gaussian kernel representation. Experiments verified the potential of the HTC method. Index Terms-Computational acoustic scene analysis, harmonic temporal structured clustering (HTC), multipitch analyzer.
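The abstract's Gaussian-kernel source model can be illustrated with a minimal sketch: a single frame of a power spectrum modeled as a sum of Gaussians centered at the harmonics of a fundamental frequency. The function name, the fixed per-partial bandwidth `sigma`, and the omission of the temporal envelope are all simplifying assumptions, not the actual HTC formulation.

```python
import numpy as np

def harmonic_gaussian_spectrum(freqs, f0, amplitudes, sigma=10.0):
    """Toy single-frame harmonic source model: a power spectrum built
    as a sum of Gaussian kernels at integer multiples of f0.
    freqs: frequency axis (Hz), f0: fundamental (Hz),
    amplitudes: per-partial weights, sigma: shared bandwidth (Hz)."""
    spec = np.zeros(len(freqs), dtype=float)
    for n, a in enumerate(amplitudes, start=1):
        # each partial contributes a Gaussian bump centered at n * f0
        spec += a * np.exp(-0.5 * ((freqs - n * f0) / sigma) ** 2)
    return spec
```

Fitting the parameters (f0, amplitudes, bandwidths) of several such superimposed models to an observed spectrogram is, in essence, the clustering problem the paper formulates.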
This paper presents a new sparse representation for acoustic signals which is based on a mixing model defined in the complex-spectrum domain (where additivity holds), and allows us to extract recurrent patterns of magnitude spectra that underlie observed complex spectra and the phase estimates of constituent signals. An efficient iterative algorithm is derived, which reduces to the multiplicative update algorithm for non-negative matrix factorization developed by Lee under a particular condition.
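The multiplicative update algorithm mentioned above (Lee's NMF updates, to which the proposed method reduces under a particular condition) can be sketched as follows for the squared-Euclidean objective; this is the standard NMF baseline, not the complex-spectrum method of the paper itself.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, seed=0):
    """Multiplicative-update NMF minimizing ||V - WH||^2.
    V: non-negative (n, m) matrix, e.g., a magnitude spectrogram.
    Returns non-negative factors W (n, rank) and H (rank, m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each update multiplies by a non-negative ratio, non-negativity is preserved without any explicit projection step.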
In this paper, we propose a new approach to sparseness-based BSS that iteratively estimates the DOA and the time-frequency mask for each source through the EM algorithm under the sparseness assumption. Our method has the following characteristics: 1) it enables the introduction of physical observation models such as the diffuse sound field, because the likelihood is defined in the original signal domain and not in the feature domain; 2) the power of the background noise need not be known in advance, since it is also a parameter that can be estimated from the observed signal; 3) it is computationally efficient; 4) a common objective function is iteratively increased by the localization and separation steps, which correspond to the E-step and M-step, respectively. Although our framework is applicable to general N-channel BSS, we concentrate on the formulation of the problem in the particular case where two sensory inputs are available, and we show some numerical simulation results.
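The alternation between mask estimation (E-step) and source-parameter updates (M-step) can be illustrated with a toy example. The sketch below models a 1-D spatial feature per time-frequency bin (e.g., an interchannel phase difference) with a Gaussian per source; unlike the paper's signal-domain likelihood, this is a simplified feature-domain stand-in, and all names and the quantile initialization are assumptions.

```python
import numpy as np

def em_tf_mask(features, n_src=2, n_iter=50):
    """Toy EM for time-frequency masking: each TF bin carries one
    scalar spatial feature; each source is a 1-D Gaussian.
    Returns (posteriors, means): posteriors act as soft TF masks."""
    x = np.asarray(features, dtype=float)
    # deterministic init: spread means across the feature range
    mu = np.quantile(x, np.linspace(0.05, 0.95, n_src))
    var = np.full(n_src, np.var(x) + 1e-6)
    pi = np.full(n_src, 1.0 / n_src)
    for _ in range(n_iter):
        # E-step: posterior source probability per bin (the soft mask)
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate DOA-like means, variances, and weights
        nk = post.sum(axis=0) + 1e-12
        mu = (post * x[:, None]).sum(axis=0) / nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return post, mu
```

Localization (the means) and separation (the masks) improve the same likelihood in alternation, mirroring characteristic 4) above.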
Autism spectrum disorder (ASD) is a highly prevalent neurodevelopmental disorder characterized by impairments in social reciprocity and communication together with restricted interests and stereotyped behaviors. The Autism Diagnostic Observation Schedule (ADOS) is considered a 'gold standard' instrument for diagnosis of ASD and depends mainly on subjective assessments made by trained clinicians. To develop a quantitative and objective surrogate marker for ASD symptoms, we investigated speech features including F0, speech rate, speaking time, and turn-taking gaps, extracted from footage recorded during a semi-structured socially interactive situation from the ADOS. We calculated not only the statistics over a whole session of the ADOS activity but also conducted a block analysis, computing the statistics of the prosodic features in each 8-second sliding window. The block analysis identified whether participants changed volume or pitch according to the flow of the conversation. We also measured the synchrony between the participant and the ADOS administrator. Participants with high-functioning ASD showed significantly longer turn-taking gaps, a greater proportion of pause time, less variability, and less synchronous changes in the blockwise mean of intensity compared with those with typical development (TD) (p<0.05 corrected). In addition, the ASD group had a significantly wider distribution than the TD group in the within-participant variability of the blockwise mean of log F0 (p<0.05 corrected). The clinical diagnosis could be discriminated using the speech features with 89% accuracy. The features of turn-taking and pausing were significantly correlated with deficits of ASD in reciprocity (p<0.05 corrected). Additionally, regression analysis yielded a mean absolute error of 1.35 in the prediction of deficits in reciprocity, to which the synchrony of intensity especially contributed.
The findings suggest that the variance of speech features and the interaction and synchrony with the conversation partner are critical for characterizing atypical features in the conversation of people with ASD.
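The blockwise analysis described above (statistics of a prosodic feature in an 8-second sliding window) can be sketched as follows. The function name, the 1-second hop, and the frame-rate convention are assumptions for illustration, not details taken from the study.

```python
import numpy as np

def blockwise_stats(feature, frame_rate, win_s=8.0, hop_s=1.0):
    """Sliding-window block analysis of a per-frame prosodic feature
    (e.g., log F0 or intensity). Returns the mean and standard
    deviation of the feature in each window.
    feature: 1-D array of frame values, frame_rate: frames per second."""
    win = int(win_s * frame_rate)   # window length in frames
    hop = int(hop_s * frame_rate)   # hop size in frames
    means, stds = [], []
    for start in range(0, len(feature) - win + 1, hop):
        block = feature[start:start + win]
        means.append(float(np.mean(block)))
        stds.append(float(np.std(block)))
    return np.array(means), np.array(stds)
```

The within-participant variability of the blockwise mean, one of the group-discriminating features reported above, is then simply the spread of the returned `means` array.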
Abstract-In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music. This model solves a major problem of conventional methods, which could not properly describe the nature of multiple voices as in polyrhythmic scores or in the phenomenon of loose synchrony between voices. In this paper we present a complete description of the proposed model and develop an inference technique that is valid for any merged-output HMM in which output probabilities depend on past events. We also examine the influence of the architecture and parameters of the method on the accuracies of rhythm transcription and voice separation, and perform comparative evaluations with six other algorithms. Using MIDI recordings of classical piano pieces, we found that the proposed model outperformed the other methods by more than 12 points in accuracy for polyrhythmic performances and performed almost as well as the best method for non-polyrhythmic performances. This establishes the state of the art in rhythm transcription for the first time in the literature. Publicly available source code is also provided for future comparisons. Index Terms-Rhythm transcription, statistical music language model, model for polyphonic music scores, hidden Markov models, music performance model.
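The inference underlying HMM-based rhythm transcription builds on standard Viterbi decoding over a discrete state space; in the merged-output model, the per-voice chains are combined into one such state space. The sketch below is the generic Viterbi algorithm in log domain, not the paper's merged-output inference technique.

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init):
    """Standard Viterbi decoding of the most likely state sequence.
    log_trans: (S, S) log transition matrix,
    log_emit: (T, S) per-frame log emission scores,
    log_init: (S,) log initial distribution."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]       # best score ending in each state
    back = np.zeros((T, S), dtype=int)   # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (from, to) scores
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # trace the best path backwards from the final best state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In rhythm transcription, the states would encode score positions or note values and the emissions would score observed inter-onset intervals; the merged-output extension additionally tracks which voice produced each event.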