Shoichiro Saito scite author profile

This paper proposes a novel optimization principle and its implementation for unsupervised anomaly detection in sound (ADS) using an autoencoder (AE). The goal of unsupervised-ADS is to detect unknown anomalous sound without training data of anomalous sound. Use of an AE as a normal model is a state-of-the-art technique for unsupervised-ADS. To decrease the false positive rate (FPR), the AE is trained to minimize the reconstruction error of normal sounds and the anomaly score is calculated as the reconstruction error of the observed sound. Unfortunately, since this training procedure does not take into account the anomaly score for anomalous sounds, the true positive rate (TPR) does not necessarily increase. In this study, we define an objective function based on the Neyman-Pearson lemma by considering ADS as a statistical hypothesis test. The proposed objective function trains the AE to maximize the TPR under an arbitrary low FPR condition. To calculate the TPR in the objective function, we consider that the set of anomalous sounds is the complementary set of normal sounds and simulate anomalous sounds by using a rejection sampling algorithm. Through experiments using synthetic data, we found that the proposed method improved the performance measures of ADS under low FPR conditions. In addition, we confirmed that the proposed method could detect anomalous sounds in real environments.

Index Terms: anomaly detection in sound, Neyman-Pearson lemma, deep learning, autoencoder.

A preliminary version of this work is published in [8].
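
The relationship between the anomaly score, the threshold, FPR, and TPR described in this abstract can be illustrated with a minimal sketch. It is not the paper's objective function or training procedure: the scores below are synthetic Gaussian draws standing in for reconstruction errors, and the helper names `threshold_for_fpr` and `rate_above` are hypothetical. It only shows how a threshold can be fixed from normal data at a target FPR and how the TPR is then measured on (simulated) anomalous data.

```python
import random


def threshold_for_fpr(normal_scores, target_fpr):
    """Pick a threshold so that at most `target_fpr` of normal scores exceed it."""
    ordered = sorted(normal_scores)
    # Number of normal samples we allow to be flagged as anomalous.
    allowed = min(int(target_fpr * len(ordered)), len(ordered) - 1)
    # Scores strictly greater than this value are flagged.
    return ordered[len(ordered) - allowed - 1]


def rate_above(scores, threshold):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Synthetic stand-ins for reconstruction errors (assumption, not real data).
rng = random.Random(0)
normal_scores = [rng.gauss(1.0, 0.3) for _ in range(2000)]
anomalous_scores = [rng.gauss(2.5, 0.6) for _ in range(2000)]

threshold = threshold_for_fpr(normal_scores, target_fpr=0.01)
fpr = rate_above(normal_scores, threshold)     # false positive rate on normal data
tpr = rate_above(anomalous_scores, threshold)  # true positive rate on anomalies
print(f"threshold={threshold:.3f} FPR={fpr:.3f} TPR={tpr:.3f}")
```

In this toy setting the threshold depends only on normal data, which mirrors the abstract's point: minimizing reconstruction error on normal sounds controls the FPR but says nothing directly about the TPR.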


Transmission line description of optical feedback and injection locking for Fabry-Perot and DFB lasers

Tromborg, Olesen, Xing et al. 1987. IEEE J. Quantum Electron.

148


Experimental observation of complete chaos synchronization in semiconductor lasers

Liu, Takiguchi, Davis et al. 2002


We experimentally demonstrate the complete synchronization of a semiconductor laser to the injection of a chaotic oscillating optical signal that is generated by a similar semiconductor laser with external optical feedback. The synchronization is characterized by sensitive dependencies on frequency detuning and injection strength and a time lag that varies reversely with the variation of the delay time in the external optical feedback of the master laser.


ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

et al. 2019


This paper introduces a new dataset called "ToyADMOS" designed for anomaly detection in machine operating sounds (ADMOS). To the best of our knowledge, no large-scale datasets are available for ADMOS, although large-scale datasets have contributed to recent advancements in acoustic signal processing. This is because anomalous sound data are difficult to collect. To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them. The released dataset consists of three sub-datasets for machine-condition inspection, fault diagnosis of machines with geometrically fixed tasks, and fault diagnosis of machines with moving tasks. Each sub-dataset includes over 180 hours of normal machine-operating sounds and over 4,000 samples of anomalous sounds collected with four microphones at a 48-kHz sampling rate. The dataset is freely available for download at https://github.com/YumaKoizumi/ToyADMOS-dataset.

Index Terms: anomaly detection in sounds, machine operating sounds, product inspection, dataset.


A Transformer-Based Audio Captioning Model with Keyword Estimation

Masumura, Nishida, Yasuda et al. 2020


One of the problems with automated audio captioning (AAC) is the indeterminacy in word selection corresponding to the audio event/scene. Since one acoustic event/scene can be described with several words, it results in a combinatorial explosion of possible captions and difficulty in training. To solve this problem, we propose a Transformer-based audio-captioning model with keyword estimation called TRACKE. It simultaneously solves the word-selection indeterminacy problem with the main task of AAC while executing the sub-task of acoustic event detection/acoustic scene classification (i.e., keyword estimation). TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy. Experimental results on a public AAC dataset indicate that TRACKE achieved state-of-the-art performance and successfully estimated both the caption and its keywords.


Specmurt Analysis of Polyphonic Music Signals

Saito, Kameoka, Takahashi et al. 2008. IEEE Trans. Audio Speech Lang. Process.


This paper introduces a new music signal processing method to extract multiple fundamental frequencies, which we call specmurt analysis. In contrast with cepstrum, which is the inverse Fourier transform of log-scaled power spectrum with linear frequency, specmurt is defined as the inverse Fourier transform of linear power spectrum with log-scaled frequency. Assuming that all tones in a polyphonic sound have a common harmonic pattern, the sound spectrum can be regarded as a sum of linearly stretched common harmonic structures along frequency. In the log-frequency domain, it is formulated as the convolution of a common harmonic structure and the distribution density of the fundamental frequencies of multiple tones. The fundamental frequency distribution can be found by deconvolving the observed spectrum with the assumed common harmonic structure, where the common harmonic structure is given heuristically or quasi-optimized with an iterative algorithm. The efficiency of specmurt analysis is experimentally demonstrated through generation of a piano-roll-like display from a polyphonic music signal and automatic sound-to-MIDI conversion. Multipitch estimation accuracy is evaluated over several polyphonic music signals and compared with manually annotated MIDI data.

Index Terms: inverse filtering, iteration algorithm, multipitch analysis, pitch visualization, polyphonic music signals.
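
The deconvolution step described in this abstract can be sketched on a toy example. The following is an illustration under stated assumptions, not the authors' implementation: the log-frequency axis has 64 bins at 12 bins per octave, the convolution is treated as circular, the common harmonic structure has four partials with made-up amplitudes (chosen so its transform has no zeros), and the function names `dft`, `circular_convolve`, and `deconvolve` are hypothetical. It omits the iterative quasi-optimization of the harmonic structure mentioned in the abstract.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), adequate for a tiny demo."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def deconvolve(observed, harmonic):
    """Divide in the transform domain, then transform back."""
    v = dft(observed)
    h = dft(harmonic)
    u = [vk / hk for vk, hk in zip(v, h)]
    return [z.real for z in dft(u, inverse=True)]


N = 64
BINS_PER_OCTAVE = 12

# Common harmonic structure: the n-th partial sits log2(n) octaves above the
# fundamental on a log-frequency axis. Amplitudes are illustrative assumptions.
harmonic = [0.0] * N
for n, amplitude in [(1, 1.0), (2, 0.4), (3, 0.25), (4, 0.15)]:
    harmonic[round(BINS_PER_OCTAVE * math.log2(n))] = amplitude

# Two simultaneous tones: fundamentals at bins 5 and 9.
f0_true = [0.0] * N
f0_true[5] = 1.0
f0_true[9] = 0.7

observed = circular_convolve(harmonic, f0_true)  # simulated log-frequency spectrum
f0_estimate = deconvolve(observed, harmonic)     # recovered fundamental distribution
peaks = [i for i, value in enumerate(f0_estimate) if value > 0.5]
```

Because the same harmonic structure is used to synthesize and to deconvolve, the recovery here is exact up to rounding; with real spectra the assumed structure only approximates each tone, which is why the abstract mentions heuristic or iteratively refined harmonic structures.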


Optimizing acoustic feature extractor for anomalous sound detection based on Neyman-Pearson lemma

Koizumi, Saito, Uematsu et al. 2017


We propose a method for optimizing an acoustic feature extractor for anomalous sound detection (ASD). Most ASD systems adopt outlier-detection techniques because it is difficult to collect a massive amount of anomalous sound data. To improve the performance of such outlier-detection-based ASD, it is essential to extract a set of efficient acoustic features that is suitable for identifying anomalous sounds. However, the ideal property of a set of acoustic features that maximizes ASD performance has not been clarified. By considering outlier-detection-based ASD as a statistical hypothesis test, we defined optimality as an objective function based on the Neyman-Pearson lemma; the acoustic feature extractor is optimized to extract a set of acoustic features that maximizes the true positive rate under an arbitrary false positive rate. The variational auto-encoder is applied as an acoustic feature extractor and optimized to maximize the objective function. We confirmed that the proposed method improved the F-measure by 0.02 to 0.06 points over conventional methods, and ASD results for a stereolithography 3D printer in a real environment show that the proposed method is effective in identifying anomalous sounds.


Ultrahigh Thermoresistant Lightweight Bioplastics Developed from Fermentation Products of Cellulosic Feedstock

Nag, Ali, Kawaguchi et al. 2020. Advanced Sustainable Systems


Production of bioplastics from renewable biological resources is a prerequisite for the development of a circular and sustainable society. Current bioplastics are mostly heat‐sensitive aliphatic polymers, requiring thermoresistant aromatic bioplastics. Herein, 3‐amino‐4‐hydroxybenzoic acid (AHBA) and 4‐aminobenzoic acid (ABA) are produced from kraft pulp, an inedible cellulosic feedstock, using metabolically engineered bacteria. AHBA is chemically converted to 3,4‐diaminobenzoic acid (DABA); subsequently, poly(2,5‐benzimidazole) is obtained by the polycondensation of DABA and processed into an ultrahigh thermoresistant film. The copolymerization of DABA with a small amount of ABA dramatically increases the degradation temperatures of the resulting films (over 740 °C) to yield the most thermoresistant plastic on record. Density functional theory calculations indicate that the incorporation of ABA strengthens the interchain hydrogen bonds between aromatic imidazole rings. Thus, an alternative organic molecular design is proposed for thermoresistant plastics without using heavy inorganics, although continuous aromatic heterocycles are widely considered ideal for polymer thermoresistance. This innovative macromolecular design increases thermoresistance and can be widely applied to well‐processable plastics for the production of lightweight materials and is expected to contribute to the development of a more sustainable society.



Shoichiro Saito

Unsupervised Detection of Anomalous Sound Based on Deep Learning and the Neyman–Pearson Lemma
