Nikolay Mikhaylovskiy scite author profile


5 Publications

14 Citation Statements Received

75 Citation Statements Given


Affiliations

National Research Tomsk State University

Publications


Learning Efficient Representations for Keyword Spotting with Triplet Loss

Vygon, Mikhaylovskiy, 2021


In the past few years, triplet loss-based metric embeddings have become a de facto standard for several important computer vision problems, most notably person re-identification. In speech recognition, on the other hand, metric embeddings generated by the triplet loss are rarely used, even for classification problems. We fill this gap by showing that combining two representation learning techniques, a triplet loss-based embedding and a variant of kNN for classification in place of cross-entropy loss, significantly improves the classification accuracy of convolutional networks on the LibriSpeech-derived LibriWords datasets (by 26% to 38%). To do so, we propose a novel phonetic-similarity-based triplet mining approach. We also match the current best published SOTA for 10+2-class classification on the Google Speech Commands dataset V2 with an architecture that is about 6 times more compact, and we improve the current best published SOTA for 35-class classification on the Google Speech Commands dataset V2 by over 40%.
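The abstract pairs a triplet loss-based embedding with a kNN-style classifier used instead of a cross-entropy head. The sketch below illustrates only those two generic building blocks, assuming precomputed embedding vectors, Euclidean distance, a margin of 1.0 and plain majority-vote kNN. It is not the paper's implementation: the network, the phonetic-similarity triplet mining and the specific kNN variant are not reproduced, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Hinge-form triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the same-class positive is closer to the anchor
    than the different-class negative by at least `margin` (assumed 1.0).
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def knn_classify(query: Vector, support: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Label a query embedding by majority vote among its k nearest labelled embeddings."""
    nearest = sorted(support, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two keywords (illustrative values only).
    support = [([0.0, 0.0], "yes"), ([0.1, 0.2], "yes"), ([0.2, 0.1], "yes"),
               ([3.0, 3.0], "no"), ([3.1, 2.9], "no")]
    print(knn_classify([0.05, 0.1], support, k=3))           # yes
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
```

In this arrangement the embedding network would be trained only with the triplet loss, and classification then reduces to a nearest-neighbour lookup in the learned embedding space.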


Video monitoring over anti-decubitus protocol execution with a deep neural network to prevent pressure ulcer

Danilovich, Moshkin, Reimche et al., 2021


Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions

Bedyakin, Htsts, Mikhaylovskiy, 2021


This memo describes the NTR/TSU winning submission to the language identification track of the Low Resource ASR challenge at the Dialog2021 conference. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. Traditionally, the ASR task requires large volumes of labeled data that are unattainable for most of the world's languages, including most of the languages of Russia. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results for the language identification task in a low-resource setting, and we set a SOTA on the Low Resource ASR challenge dataset. Additionally, we compare the structure of the confusion matrices for this dataset and for the significantly more diverse VoxForge dataset. We state and substantiate the hypothesis that whenever a dataset is diverse enough that other classification factors, such as gender and age, are well averaged, the confusion matrix of an LID system reflects a measure of language similarity.
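A Self-Attentive Pooling layer collapses a variable-length sequence of frame-level features into a single utterance-level vector, which a classifier can then map to a language. The sketch below shows one common single-head formulation, assuming scores e_t = v · tanh(W h_t + b), weights α = softmax(e) and output Σ_t α_t h_t. The memo's exact layer configuration and trained parameters are not given here, so the parameter values and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply an H x D matrix by a length-D vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attentive_pooling(frames: Matrix, w: Matrix, b: Sequence[float],
                           v: Sequence[float]) -> List[float]:
    """Pool T frame vectors (each of size D) into one vector of size D.

    frames: T x D frame-level features, e.g. CNN outputs over time.
    w: H x D projection, b: length-H bias, v: length-H scoring vector
    (all assumed; in practice these would be learned parameters).
    """
    # One scalar relevance score per frame: e_t = v . tanh(W h_t + b)
    scores = []
    for h in frames:
        hidden = [math.tanh(z + bi) for z, bi in zip(matvec(w, h), b)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Attention weights over time; they sum to 1.
    weights = softmax(scores)
    # Attention-weighted average of the frames, dimension by dimension.
    dim = len(frames[0])
    return [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With a zero scoring vector every frame gets equal weight (mean pooling).
    print(self_attentive_pooling(frames, identity, [0.0, 0.0], [0.0, 0.0]))  # [2.0, 3.0]
```

Compared with plain mean pooling, the learned weights let the model emphasise frames that carry more language-specific evidence. With a zero scoring vector the layer reduces to mean pooling, as the usage example shows.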


How Do You Test the Strength of AI?

Mikhaylovskiy, 2020


Language ID Prediction from Speech Using Self-Attentive Pooling

Bedyakin, Mikhaylovskiy, 2021


