ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053889
Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions

Abstract: We propose a new end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers. It is trained with CTC loss. The proposed network achieves near state-of-the-art accuracy on LibriSpeech and Wall Street Journal, while having fewer parameters than all competing models. We also demonstrate that this…
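The parameter savings behind the abstract's claim can be made concrete with a quick count: a standard 1D convolution needs K·C_in·C_out weights, while the time-channel separable version factors this into a depthwise pass over time (K·C_in) plus a pointwise 1×1 pass that mixes channels (C_in·C_out). A pure-Python sketch; the kernel and channel sizes below are illustrative, not QuartzNet's actual configuration:

```python
def conv1d_params(kernel_size, c_in, c_out):
    # Standard 1D convolution: every output channel looks at
    # every input channel across the full kernel width.
    return kernel_size * c_in * c_out

def separable_conv1d_params(kernel_size, c_in, c_out):
    # Time-channel separable convolution:
    #   1) depthwise pass: one kernel per input channel  -> kernel_size * c_in
    #   2) pointwise pass: 1x1 channel-mixing convolution -> c_in * c_out
    return kernel_size * c_in + c_in * c_out

# Illustrative sizes (hypothetical, not taken from the paper):
k, c_in, c_out = 33, 256, 256
standard = conv1d_params(k, c_in, c_out)              # 2,162,688 weights
separable = separable_conv1d_params(k, c_in, c_out)   # 73,984 weights
print(standard, separable, round(standard / separable, 1))
```

For wide kernels and many channels the separable form is roughly K× cheaper, which is what lets the network stay accurate with far fewer parameters.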

Cited by 208 publications (120 citation statements). References 31 publications.
“…We also experimented with other different architectures, such as several LSTM-based models, a combination of 1D-CNN and LSTM, a down-scaled version of the basecaller Bonito [20, 21], and variational window sizes to capture different sequencing speeds. However, the LSTM-based models take a long time to train, and the accuracies don’t improve significantly.…”
Section: Methods
confidence: 99%
“…This could indicate that local features extracted by convolution windows provide sufficient information for classification, and long-range dependencies extracted by the recurrent network only help by a small amount. 21], stacks of LSTM layers with variational window size, different hyperparameter tuning, and different training datasets. After full consideration of model size, speed, performance, and training time, we reported the best performing model architecture in the main paper.…”
Section: Model Architecture Experiments and Hyperparameter Tuning
confidence: 99%
“…It was first proposed in [40] and is widely used for 2D image analysis [41]- [44]. Recently, depthwise separable convolution has also been incorporated for processing speech signals [45], [46].…”
Section: A. Depthwise Separable Convolutions for 1D Signal
confidence: 99%
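The depthwise separable factorization this excerpt refers to can be sketched for a 1D signal in pure Python. This is a minimal illustration with "valid" padding and toy data; real implementations use framework primitives such as grouped convolutions:

```python
def depthwise_separable_conv1d(x, depthwise, pointwise):
    """x: C_in channels, each a list of T samples.
    depthwise: one length-K kernel per input channel (C_in x K).
    pointwise: channel-mixing weights (C_out x C_in)."""
    c_in, t = len(x), len(x[0])
    k = len(depthwise[0])
    # Depthwise pass: each channel is convolved only with its own kernel.
    mid = [
        [sum(x[c][i + j] * depthwise[c][j] for j in range(k))
         for i in range(t - k + 1)]
        for c in range(c_in)
    ]
    # Pointwise pass: a 1x1 convolution mixes channels at each time step.
    t_out = t - k + 1
    return [
        [sum(pointwise[o][c] * mid[c][i] for c in range(c_in))
         for i in range(t_out)]
        for o in range(len(pointwise))
    ]

# Toy example: 2 input channels of length 4, kernels of width 2, 1 output channel.
out = depthwise_separable_conv1d(
    x=[[1, 2, 3, 4], [0, 1, 0, 1]],
    depthwise=[[1, 1], [1, -1]],
    pointwise=[[1, 1]],
)
print(out)  # [[2, 6, 6]]
```

The two passes together implement the same input-to-output shape as a full convolution, but cross-channel mixing happens only in the cheap pointwise step.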
“…CTC was developed for speech recognition and first applied to nanopore sequencing by Chiron [46], and was later adopted by various ONT basecallers. Bonito (https://github.com/nanoporetech/bonito) is ONT's most recent research basecaller: it uses a convolutional architecture based on QuartzNet [26], and is trained with CTC loss. In practice, Bonito uses Viterbi decoding, which simply takes the argmax of the logits and concatenates the resulting nucleotide and gap characters.…”
Section: Decoding the Most Likely Output Sequence of a Neural Network
confidence: 99%
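The Viterbi (greedy) decoding described in the excerpt, taking the argmax of the logits and concatenating the resulting characters, can be sketched in a few lines: pick the best label per frame, collapse consecutive repeats, then drop the CTC blank. A pure-Python sketch; the blank index and label alphabet are illustrative:

```python
def greedy_ctc_decode(logits, blank=0):
    """logits: T x V per-frame scores. Returns the collapsed label sequence."""
    # Frame-wise argmax: the best path under CTC's per-frame independence.
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, prev = [], None
    for label in path:
        # CTC collapse rule: merge consecutive repeats, then remove blanks.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Toy example: 4 frames over a 3-symbol vocabulary (0 = blank).
scores = [[0.1, 0.8, 0.1],
          [0.1, 0.8, 0.1],
          [0.9, 0.05, 0.05],
          [0.1, 0.1, 0.8]]
print(greedy_ctc_decode(scores))  # [1, 2]
```

This is much cheaper than beam search, which is why basecallers like Bonito can use it in practice at some cost in accuracy.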