2012
DOI: 10.48550/arxiv.1206.6392
Preprint
Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

Cited by 52 publications (78 citation statements); references 0 publications.
“…In this section, we evaluate the performance of our approach and compare it to state-of-the-art unitary recurrent models such as uRNN [2], euRNN [4], fcuRNN [3], expRNN [10], nnRNN [12], and RNN [52]. We focus on four learning tasks that are commonly used for benchmarking: the copy task [53], the polyphonic music task on the JSB and MuseData datasets [54,55], the TIMIT speech prediction problem [56], and the character-level prediction task on the PTB dataset [57]. We chose these tasks because they demand long-term memory capabilities and relatively high expressivity from the modeling architecture.…”
Section: Methods
confidence: 99%
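The copy task mentioned in the excerpt above is simple to reproduce. Below is a minimal sketch of the usual data layout from the unitary-RNN literature (the function name, 8-symbol vocabulary, and exact marker position are illustrative assumptions, not the cited papers' code): the input holds a short symbol prefix, a long run of blanks, and a recall marker; the target asks the model to reproduce the prefix after the delay.

```python
import numpy as np

def make_copy_task(batch, seq_len=10, delay=100, n_symbols=8, seed=0):
    """Generate input/target pairs for the copy memory task.

    Input:  [s_1..s_seq_len, blank * (delay - 1), marker, blank * seq_len]
    Target: [blank * (seq_len + delay), s_1..s_seq_len]
    where blank = n_symbols and marker = n_symbols + 1.
    """
    rng = np.random.default_rng(seed)
    blank, marker = n_symbols, n_symbols + 1
    total = seq_len + delay + seq_len
    x = np.full((batch, total), blank, dtype=np.int64)
    y = np.full((batch, total), blank, dtype=np.int64)
    symbols = rng.integers(0, n_symbols, size=(batch, seq_len))
    x[:, :seq_len] = symbols                   # prefix to memorize
    x[:, seq_len + delay - 1] = marker         # recall cue after the delay
    y[:, seq_len + delay:] = symbols           # model must emit the prefix here
    return x, y

x, y = make_copy_task(4)
print(x.shape, y.shape)  # (4, 120) (4, 120)
```

Accuracy on the final `seq_len` positions is what distinguishes long-memory architectures: a memoryless baseline can only predict blanks.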
“…However, our proposed R-Transformer, which leverages LocalRNN to incorporate local information, has achieved better performance than TCN (Bai et al., 2018). Next, we evaluate R-Transformer on the task of polyphonic music modeling with the Nottingham dataset (Boulanger-Lewandowski et al., 2012). This dataset collects British and American folk tunes and has been commonly used in previous works to investigate a model's ability for polyphonic music modeling (Boulanger-Lewandowski et al., 2012; Chung et al., 2014; Bai et al., 2018).…”
Section: Pixel-by-pixel MNIST: Sequence Classification
confidence: 99%
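Polyphonic music modeling on these datasets is conventionally scored as frame-level negative log-likelihood over a binary piano-roll: at each time step the model predicts, independently per pitch, whether that note is active. A minimal sketch of that loss (the 88-pitch range, function name, and chance-level example are illustrative assumptions, not the benchmark's exact code):

```python
import numpy as np

def pianoroll_nll(probs, roll, eps=1e-7):
    """Mean per-timestep negative log-likelihood of a binary piano-roll.

    probs: (T, P) predicted note-on probabilities in (0, 1)
    roll:  (T, P) ground-truth 0/1 piano-roll
    Each pitch is treated as an independent Bernoulli at every frame.
    """
    probs = np.clip(probs, eps, 1 - eps)
    ll = roll * np.log(probs) + (1 - roll) * np.log(1 - probs)
    return -ll.sum(axis=1).mean()  # sum over pitches, average over time

# Chance-level predictor on a random roll with ~5% active notes:
rng = np.random.default_rng(0)
roll = (rng.random((64, 88)) < 0.05).astype(float)
print(pianoroll_nll(np.full((64, 88), 0.05), roll))
```

Lower values indicate better modeling of which notes co-occur and how they evolve over time, which is why this benchmark stresses both memory and expressivity.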
“…More generally, in both music and speech, various combinations of recurrent and convolutional neural networks have been successfully adopted in audio signal processing and Music Information Retrieval (MIR) applications. [15] applies RNNs coupled with restricted Boltzmann machines to polyphonic pitch transcription. In [16], a Convolutional Gated Recurrent Unit (CGRU), built on the GRU [17], an adaptation of RNNs that addresses the vanishing-gradient problem, estimates the main melody in polyphonic audio signals pre-processed using the Constant-Q Transform (CQT) followed by Non-negative Matrix Factorization [18].…”
Section: Deep Learning Research in Music Signal Processing
confidence: 99%