2021
DOI: 10.5281/zenodo.4792298

librosa/librosa: 0.8.1rc2

Cited by 18 publications (14 citation statements)
References 0 publications
“…The lossy communication channel has Gaussian white noise with a signal-to-noise ratio (SNR) of 30 dB, unless otherwise stated. During training, the channel applies Gaussian-sampled time stretch and pitch shift using Librosa (McFee et al, 2021), with variance 0.4 and 0.3, respectively. The channel also masks up to 15% of the mel-spectrogram time-axis during training.…”
Section: Methods
Citation type: mentioning
confidence: 99%
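The noise stage of the channel described in this statement can be illustrated with a minimal sketch that uses only the Python standard library. It scales Gaussian white noise so that the ratio of signal power to noise power matches a target SNR in dB. The function names, the 440 Hz test tone, and the 16 kHz sampling rate are assumptions made for illustration and do not come from the citing paper. The time-stretch and pitch-shift steps, which the authors report doing with Librosa, are not reproduced here.

```python
import math
import random


def add_white_noise(signal, snr_db=30.0, rng=None):
    """Return a copy of `signal` with Gaussian white noise at the given SNR (dB)."""
    rng = rng or random.Random()
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR_dB / 10).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical test tone: one second of a 440 Hz sine at 16 kHz (not from the paper).
sr = 16000
clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
noisy = add_white_noise(clean, snr_db=30.0, rng=random.Random(0))
snr = measured_snr_db(clean, noisy)
```

With a fixed seed the measured SNR lands close to the 30 dB target; the small residual gap comes from estimating the noise power on a finite number of samples.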
“…In doing so, we reduce the size of the data and training time without loss of essential information as the characteristic sounds of a cough are associated with frequencies below 10 kHz [26]. The anti-aliasing filter is a precomputed low-pass filter provided by librosa [27], a python package for music and audio analysis. The filter is designed using a Kaiser window with β = 8.56 and a roll-off frequency of 0.85 * f_nyquist, where f_nyquist = 11.025 kHz is the Nyquist frequency, i.e.…”
Section: B Data Pre-processing
Citation type: mentioning
confidence: 99%
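The numbers in this statement can be checked with a short sketch: a roll-off of 0.85 times a Nyquist frequency of 11.025 kHz gives a cutoff of about 9.37 kHz, which stays below the 10 kHz bound mentioned in the quote. The code below also builds a Kaiser-windowed sinc low-pass filter in pure Python to show what such a design looks like. It is not the precomputed filter shipped with librosa; the filter length of 129 taps and the 44.1 kHz source rate are assumptions chosen for illustration only.

```python
import math


def bessel_i0(x, terms=50):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms + 1):
        term *= (x / 2.0) / k
        total += term * term
    return total


def kaiser_window(num_taps, beta):
    """Kaiser window of length `num_taps` with shape parameter `beta`."""
    denom = bessel_i0(beta)
    window = []
    for n in range(num_taps):
        r = 2.0 * n / (num_taps - 1) - 1.0  # position mapped to [-1, 1]
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - r * r))) / denom)
    return window


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps, beta):
    """Kaiser-windowed sinc low-pass filter, normalized to unit gain at DC."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    window = kaiser_window(num_taps, beta)
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        taps.append(sinc * window[n])
    gain = sum(taps)
    return [h / gain for h in taps]


f_nyquist = 11025.0        # Hz, from the quoted passage
cutoff = 0.85 * f_nyquist  # roll-off frequency in Hz, about 9371.25
beta = 8.56                # Kaiser shape parameter from the quoted passage
source_rate = 44100.0      # assumption: sampling rate before downsampling
taps = lowpass_taps(cutoff, source_rate, num_taps=129, beta=beta)
```

A larger `beta` trades a wider transition band for stronger stop-band attenuation, which is the usual reason for choosing a Kaiser window in an anti-aliasing filter.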
“…Furthermore, we established a human baseline with which we compared our model's performance. We instructed each of our 19 human raters (13 male, 6 female, age 22-31) to perform the same verification tests as our model. More precisely, for each of the ten coughers in the test set, the rater was first allowed to listen to the same enrollment samples presented to our model as often as desired.…”
Section: Id
Citation type: mentioning
confidence: 99%
“…Following our former work in [7], we process music as barwise spectrograms, with a fixed number of frames per bar. Practically, spectrograms are computed using librosa [16] with a low hop length of 32 frames at a sampling rate of 44.1kHz, and downbeats are estimated with the madmom toolbox [14]. This allows us to split the original spectrogram in b barwise spectrograms (b being the number of bars in this song) each containing n b frames.…”
Section: Barwise Music Processing
Citation type: mentioning
confidence: 99%
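The barwise split described in this statement can be sketched in plain Python. The example converts downbeat times to frame indices for a hop length of 32 at 44.1 kHz, cuts the frame sequence at those indices, and keeps a fixed number of frames per bar. How the citing authors obtain the fixed frame count is not stated in the excerpt, so the equally spaced subsampling used here is an assumption, as are the function names, the downbeat times, and the value of 96 frames per bar. A list of integers stands in for the spectrogram columns.

```python
def time_to_frame(time_s, sample_rate=44100, hop_length=32):
    """Map a time in seconds to a frame index (floor of sample index / hop)."""
    return int(time_s * sample_rate) // hop_length


def split_barwise(frames, downbeats_s, frames_per_bar, sample_rate=44100, hop_length=32):
    """Cut `frames` at the downbeats and keep `frames_per_bar` frames for each bar."""
    bars = []
    for start_s, end_s in zip(downbeats_s[:-1], downbeats_s[1:]):
        start = time_to_frame(start_s, sample_rate, hop_length)
        end = time_to_frame(end_s, sample_rate, hop_length)
        bar = frames[start:end]
        if not bar:
            continue  # skip empty bars, e.g. downbeats past the end of the signal
        # Assumption: pick equally spaced frames so every bar has the same length.
        picks = [int(i * len(bar) / frames_per_bar) for i in range(frames_per_bar)]
        bars.append([bar[j] for j in picks])
    return bars


# Stand-in for spectrogram columns: frame i is represented by the integer i.
frames = list(range(6000))
downbeats = [0.0, 2.0, 4.0]  # hypothetical downbeat times in seconds
bars = split_barwise(frames, downbeats, frames_per_bar=96)
```

Here the three downbeats delimit two bars, so `bars` holds two sequences of 96 frames each, and the second bar starts at the frame index that corresponds to the 2.0 s downbeat.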