Handwritten Music Object Detection: Open Issues and Baseline Results

Pacha, Alexander; Choi, Kwon-Young; Coüasnon, Bertrand; Ricquebourg, Yann; Zanibbi, Richard; Eidenberger, Horst

doi:10.1109/das.2018.51

Cited by 37 publications

(30 citation statements)

References 19 publications


“…Based on our observations, the main reason of subpar performance of the SSD and R-FCN detectors were the underprediction (by the SSD detector, Figure 7a) and overprediction (by the R-FCN detector, Figure 7b) of egg numbers in the images. Huang, et al [11], Zhang, et al [26], and Pacha, et al [27] also reported that the SSD and R-FCN performed less accurately than faster R-CNN in terms of detecting other types of objects (e.g., handwritten music symbol, wild animal, etc.). Because the SSD detector (with Mobilenet V1 feature extractor) had fewer layers than the other two CNN detectors, it could not obtain as many floor egg features as its counterparts [28].…”

Section: Performance of the Three CNN Floor-Egg Detectors (mentioning)

confidence: 99%

Evaluating Convolutional Neural Networks for Cage-Free Floor Egg Detection

Zhao et al. 2020

Sensors


The manual collection of eggs laid on the floor (or ‘floor eggs’) in cage-free (CF) laying hen housing is strenuous and time-consuming. Using robots for automatic floor egg collection offers a novel solution to reduce labor yet relies on robust egg detection systems. This study sought to develop vision-based floor-egg detectors using three Convolutional Neural Networks (CNNs), i.e., single shot detector (SSD), faster region-based CNN (faster R-CNN), and region-based fully convolutional network (R-FCN), and evaluate their performance on floor egg detection under simulated CF environments. The results show that the SSD detector had the highest precision (99.9 ± 0.1%) and fastest processing speed (125.1 ± 2.7 ms·image⁻¹) but the lowest recall (72.1 ± 7.2%) and accuracy (72.0 ± 7.2%) among the three floor-egg detectors. The R-FCN detector had the slowest processing speed (243.2 ± 1.0 ms·image⁻¹) and the lowest precision (93.3 ± 2.4%). The faster R-CNN detector had the best performance in floor egg detection with the highest recall (98.4 ± 0.4%) and accuracy (98.1 ± 0.3%), and a medium precision (99.7 ± 0.2%) and image processing speed (201.5 ± 2.3 ms·image⁻¹); thus, the faster R-CNN detector was selected as the optimal model. The faster R-CNN detector performed almost perfectly for floor egg detection under a wide range of simulated CF environments and system settings, except for brown egg detection at 1 lux light intensity. When tested under random settings, the faster R-CNN detector had 91.9–94.7% precision, 99.8–100.0% recall, and 91.9–94.5% accuracy for floor egg detection. It is concluded that a properly-trained CNN floor-egg detector may accurately detect floor eggs under CF housing environments and has the potential to serve as a crucial vision-based component for robotic floor egg collection systems.



“…cropping each staff). A similar approach is [5], where Pacha et al. propose an end-to-end trainable object detector for music primitives. The proposed method uses a machine-learning approach considering region-based deep convolutional neural networks.…”

Section: Approaches for Handwritten Scores (mentioning)

confidence: 99%

“…Although the interest in OMR has reawakened with the appearance of deep learning, as far as we know, the few existing methods that attempt to recognize handwritten scores are mostly focused on solving a particular stage of OMR, such as layout analysis [3] or detection and classification of graphic primitives [4] or music symbols [5,6]. However, in the particular case of Western classical music, music scores are complex documents composed of staves (five horizontal lines), music symbols (e.g.…”

Section: Introduction (mentioning)

confidence: 99%

From Optical Music Recognition to Handwritten Music Recognition: A baseline

Baró, Riba, Calvo-Zaragoza et al. 2019

Pattern Recognition Letters


Optical Music Recognition (OMR) is the branch of document image analysis that aims to convert images of musical scores into a computer-readable format. Despite decades of research, the recognition of handwritten music scores, concretely the Western notation, is still an open problem, and the few existing works only focus on a specific stage of OMR. In this work, we propose a full Handwritten Music Recognition (HMR) system based on Convolutional Recurrent Neural Networks, data augmentation and transfer learning, that can serve as a baseline for the research community.


“…More recently, convolutional-based neural network detectors [6] that merge the segmentation and classification steps have been applied to a variety of datasets, such as the newly annotated handwritten dataset of modern music, the MUSCIMA++ dataset [7], or mensural music scores by [8]. Fully convolutional neural networks have also been used by [9] and [10], which allows for pixel-wise segmentation of music symbols.…”

Section: A Optical Music Recognition (mentioning)

confidence: 99%

CNN-Based Accidental Detection in Dense Printed Piano Scores

Choi, Coüasnon, Ricquebourg et al. 2019

2019 International Conference on Document Analysis and Recognition (ICDAR)

Self Cite


The recognition of mid-18th to mid-20th century piano scores presents segmentation challenges caused by touching and broken symbols produced by imprinting techniques and time degradation. We present a new notehead accidental dataset containing 2955 images from dense and damaged piano scores. We address this detection problem with very small training samples using a simple Spatial Transformer (ST)-based Convolutional Neural Network detector improved through bootstrapping and contextual information, and more powerful deep learning detectors (Faster R-CNN, R-FCN, and SSD) with transfer learning on the COCO dataset. We trained all our detectors using 5-fold cross-validation and obtain 98.73% mean Average Precision (mAP) for an Intersection over Union (IoU) threshold of 0.75 with our best detector. Our ST-based detector obtains a slightly lower mAP of 94.81%, but runs 40 times faster, and uses 18 times less memory.
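For readers unfamiliar with the IoU threshold mentioned in the abstract above, the following is a minimal sketch of how a predicted box is usually compared with a ground-truth box. It is not the authors' code: the function names `iou` and `is_match` and the `(x1, y1, x2, y2)` box layout are assumptions made for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed layout, not taken from the cited paper).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.75):
    """Count a detection as correct when IoU reaches the threshold."""
    return iou(pred_box, true_box) >= threshold
```

At a threshold of 0.75, a prediction that covers only half of a symbol's ground-truth box would not count as a correct detection, which is why this setting is stricter than the more common 0.5.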

