Abstract: We propose a violin bowing action recognition system that can accurately recognize distinct bowing actions in classical violin performance. This system can recognize bowing actions by analyzing signals from a depth camera and from inertial sensors that are worn by a violinist. The contribution of this study is threefold: (1) a dataset comprising violin bowing actions was constructed from data captured by a depth camera and multiple inertial sensors; (2) data augmentation was achieved for depth-frame data throu…
“…As can be seen from Figure 11, the bone features designed in this study have achieved the best recognition results, reaching about 94% recognition accuracy, which is about 2% higher than the features extracted from literature [28], which fully proves the effectiveness of the bone node vector features proposed in this study.…”
Section: Analysis Of Upper-limb Motion Segmentation and Labanotation ... (supporting)
confidence: 59%
“…Displacement, velocity, and acceleration can reflect the posture information of human body, and angle can reflect the rotation information of human body. The included angle of knee joint is taken as an example (the included angle of elbow joint is the same), as shown in Figure 3: If the leftUpLeg-node world coordinate system is (x0, y0, z0), leftLowLeg-node world coordinate system is (x1, y1, z1), and leftFoot-node world coordinate system is (x2, y2, z2), the side length of the triangle formed by these three nodes of the human leg and the included angle of the knee joint can be calculated as follows: […] [22], target detection [23], behavior recognition [24], and natural language processing and other fields [25][26][27] and has made extraordinary achievements [28].…”
Section: Preprocessing Of 3d Motion Capture Data (mentioning)
confidence: 99%
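The formula referred to in the quoted passage above is not reproduced in this excerpt. As a minimal sketch, assuming the usual law-of-cosines construction on the triangle formed by the three joint positions, the included angle at the middle joint could be computed as follows. The function name `included_angle` and the sample coordinates are hypothetical and are not taken from the cited paper.

```python
import math


def included_angle(upper, joint, lower):
    """Return the angle in degrees at `joint` of the triangle upper-joint-lower.

    Each argument is an (x, y, z) world-coordinate triple, e.g. the
    leftUpLeg, leftLowLeg and leftFoot nodes for the knee angle.
    Assumption: the angle is derived with the law of cosines from the
    three side lengths; the cited paper's exact formula is not shown.
    """
    a = math.dist(upper, joint)   # side: upper node to joint
    b = math.dist(joint, lower)   # side: joint to lower node
    c = math.dist(upper, lower)   # side opposite the joint
    if a == 0 or b == 0:
        raise ValueError("joint coincides with a neighbouring node")
    cos_theta = (a * a + b * b - c * c) / (2 * a * b)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Hypothetical coordinates: thigh pointing up, shank pointing forward.
    print(included_angle((0, 1, 0), (0, 0, 0), (1, 0, 0)))
```

The same function applies unchanged to the elbow by passing the shoulder, elbow and wrist nodes.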
“…The recognition accuracy of human upper-limb movements is first compared to a feature of normalized human orientation extracted from literature [28] using the same spatial segmentation method, as shown in Figure 11. Feature1 denotes the original action data, Feature2 the features added with normalization of human body orientation, Method in reference [19] Method in reference [20] Method in reference [21] CNN method…”
Section: Analysis Of Upper-limb Motion Segmentation and Labanotation ... (mentioning)
All human movements can be effectively represented with labanotation, which is simple to read and preserve. However, manually recording the labanotation takes a long time, so figuring out how to use the labanotation to accurately and quickly record and preserve traditional dance movements is a key research question. An automatic labanotation generation algorithm based on DL (deep learning) is proposed in this study. The BVH file is first analyzed, and the data are then converted. On this foundation, a CNN (convolutional neural network) algorithm for generating the dance spectrum of human lower-limb movements is proposed, which is very good at learning action space information. The algorithm performs admirably in terms of classification and recognition. Finally, a spatial segmentation-based automatic labanotation generation algorithm is proposed. To begin, every frame of data is converted into a symbol sequence using spatial law, resulting in a very dense motion sequence. The motion sequence is then regulated according to the minimum beat of motion obtained through wavelet analysis. To arrive at the final result, the classifier is used to determine whether each symbol is reserved or not. As a result, we will be able to create more accurate dance music for simple human movements.
“…Dalmazzo and Ramirez [11] presented bowing technique classification by applying Hierarchical Markov Model (HHMM) to data from inertial sensors and audio recordings acquired from a single violinist playing a simple G-Major scale and the accuracy obtained with motion only, audio only and audio + motion features are 93.2%, 39.01% and 94.61%, respectively. Sun et al. [12] presented the use of Deep learning models for classifying violin bowing techniques by analyzing the signals from inertial sensors and depth camera and were able to get an average accuracy greater than 80%. Dalmazzo and Ramirez [13] presented Deep Learning techniques for classifying violin bowing techniques such as detaché, legato, martelé, collé, staccato, ricochet, trémolo and col legno and were able to get an accuracy of 97.15%, 98.55%, 99.23% with CNN, 3D-MultiHeaded CNN, CNN LSTM models respectively.…”
Section: B Bowing Technique Classification (mentioning)
confidence: 99%
“…The previous studies have presented methods that incorporate either motion data [11][12][13][14] or audio data [10] and dataset considered was from a simple playing of violin such as scale. To the best of our knowledge, there is no work available where both motion and audio features are considered for classifying bowing categories.…”
Section: B Bowing Technique Classification (mentioning)
Bowing gesture while playing violin refers to the motion of the violinist's arm. Violinists use different types of bow strokes to express musical phrases, played by the movement of the right arm holding the fiddle bow. Although the sound produced by each bow stroke is distinct, it can be difficult for new fiddlers to distinguish and recognize these bowing techniques. So, this paper presents a novel approach of an ensemble of multimodal deep learning models consisting of one Convolution Neural Network (CNN) and two Long Short-Term Memory (LSTM) models to classify into one of the five bowing classes: detaché, legato, martelé, spiccato and staccato. The dataset used consists of audio samples performed by 8 violinists along with the motion of their forearms measured using a Myo sensor device, to acquire 8-channels of Electromyogram (EMG) data and 13-channels of Inertial Measurement Unit (IMU) data. The audio features are extracted from audio excerpts and time domain features are extracted from EMG and IMU motion signals. These features are passed into an ensemble of deep learning models to make the final prediction using weighted voting. The proposed ensemble classifier was able to deliver optimal results with an overall accuracy of 99.5%, which is better than the previous studies that took only either audio or motion data into consideration.
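The abstract above does not say how the weighted vote is formed. The sketch below illustrates one common variant, soft voting over per-class probabilities. This is an assumption, as are the function name `weighted_vote`, the example weights, and the example probabilities; none of them come from the paper.

```python
BOWING_CLASSES = ["detaché", "legato", "martelé", "spiccato", "staccato"]


def weighted_vote(probabilities, weights, classes=BOWING_CLASSES):
    """Combine per-model class probabilities by weighted soft voting.

    probabilities: one probability list per model (e.g. one CNN and two
                   LSTMs), each with one entry per class.
    weights:       one non-negative weight per model (hypothetical values).
    Returns the winning class label and the combined per-class scores.
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    scores = [0.0] * len(classes)
    for probs, weight in zip(probabilities, weights):
        if len(probs) != len(classes):
            raise ValueError("each model must score every class")
        for i, p in enumerate(probs):
            scores[i] += weight * p / total  # normalised weighted average
    best = max(range(len(classes)), key=scores.__getitem__)
    return classes[best], scores


if __name__ == "__main__":
    # Hypothetical outputs of three models for one sample.
    cnn_audio = [0.6, 0.1, 0.1, 0.1, 0.1]
    lstm_emg = [0.2, 0.5, 0.1, 0.1, 0.1]
    lstm_imu = [0.1, 0.6, 0.1, 0.1, 0.1]
    label, _ = weighted_vote([cnn_audio, lstm_emg, lstm_imu], [1.0, 1.0, 1.0])
    print(label)
```

With equal weights the two motion models outvote the audio model in this example. Raising the audio model's weight enough flips the decision, which is the kind of trade-off a weighted vote is meant to tune.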