Audio-Visual Tensor Fusion Network for Piano Player Posture Classification

Park, So‐Hyun

doi:10.3390/app10196857

Cited by 6 publications

(12 citation statements)

References 13 publications


“…The authors proposed methods that apply various operations such as a Kronecker product and a Hadamard product, respectively, to implicitly represent spatiotemporal information and in the end fuse each feature value obtained through the operations. Finally, the previous methods have a problem [11][12][13][14][15][16] in that they do not provide evidence or sufficient explanation for the results derived by the model. This problem is resolved through a method of representing the RGB image, which preserves the pre-fusion data and the Conv-layer filters as analyzable feature maps (Figure 1e).…”

Section: Spatio-temporal Data Representation (mentioning)

confidence: 99%
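
The two operations named in the statement above can be illustrated with a minimal sketch. The function names, the toy vectors, and the final concatenation step in `fuse` are assumptions made for illustration; the quoted text does not specify how the cited methods combine the resulting features.

```python
from typing import List


def hadamard(a: List[float], b: List[float]) -> List[float]:
    """Element-wise (Hadamard) product; both vectors must have equal length."""
    if len(a) != len(b):
        raise ValueError("Hadamard product needs equal-length vectors")
    return [x * y for x, y in zip(a, b)]


def kronecker(a: List[float], b: List[float]) -> List[float]:
    """Kronecker product of two vectors: every pair a[i] * b[j], flattened."""
    return [x * y for x in a for y in b]


def fuse(a: List[float], b: List[float]) -> List[float]:
    """Concatenate both products into one feature vector.

    Concatenation is an assumed fusion step, not taken from the cited work.
    """
    return hadamard(a, b) + kronecker(a, b)


# Hypothetical two-dimensional features from two modalities.
visual = [1.0, 2.0]
audio = [3.0, 4.0]
fused = fuse(visual, audio)
```

The Hadamard product keeps the input dimensionality, while the Kronecker product grows it to the product of the two lengths, so the fused vector here has 2 + 4 = 6 entries.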

“…This problem is resolved through a method of representing the RGB image, which preserves the pre-fusion data and the Conv-layer filters as analyzable feature maps (Figure 1e). [Figure 1 caption fragment: … [11], (b) late fusion [12,13], (c) AV-TFN [14], (d) MTLN [16], and (e) the proposed method (TRT-Net).]…”

Section: Spatio-temporal Data Representation (mentioning)

confidence: 99%

“…To do this, it first converts the 2D joint position extracted from the video into a matrix. It then finds the z value based on the mean value of the x- and y-coordinates … [Figure 1 caption fragment: … [11], (b) late fusion [12,13], (c) AV-TFN [14], (d) MTLN [16], and (e) the proposed method (TRT-Net).]…”

Section: Tightly Coupled RGB Time Tensor Network (mentioning)

confidence: 99%
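
The two steps in the statement above (arranging 2D joint positions as a matrix, then deriving a z value from x and y) might look roughly like the following sketch. The exact formula is not given in the quoted text, so `z = (x + y) / 2` is an assumption, as are the function name and the joints-as-rows, frames-as-columns layout.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def joints_to_matrix(frames: List[List[Point2D]]) -> List[List[Point3D]]:
    """Turn per-frame 2D joints into a joints-by-frames matrix of (x, y, z).

    frames[t][j] is the (x, y) position of joint j in frame t.
    The result is indexed as matrix[j][t]. The z value is taken as the
    mean of x and y, which is an assumed reading of the quoted description.
    """
    n_joints = len(frames[0])
    if any(len(frame) != n_joints for frame in frames):
        raise ValueError("every frame must contain the same number of joints")
    return [
        [(x, y, (x + y) / 2.0) for (x, y) in (frame[j] for frame in frames)]
        for j in range(n_joints)
    ]


# Hypothetical input: two frames, two joints each.
frames = [
    [(0.0, 2.0), (4.0, 6.0)],
    [(1.0, 3.0), (5.0, 7.0)],
]
matrix = joints_to_matrix(frames)
```

Each row of `matrix` follows one joint through time, so a later image representation keeps joints on one axis and frames on the other.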

“…To resolve this problem, a color-based data representation method that can maintain certain vector information was suggested. For an audio-visual tensor fusion network (AV-TFN) [14], a tensor fusion method was proposed, which represents the position information (x- and y-axes) of each joint of the skeleton and the audio information in the gray- and color scales, respectively, and fuses their colors to classify piano playing postures. The tensor fusion method proposed for AV-TFN is a color-based data representation method, which can be used to infer the pre-fusion vector information even after tensor fusion (Figure 1c).…”

Section: Tightly Coupled RGB Time Tensor Network (mentioning)

confidence: 99%

“…To overcome this limitation, color-based data representation methods that maintain certain vector information have emerged. However, the existing methods have a limitation in that they are optimized for audio-visual data [14] or use grayscale, which is a limited color scale for expressing 3D pose positions [15,16].…”

Section: Introduction (mentioning)

confidence: 99%

An Analytic Method for Improving the Reliability of Models Based on a Histogram for Prediction of Companion Dogs’ Behaviors

2021

Self Cite

Dogs and cats tend to show their conditions and desires through their behaviors. In companion animal behavior recognition, behavior data obtained by attaching a wearable device or sensor to a dog’s body are mostly used. However, differences occur in the output values of the sensor when the dog moves violently. A tightly coupled RGB time tensor network (TRT-Net) is proposed that minimizes the loss of spatiotemporal information by reflecting the three components (x-, y-, and z-axes) of the skeleton sequences in the corresponding three channels (red, green, and blue) for the behavioral classification of dogs. This paper introduces the YouTube-C7B dataset consisting of dog behaviors in various environments. Based on a method that visualizes the Conv-layer filters in analyzable feature maps, we add reliability to the results derived by the model. We can identify the joint parts, i.e., those represented as rows of input images showing behaviors, learned by the proposed model mainly for making decisions. Finally, the performance of the proposed method is compared to those of the LSTM, GRU, and RNN models. The experimental results demonstrate that the proposed TRT-Net method classifies dog behaviors more effectively, with improved accuracy and F1 scores of 7.9% and 7.3% over conventional models.
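
The channel mapping described in the abstract (x, y, and z components reflected in the red, green, and blue channels, with joints as image rows) can be sketched as follows. The per-axis min-max scaling to 0..255 and the name `skeleton_to_rgb` are assumptions for illustration; the abstract does not state how coordinates are normalized.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[int, int, int]


def skeleton_to_rgb(sequence: List[List[Point3D]]) -> List[List[Pixel]]:
    """Map sequence[joint][frame] = (x, y, z) to image[joint][frame] = (r, g, b).

    Each axis is min-max scaled over the whole sequence to 0..255
    (an assumed normalization). Rows are joints and columns are frames.
    """
    points = [p for row in sequence for p in row]
    # Collect all x values, all y values, all z values.
    axes = list(zip(*points))
    bounds = [(min(axis), max(axis)) for axis in axes]

    def scale(value: float, lo: float, hi: float) -> int:
        if hi == lo:
            return 0  # constant axis carries no information
        return round(255 * (value - lo) / (hi - lo))

    return [
        [
            (
                scale(x, *bounds[0]),
                scale(y, *bounds[1]),
                scale(z, *bounds[2]),
            )
            for (x, y, z) in row
        ]
        for row in sequence
    ]


# Hypothetical skeleton sequence: two joints, two frames.
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 4.0)],
    [(0.2, 1.0, 1.0), (1.0, 2.0, 4.0)],
]
image = skeleton_to_rgb(sequence)
```

Because every pixel stores the three coordinates in separate channels, the pre-encoding values can be approximately recovered from the image, which is the property the citation statements above attribute to color-based representations.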
