Sketchformer: Transformer-Based Representation for Sketched Structure

Ribeiro, Leo Sampaio Ferraz; Bui, Tien D.; Collomosse, John; Ponti, Moacir Antonelli

doi:10.1109/cvpr42600.2020.01416

Cited by 53 publications

(14 citation statements)

References 34 publications

“…This method achieves good recognition performance but endures a complex training process and a huge number of model parameters. Ribeiro et al 11 constructed a sketch recognition network based on the transformer structure that encodes sketches into feature vectors and uses the stroke sequence of sketches as input to the network, enhancing the network's ability to learn the stroke sequence of complex sketches. Jain et al 12 further designed the TransSketchNet in accordance with the transformer structure, which improves the network's ability to extract more valuable features by entirely using the stroke sequences and the attention mechanism.…”

Section: Learning-based Methods For Sketch Recognition (mentioning)

confidence: 99%

“…This method achieves good recognition performance but endures a complex training process and a huge number of model parameters. Ribeiro et al 11 constructed a sketch recognition network based on the transformer structure that encodes sketches into feature vectors and uses the stroke sequence of sketches as input to the network, enhancing the network’s ability to learn the stroke sequence of complex sketches.…”

Section: Related Work (mentioning)

confidence: 99%

“…On the basis of existing approaches, 6, 7, 19, 23–47 we introduce the dual-attention mechanism followed by the CNN backbone, so that the proposed model can fully use the sketch features in the learning processes. Rather than using the transformer structure 10–12 or embedding the attention mechanism into certain modules, we adopt the attention mechanism itself directly to keep the lightweight nature of the model.…”

Section: Related Work (mentioning)

confidence: 99%

“…However, the aforementioned methods rarely fully utilize the extracted sketch features. To address this issue, researchers have incorporated attention mechanisms 9 into networks, such as multi-graph transformer (MGT), 10 SketchFormer, 11 and TransSketchNet 12. Although these methods focus on the features through attention mechanisms, they all require a complex computational process.…”

Section: Introduction (mentioning)

confidence: 99%

Light-SRNet: a lightweight dual-attention feature fusion network for hand-drawn sketch recognition

Hou, Rong

2023

J. Electron. Imag.

Free-hand sketches play a vital role in graphically portraying ideas and concepts in image recognition systems. Most recently proposed learning-based sketch recognition methods have achieved marked progress in recognition accuracy, but they rarely optimize the use of the sparsity features of sketch images. Although several attention-based sketch recognition models have been presented, they endure complex computations and large model sizes. To address these challenges, we present a lightweight convolutional neural network called Light-SRNet based on a dual-attention mechanism to improve the accuracy of sketch recognition while retaining its lightweight nature. In the proposed model, we introduced both the spatial and channel attention mechanisms into the feature extraction network to highlight more discriminative feature representations to enhance its powerful sketch recognition ability. We compared the proposed Light-SRNet with its competitors on the TU-Berlin dataset, Sketchy dataset, and QuickDrawExtended dataset. Extensive experimental results show that Light-SRNet achieves a recognition accuracy of 73.14%, which is comparable to other similar sketch recognition techniques, while requiring only about a quarter of the model parameters.

“…The ASR performance is reported to be better by combining the CTC loss with the attention mechanism [28] or using the Transformer structure [14,15]. In particular, the Transformer structure, which is originally designed to handle the natural language processing (NLP) problems [29,30], has been successfully utilized in several other domains, such as computer vision (CV) [31,32], and speech-related tasks including text to speech (TTS) [33,34,18,19], voice conversion (VC) [35], and ASR [12,13].…”

Section: Related Work (mentioning)

confidence: 99%

Text-Conditioned Transformer for Automatic Pronunciation Error Detection

Zhang, Wang, Yang

2020

Preprint

Automatic pronunciation error detection (APED) plays an important role in the domain of language learning. As for the previous ASR-based APED methods, the decoded results need to be aligned with the target text so that the errors can be found out. However, since the decoding process and the alignment process are independent, the prior knowledge about the target text is not fully utilized. In this paper, we propose to use the target text as an extra condition for the Transformer backbone to handle the APED task. The proposed method can output the error states with consideration of the relationship between the input speech and the target text in a fully end-to-end fashion. Meanwhile, as the prior target text is used as a condition for the decoder input, the Transformer works in a feed-forward manner instead of autoregressive in the inference stage, which can significantly boost the speed in the actual deployment. We set the ASR-based Transformer as the baseline APED model and conduct several experiments on the L2-Arctic dataset. The results demonstrate that our approach can obtain 8.4% relative improvement on the F1 score metric.
