2022
DOI: 10.48550/arxiv.2210.16848
Preprint

Using Context-to-Vector with Graph Retrofitting to Improve Word Embeddings

Abstract: Although contextualized embeddings generated from large-scale pre-trained models perform well in many tasks, traditional static embeddings (e.g., Skip-gram, Word2Vec) still play an important role in low-resource and lightweight settings due to their low computational cost, ease of deployment, and stability. In this paper, we aim to improve word embeddings by 1) incorporating more contextual information from existing pre-trained models into the Skip-gram framework, which we call Context-to-Vec; 2) proposing a p…
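The abstract describes distilling contextual signal from a pre-trained model into a Skip-gram objective. As a rough illustration only (the abstract is truncated above, so the paper's actual formulation may differ), here is a minimal PyTorch sketch of one plausible reading: standard Skip-gram with negative sampling plus a weighted term pulling each static vector toward a frozen contextual "teacher" embedding. All names here (ContextToVecSkipGram, teacher_vec, alpha) are our assumptions, not the paper's.

```python
# Hypothetical sketch (not the paper's implementation): a Skip-gram
# objective augmented with a distillation term that pulls each static
# word vector toward the contextual embedding a pre-trained model
# produced for that token occurrence.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextToVecSkipGram(nn.Module):  # name is ours, not the paper's
    def __init__(self, vocab_size: int, dim: int, alpha: float = 0.5):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, dim)   # target-word vectors
        self.out_embed = nn.Embedding(vocab_size, dim)  # context-word vectors
        self.alpha = alpha  # weight of the contextual-distillation term

    def forward(self, center, context, negatives, teacher_vec):
        v = self.in_embed(center)                                    # (B, d)
        # Standard Skip-gram with negative sampling.
        pos = F.logsigmoid((self.out_embed(context) * v).sum(-1))    # (B,)
        neg = F.logsigmoid(
            -(self.out_embed(negatives) * v.unsqueeze(1)).sum(-1)
        ).sum(-1)                                                    # (B,)
        sgns_loss = -(pos + neg).mean()
        # Distillation: align the static vector with the frozen contextual
        # embedding of this occurrence (e.g., from BERT), passed in as a tensor.
        distill_loss = 1.0 - F.cosine_similarity(v, teacher_vec, dim=-1).mean()
        return sgns_loss + self.alpha * distill_loss

# Toy usage with random data: batch of 8, 5 negatives per positive.
model = ContextToVecSkipGram(vocab_size=1000, dim=64)
loss = model(torch.randint(0, 1000, (8,)),
             torch.randint(0, 1000, (8,)),
             torch.randint(0, 1000, (8, 5)),
             torch.randn(8, 64))
loss.backward()
```

The alpha knob trades off corpus co-occurrence statistics against agreement with the contextual teacher. The paper's second contribution (graph retrofitting) is not sketched, since the abstract is cut off before describing it.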

Cited by 2 publications (2 citation statements) | References 3 publications
“…In previous SLR works, cross-modal alignment focuses only on positive samples [15,48]. Inspired by contrastive learning [5,16,45], we construct both positive and negative samples in the same mini-batch and apply a contrastive cross-modal alignment method so that similar features are drawn closer while dissimilar features are pushed farther apart. Let the normalized spatial features from the CNN be $S_{\text{logits}} \in \mathbb{R}^{B \times T \times d}$ and the normalized temporal features from the VAE be $V_{\text{logits}} \in \mathbb{R}^{B \times T \times d}$, where $B$ denotes the number of samples.…”
Section: Contrastive Alignment | Citation type: mentioning (confidence: 99%)
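The statement above describes an InfoNCE-style alignment in which matching (spatial, temporal) pairs within a mini-batch act as positives and all other pairings as negatives. Below is a minimal sketch under the stated shapes; the temperature value, the mean-pooling over $T$, and the symmetric two-direction loss are our assumptions, not details from the citing paper.

```python
# Hypothetical sketch of the contrastive cross-modal alignment the
# citing paper describes: matching (S, V) pairs in a mini-batch are
# positives, all other pairings are negatives.
import torch
import torch.nn.functional as F

def contrastive_alignment(S_logits, V_logits, tau: float = 0.07):
    """S_logits, V_logits: normalized features of shape (B, T, d)."""
    B = S_logits.shape[0]
    # Pool over time so each sample has one vector per modality, then
    # re-normalize (a mean of unit vectors is generally not unit-length).
    s = F.normalize(S_logits.mean(dim=1), dim=-1)   # (B, d)
    v = F.normalize(V_logits.mean(dim=1), dim=-1)   # (B, d)
    sim = s @ v.t() / tau                           # (B, B) scaled cosine similarities
    labels = torch.arange(B, device=sim.device)     # diagonal = positive pairs
    # Symmetric InfoNCE: spatial -> temporal and temporal -> spatial.
    return 0.5 * (F.cross_entropy(sim, labels) +
                  F.cross_entropy(sim.t(), labels))

# Toy usage with random, pre-normalized features: B=4, T=16, d=128.
S = F.normalize(torch.randn(4, 16, 128), dim=-1)
V = F.normalize(torch.randn(4, 16, 128), dim=-1)
loss = contrastive_alignment(S, V)
```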
“…More recently, hybrid architectures of GNNs and transformers (Rong et al., 2020; Ying et al., 2021; Min et al., 2022) have emerged to capture the topological structure of molecular graphs. Additionally, given that labels for molecules are often expensive to obtain or incorrect (Xia et al., 2021; Tan et al., 2021; Xia et al., 2022a), emerging self-supervised pre-training strategies on graph-structured data (You et al., 2020; Xia et al., 2022c; Yue et al., 2022; Liu et al., 2023) are promising for molecular graph data (Hu et al., 2020; Xia et al., 2023a; Gao et al., 2022), mirroring the success of pre-trained language models in the natural language processing community (Devlin et al., 2019; Zheng et al., 2022).…”
Section: C3. 2D and 3D Graph-Based Molecular Descriptors | Citation type: mentioning (confidence: 99%)
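For background on the message-passing operation that the hybrid GNN/transformer architectures cited above build on, here is an illustrative single layer over a molecular graph in plain PyTorch. Every name and design choice here (mean aggregation, GRU update) is a generic textbook construction, not drawn from any cited work.

```python
# Illustrative sketch (not from any cited paper): one round of
# mean-aggregation message passing over a molecular graph, the basic
# operation that GNN-based molecular encoders build on.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)      # transform neighbor messages
        self.update = nn.GRUCell(dim, dim)  # update node (atom) states

    def forward(self, h, edge_index):
        """h: (N, d) node features; edge_index: (2, E) src/dst atom indices."""
        src, dst = edge_index
        messages = self.msg(h[src])                                   # (E, d)
        agg = torch.zeros_like(h).index_add_(0, dst, messages)        # sum per node
        deg = torch.zeros(h.size(0), 1, device=h.device).index_add_(
            0, dst, torch.ones(dst.size(0), 1, device=h.device)).clamp(min=1)
        return self.update(agg / deg, h)                              # mean-aggregated update

# Toy molecular graph: 4 atoms, bidirectional bonds 0-1, 1-2, 2-3.
edges = torch.tensor([[0, 1, 1, 2, 2, 3],
                      [1, 0, 2, 1, 3, 2]])
h = torch.randn(4, 32)
h = MessagePassingLayer(32)(h, edges)
```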