Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining 2021
DOI: 10.1145/3447548.3467215

Domain-oriented Language Modeling with Adaptive Hybrid Masking and Optimal Transport Alignment

Abstract: Motivated by the success of pre-trained language models such as BERT in a broad range of natural language processing (NLP) tasks, recent research efforts have been made for adapting these models for different application domains. Along this line, existing domain-oriented models have primarily followed the vanilla BERT architecture and have a straightforward use of the domain corpus. However, domain-oriented tasks usually require accurate understanding of domain phrases, and such fine-grained phrase-level knowle…

Cited by 3 publications (1 citation statement)
References: 30 publications
“…Recently, the interest on the Wasserstein distance [36] and the associated Optimal Transport theory [37] have been growing due to their successful application in many domains (e.g., imaging, signal processing and analysis, natural language process/generation, human learning, optimization, etc.) [38,39,40,41,42,43,44]. Very briefly, the Wasserstein distance allows to measure the difference between two probability distributions, α and α , independently on the values and the nature of their supports (they can be both discrete, continuous, or one continuous and one discrete).…”
Section: Bora 3 : Bo Over the Probability Simplex Via Wasserstein Se ...
Citation type: mentioning
confidence: 99%
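
The citation statement above notes that the Wasserstein distance compares two probability distributions regardless of where their supports lie. A minimal sketch of that property follows, assuming one-dimensional discrete distributions and the order-1 distance only. The function name `wasserstein_1d` and its arguments are hypothetical, chosen for illustration; this is not the method of the cited paper or of the citing one.

```python
def wasserstein_1d(xs, ps, ys, qs):
    """Order-1 Wasserstein distance between two discrete 1-D distributions.

    xs, ps: support points and probabilities of the first distribution.
    ys, qs: support points and probabilities of the second distribution.
    Assumes each probability list sums to 1. The supports may be disjoint.
    """
    # Collect the probability mass sitting at each support point.
    mass_p, mass_q = {}, {}
    for x, p in zip(xs, ps):
        mass_p[x] = mass_p.get(x, 0.0) + p
    for y, q in zip(ys, qs):
        mass_q[y] = mass_q.get(y, 0.0) + q

    # In one dimension the distance equals the area between the two CDFs.
    points = sorted(set(mass_p) | set(mass_q))
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_p += mass_p.get(left, 0.0)
        cdf_q += mass_q.get(left, 0.0)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


# Two point masses with disjoint supports: all mass travels a distance of 3.
d_disjoint = wasserstein_1d([0.0], [1.0], [3.0], [1.0])

# Mass split between 0 and 1 versus a single point mass at 0.5.
d_split = wasserstein_1d([0.0, 1.0], [0.5, 0.5], [0.5], [1.0])
```

The two point masses share no support point, yet the distance is still finite and reflects how far the mass must move. The continuous and mixed cases mentioned in the quotation are outside the scope of this sketch.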