Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1246

Distinguishing Japanese Non-standard Usages from Standard Ones

Abstract: We focus on non-standard usages of common words on social media. In the context of social media, words sometimes have other usages that are totally different from their original. In this study, we attempt to distinguish non-standard usages on social media from standard ones in an unsupervised manner. Our basic idea is that nonstandardness can be measured by the inconsistency between the expected meaning of the target word and the given context. For this purpose, we use context embeddings derived from word embe…
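The abstract's basic idea — scoring a word's non-standardness by the inconsistency between its expected meaning and the surrounding context — can be sketched as follows. This is a minimal illustration, not the paper's actual model: the toy vocabulary, random embeddings, and function names are all assumptions; in practice, the word and context embedding matrices would come from a trained Skip-gram model.

```python
import numpy as np

# Toy stand-ins for trained Skip-gram matrices: the input matrix yields word
# embeddings, the output matrix yields context embeddings. Values here are
# random and purely illustrative.
rng = np.random.default_rng(0)
vocab = ["apple", "eat", "phone", "charge", "sweet"]
dim = 8
word_emb = {w: rng.normal(size=dim) for w in vocab}
ctx_emb = {w: rng.normal(size=dim) for w in vocab}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nonstandardness(target, context_words):
    """Score how inconsistent the target word is with its context:
    low similarity between the target's word embedding and the averaged
    context embeddings of its neighbors suggests a non-standard usage."""
    ctx = np.mean([ctx_emb[c] for c in context_words], axis=0)
    return 1.0 - cosine(word_emb[target], ctx)

score = nonstandardness("apple", ["eat", "sweet"])
print(0.0 <= score <= 2.0)
```

Because cosine similarity lies in [-1, 1], the score lies in [0, 2]; a threshold on this score would separate candidate non-standard usages from standard ones.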

Cited by 8 publications (6 citation statements). References 11 publications.
“…At the end of training, two matrices are produced, one representing word embeddings and the other representing context embeddings for each and every vocabulary word. While word embeddings have been used as the output of Skip-gram in many previous studies, little attention has been paid to the context embeddings and the usefulness of these vectors in performing lexical semantic tasks (Levy et al., 2015; Melamud et al., 2015; Aoki et al., 2017).…”
Section: Introduction (mentioning, confidence: 99%)
“…For each word in a text, if the word's thematic coherence to the text is lower than a given threshold, the word will be seen as jargon. [3]-[5] calculate each word's occurrence probability on both a jargon corpus and a regular corpus, and then utilize the difference to determine whether the word is jargon. Some research attempts to conduct jargon detection using implicit features.…”
Section: A. Jargon Detection (mentioning, confidence: 99%)
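The corpus-comparison approach quoted above — comparing a word's occurrence probability in a jargon corpus against a regular corpus — can be sketched with a smoothed log-ratio. The toy corpora, the smoothing constant, and the function names are illustrative assumptions, not taken from the cited work:

```python
from collections import Counter
import math

# Toy corpora standing in for a jargon corpus and a regular corpus.
jargon_corpus = "ice ice plug drop the plug tonight".split()
regular_corpus = "the ice melted and water dripped from the plug".split()
vocab = set(jargon_corpus) | set(regular_corpus)

def smoothed_prob(word, corpus, alpha=1.0):
    """Add-alpha smoothed occurrence probability of a word in a corpus."""
    counts = Counter(corpus)
    return (counts[word] + alpha) / (len(corpus) + alpha * len(vocab))

def jargon_score(word):
    """Log-ratio of occurrence probabilities; large positive values mean
    the word appears disproportionately often in the jargon corpus."""
    return math.log(smoothed_prob(word, jargon_corpus)
                    / smoothed_prob(word, regular_corpus))

print(jargon_score("plug") > 0)   # over-represented in the jargon corpus
print(jargon_score("melted") < 0) # appears only in the regular corpus
```

Thresholding this score plays the same role as the probability-difference test described in the quotation.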
“…Then, we combined these UGTCs with the Danmaku corpus to form a textual corpus of more than 1.95 million items. We used the open-source Chinese word-splitting tool jieba to split the words. Our annotated jargon words were added to the splitting dictionary to ensure that the splitting tool would not split these jargon words.…”
Section: Crucial Models (mentioning, confidence: 99%)
“…However, these methods represent first-stage research [2]. Furthermore, Aoki et al. [18] detected non-standard word usage involving definitions that differed from their original meaning. These words were not limited to use in crime-related contexts, and it is conceivable that crime-related codewords function with other methods to conceal a given message.…”
Section: Related Work on Codeword Detection (mentioning, confidence: 99%)