Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d17-1246

Distinguishing Japanese Non-standard Usages from Standard Ones

Abstract: We focus on non-standard usages of common words on social media. In the context of social media, words sometimes have other usages that are totally different from their original. In this study, we attempt to distinguish non-standard usages on social media from standard ones in an unsupervised manner. Our basic idea is that nonstandardness can be measured by the inconsistency between the expected meaning of the target word and the given context. For this purpose, we use context embeddings derived from word embe…
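The abstract's basic idea — scoring a word's non-standardness by the inconsistency between its expected meaning and the surrounding context — can be sketched as follows. This is a minimal illustration, not the paper's actual model: the toy vocabulary, random embeddings, and function names are all assumptions; in practice, the word and context embedding matrices would come from a trained Skip-gram model.

```python
import numpy as np

# Toy stand-ins for trained Skip-gram matrices: the input matrix yields word
# embeddings, the output matrix yields context embeddings. Values here are
# random and purely illustrative.
rng = np.random.default_rng(0)
vocab = ["apple", "eat", "phone", "charge", "sweet"]
dim = 8
word_emb = {w: rng.normal(size=dim) for w in vocab}
ctx_emb = {w: rng.normal(size=dim) for w in vocab}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nonstandardness(target, context_words):
    """Score how inconsistent the target word is with its context:
    low similarity between the target's word embedding and the averaged
    context embeddings of its neighbors suggests a non-standard usage."""
    ctx = np.mean([ctx_emb[c] for c in context_words], axis=0)
    return 1.0 - cosine(word_emb[target], ctx)

score = nonstandardness("apple", ["eat", "sweet"])
print(0.0 <= score <= 2.0)
```

Because cosine similarity lies in [-1, 1], the score lies in [0, 2]; a threshold on this score would separate candidate non-standard usages from standard ones.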

Cited by 8 publications (6 citation statements). References 11 publications.
“…At the end of training, two matrices are produced, one representing word embeddings and the other representing context embeddings for each and every vocabulary word. While word embeddings have been used as the output of Skip-gram in many previous studies, little attention has been paid to the context embeddings and the usefulness of these vectors in performing lexical semantic tasks (Levy et al., 2015; Melamud et al., 2015; Aoki et al., 2017).…”
Section: Introduction (mentioning, confidence: 99%)
“…For each word in a text, if the word's thematic coherence to the text is lower than a given threshold, the word will be seen as jargon. [3]-[5] calculate each word's occurrence probability on both a jargon corpus and a regular corpus, and then utilize the difference to determine whether the word is jargon. Some research attempts to conduct jargon detection using implicit features.…”
Section: A. Jargon Detection (mentioning, confidence: 99%)
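The corpus-comparison approach quoted above — comparing a word's occurrence probability in a jargon corpus against a regular corpus — can be sketched with a smoothed log-ratio. The toy corpora, the smoothing constant, and the function names are illustrative assumptions, not taken from the cited work:

```python
from collections import Counter
import math

# Toy corpora standing in for a jargon corpus and a regular corpus.
jargon_corpus = "ice ice plug drop the plug tonight".split()
regular_corpus = "the ice melted and water dripped from the plug".split()
vocab = set(jargon_corpus) | set(regular_corpus)

def smoothed_prob(word, corpus, alpha=1.0):
    """Add-alpha smoothed occurrence probability of a word in a corpus."""
    counts = Counter(corpus)
    return (counts[word] + alpha) / (len(corpus) + alpha * len(vocab))

def jargon_score(word):
    """Log-ratio of occurrence probabilities; large positive values mean
    the word appears disproportionately often in the jargon corpus."""
    return math.log(smoothed_prob(word, jargon_corpus)
                    / smoothed_prob(word, regular_corpus))

print(jargon_score("plug") > 0)   # over-represented in the jargon corpus
print(jargon_score("melted") < 0) # appears only in the regular corpus
```

Thresholding this score plays the same role as the probability-difference test described in the quotation.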
“…Then, we combined these UGTCs with the Danmaku corpus to form a textual corpus of more than 1.95 million items. We used the open-source Chinese word-splitting tool jieba to split the words. Our annotated jargon words were added to the splitting dictionary to ensure that the splitting tool would not split these jargon words.…”
Section: Crucial Models (mentioning, confidence: 99%)
“…However, these methods represent first-stage research [2]. Furthermore, Aoki et al. [18] detected non-standard word usage involving definitions that differed from their original meaning. These words were not limited to use in crime-related contexts, and it is conceivable that crime-related codewords function with other methods to conceal a given message.…”
Section: Related Work on Codeword Detection (mentioning, confidence: 99%)