“…However, recent research has highlighted the limitations of subword-level tokenization, including poor generalization for out-of-vocabulary words and domains due to their reliance on a fixed vocabulary (Bostrom & Durrett, 2020; Klein & Tsarfaty, 2020; Hofmann et al., 2021; Dong et al., 2020; Xu et al., 2021). This limitation is particularly problematic for forensic NLP models used to detect covert criminal communications (CCC) that employ unusual characters and subwords for obfuscation (Bromberg et al., 2020; Pei & Cheng, 2022; Tong et al., 2017; Wagner et al., 2020; Wang et al., 2019; Zhu et al., 2019).…”
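The fragmentation problem described above can be illustrated with a minimal sketch of greedy longest-match subword segmentation over a fixed vocabulary. The vocabulary and the obfuscated spelling `m0n3y` below are hypothetical, chosen only to show how a word outside the fixed vocabulary splinters into many short pieces while its standard spelling stays intact:

```python
# Hypothetical fixed subword vocabulary for illustration only.
VOCAB = {"money", "mon", "ey", "m", "o", "n", "e", "y", "0", "3"}

def tokenize(word, vocab=VOCAB):
    """Greedy longest-match segmentation over a fixed subword vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest candidate first
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            tokens.append("<unk>")  # no vocabulary entry covers this character
            i += 1
    return tokens

print(tokenize("money"))   # in-vocabulary spelling: a single token
print(tokenize("m0n3y"))   # obfuscated spelling: five single-character tokens
```

Because the obfuscated form never appears in the fixed vocabulary, the tokenizer falls back to character-level fragments, discarding the word-level signal a downstream classifier would otherwise exploit.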