2022
DOI: 10.48550/arxiv.2204.07580
Preprint

mGPT: Few-Shot Learners Go Multilingual

Cited by 10 publications (10 citation statements)
References 0 publications
“…Additionally, to facilitate a comprehensive comparison, we computed "word surprisal" for the stimuli text from the corpus of eye-tracking data on Chinese reading (Zhang et al., 2022). Two state-of-the-art large language models (LLMs) were used to estimate word surprisal: Chinese BERT (Cui et al., 2021) and multilingual GPT (Shliazhko et al., 2022). We can leverage pre-trained LLMs to estimate word probabilities and subsequently obtain word surprisals.…”
Section: A. Attention Types in Transformers (mentioning; confidence: 99%)
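Both statements describe the same pipeline: a pre-trained LM assigns each word a conditional probability, and surprisal is its negative log, surprisal(w) = -log2 P(w | context). Below is a minimal sketch of that computation, assuming the Hugging Face transformers API and the public "ai-forever/mGPT" checkpoint (neither is specified in the cited papers); it yields subword-token surprisals, which word-level studies typically sum over each word's tokens:

```python
# A sketch (not code from the cited papers) of estimating token surprisal
# with a pre-trained causal LM. The checkpoint name "ai-forever/mGPT" is
# the public Hugging Face release of mGPT; any causal LM can be swapped in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ai-forever/mGPT")
model = AutoModelForCausalLM.from_pretrained("ai-forever/mGPT")
model.eval()

def token_surprisals(text: str):
    """Return (token, surprisal in bits) for every token after the first."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=ids).logits          # (1, seq_len, vocab)
    # Log-probability assigned to each actually-occurring next token
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    next_ids = ids[0, 1:]                             # tokens being predicted
    token_logp = log_probs[torch.arange(next_ids.size(0)), next_ids]
    bits = (-token_logp / torch.log(torch.tensor(2.0))).tolist()  # nats -> bits
    return list(zip(tokenizer.convert_ids_to_tokens(next_ids.tolist()), bits))
```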
“…Fortunately, such databases and computational tools are available. For instance, multilingual large language models (LLMs; e.g., mBERT, Devlin et al., 2018; XLM, Conneau et al., 2019; mGPT, Shliazhko et al., 2022) are capable of processing and understanding text in multiple languages. Multilingual LLMs can estimate word surprisal precisely for each of these languages.…”
Section: Introduction (mentioning; confidence: 99%)
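A hypothetical usage of the token_surprisals sketch above for the multilingual case this statement describes; the sentences and language codes are illustrative, not taken from the cited work:

```python
# Illustrative multilingual scoring with the token_surprisals sketch above:
# one multilingual checkpoint scores text in each language directly.
samples = {
    "en": "The cat sat on the mat.",
    "ru": "Кошка сидела на коврике.",
    "zh": "猫坐在垫子上。",
}
for lang, sentence in samples.items():
    total_bits = sum(s for _, s in token_surprisals(sentence))
    print(f"{lang}: total surprisal {total_bits:.1f} bits")
```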
“…In March 2022, Google's DeepMind released Chinchilla with 70 Bp (billion parameters) [57], and BigScience published tr11-176-ml (only available in the GitHub repository). April brought PaLM with 540 Bp by Google [83], CodeGen with 16 Bp by Salesforce, VLM-4 by Salesforce [88], the 200 Bp Luminous by Aleph Alpha [52], the 13 Bp mGPT from Sber [107], and the 10 Bp Noor from TII [114]. The most groundbreaking model in April was Flamingo, produced by Google's DeepMind, because it combined a 70 Bp language model with a 10 Mb image model [3].…”
(mentioning; confidence: 99%)