2021
DOI: 10.48550/arxiv.2109.07684
Preprint

Language Models are Few-shot Multilingual Learners

Cited by 3 publications (3 citation statements)
References 0 publications

“…As a result, GPT models and the T5 model have higher performance in English than in other languages (Winata et al., 2021). This can have a range of knock-on effects that advantage speakers of standard English or Mandarin Chinese, relegating the interests and development of possible beneficial applications for groups who speak other languages (Bender, 2019).…”
Section: Problem (mentioning)
confidence: 99%
“…Current state-of-the-art LMs produce higher quality predictions when prompted in English or Mandarin Chinese (Brown et al., 2020; Du, 2021; Fedus et al., 2021; Rosset, 2020). While it has been shown that in some languages, few-shot training and fine-tuning can improve performance in GPT models (Brown et al., 2020) and the T5 model (Raffel et al., 2020), the performance in non-English languages remained lower than the performance in English (Winata et al., 2021). It may be the case that the architecture of current LMs is particularly well-suited to English, and less well suited to other languages (Bender, 2011; Hovy and Spruit, 2016; Ruder, 2020).…”
Section: Examples (mentioning)
confidence: 99%
“…Scaling language models using this architecture has proven to be a powerful and general strategy for improving generalization. This has led to the emergence of multi-task (Radford et al., 2019) and few-shot (Brown et al., 2020; Winata et al., 2021) models leveraging scale and compute (Sanh et al., 2021; Raffel et al., 2020). This plot compares the performance of three different models with different sizes (Text+Chem T5-base, Text+Chem T5-small, MolT5-base, MolT5-small, T5-base, and T5-small) on the task of converting SMILES to captions, using six different metrics: BLEU-2, BLEU-4, ROUGE-1, ROUGE-2, ROUGE-L, and METEOR.…”
Section: Introduction (mentioning)
confidence: 99%