2022
DOI: 10.48550/arxiv.2212.09535
Preprint

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

Cited by 4 publications (4 citation statements)
References 0 publications
“…LLMs possess multilingual capabilities that enable them to address language barriers, accommodate low-resource languages, and exhibit promising performance even on unseen languages (Yong et al, 2022). To enhance accessibility, the development and adoption of open-source multilingual models, such as BLOOM (Scao et al, 2022), should be encouraged, thereby facilitating the utilization of LLMs in educational applications across diverse linguistic contexts.…”
Section: Discussion
confidence: 99%
“…While the BLOOM models were trained on data from 46 different languages, the training did not include Finnish. Prior work has investigated extending smaller BLOOM models to new languages not included during pretraining (Yong et al, 2022) and found parameter-efficient finetuning methods and (to a lesser degree) continued pretraining to be effective approaches. Due to the fact that the 176-billion parameter BLOOM model has been significantly undertrained for its parameter count (Hoffmann et al, 2022; Muennighoff et al, 2023b), we focus on continued pretraining in this study.…”
Section: Models
confidence: 99%
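The parameter-efficient finetuning approach this excerpt refers to can be illustrated with a short sketch: adapting a small BLOOM checkpoint to a new language with LoRA adapters via the Hugging Face peft library. The checkpoint, hyperparameters, and training setup below are illustrative assumptions, not the cited papers' exact configuration.

# Sketch: LoRA-based language adaptation of a small BLOOM model.
# Checkpoint and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "bigscience/bloom-560m"  # small checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach low-rank adapters to BLOOM's fused attention projection;
# only the adapter weights are trained, the base model stays frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here, train with the standard causal-LM objective on text in the
# target language, e.g. with transformers.Trainer.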
“…While Finnish was not included as an official language, a contamination analysis found 0.03% of ROOTS to be Finnish (Muennighoff et al, 2022). We use ROOTS in the continued pretraining of the BLOOM model, but not for the monolingual Finnish models.…”
Section: Data Sources
confidence: 99%
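The 0.03% contamination figure quoted here comes from running language identification over the corpus. A rough sketch of how such a fraction can be estimated, using the langdetect package as a stand-in for whatever classifier the cited analysis actually used:

# Sketch: estimate the share of a corpus written in a target language.
# langdetect is a stand-in; large-scale corpus audits typically use
# faster classifiers such as fastText's language-ID model.
from langdetect import detect, LangDetectException

def language_fraction(documents, lang="fi"):
    # Return the fraction of documents identified as `lang`.
    hits = 0
    for doc in documents:
        try:
            if detect(doc) == lang:
                hits += 1
        except LangDetectException:
            pass  # snippet too short or ambiguous to classify
    return hits / len(documents) if documents else 0.0

docs = ["Hei maailma, mitä kuuluu?", "Hello world, how are you?"]
print(f"Finnish share of sample: {language_fraction(docs):.2%}")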
“…Large autoregressive language models (LLMs) such as GPT [3], ChatGPT [12], PaLM [4], or BLOOM [17] have the potential to address both of these shortcomings. Due to being pre-trained on huge amounts of text as well as due to emergent effects resulting from the model size [16], LLMs often have a better zero-shot performance compared to PLMs such as BERT and are also more robust concerning unseen examples [3].…”
Section: Introduction
confidence: 99%
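All of the excerpts above concern zero-shot prompting, the setting studied in BLOOM+1. A minimal sketch of what a zero-shot prompt to a BLOOM-family model looks like with the Hugging Face transformers pipeline; the checkpoint and prompt are illustrative, not taken from the cited papers:

# Sketch: zero-shot prompting a small BLOOM checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="bigscience/bloom-560m")

# Zero-shot: the task is described in the prompt itself, with no
# in-context examples or task-specific finetuning.
prompt = "Translate to French: 'The weather is nice today.'\nTranslation:"
result = generator(prompt, max_new_tokens=30, do_sample=False)
print(result[0]["generated_text"])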