2022
DOI: 10.1162/tacl_a_00461

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Abstract: Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: they can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pipelines. Because byte or character sequences are longer than token sequences, past work on token-free models has of…

Cited by 133 publications (125 citation statements)
References 31 publications

Correcting diacritics and typos with a ByT5 transformer model

Stankevičius, Lukoševičius, Kapočiūtė-Dzikienė et al. 2022 (Preprint)
“…So the code point 353 of the letter "š" is translated into the two bytes 197 and 161, while the letter "s" retains the single byte 115. [8] showed better results using the transformer model ByT5 on these byte-level tokens rather than on characters. Inspired by their success on transliteration and noisy text tasks, we also use the same byte-level tokenization.…”
Section: Tokens
Confidence: 93%
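The byte values quoted in this statement follow directly from UTF-8 encoding; the following minimal Python snippet (standard library only, not code from the cited paper) checks them:

```python
# Verify the byte values quoted above: UTF-8 encodes "š" (code point 353)
# as two bytes, while the ASCII letter "s" remains a single byte.
for ch in ("š", "s"):
    encoded = ch.encode("utf-8")
    print(f"{ch!r}: code point {ord(ch)}, bytes {list(encoded)}")

# Expected output:
# 'š': code point 353, bytes [197, 161]
# 's': code point 115, bytes [115]
```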
“…Three years later, there are now plenty of similarly pre-trained, publicly available models (e.g., in the HuggingFace transformers library [94]). We also build our work on top of one such pre-trained ByT5 [8] model.…”
Section: Transformer Models
Confidence: 99%
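For context, a pre-trained ByT5 checkpoint can be loaded through the HuggingFace transformers library as described in this statement. The sketch below is illustrative only: the checkpoint name (google/byt5-small), input text, and generation settings are assumptions, not details taken from the cited paper, which fine-tunes the model for diacritics and typo correction.

```python
# Minimal sketch: load a publicly available pre-trained ByT5 checkpoint and
# run it on raw text. Because the tokenizer is byte-level, the input IDs are
# (roughly) the UTF-8 byte values plus a small special-token offset, so no
# language-specific vocabulary or preprocessing is needed.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

text = "example input text"  # placeholder; the cited work feeds noisy text to a fine-tuned model
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```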