2019
DOI: 10.5281/zenodo.3457707

DeepCut: A Thai word tokenization library using Deep Neural Network.


Cited by 8 publications (3 citation statements). References 0 publications.
“…The table gives the total number of words (#w) and words per sentence (#w/s) for each language. Thai was tokenized with Deepcut (Kittinaradorn et al., 2019).…”
mentioning
confidence: 99%
“…Deepcut [10] and Attacut [11] are two notable tokenizers in this thesis. Deepcut is a state-of-the-art technique that uses character embeddings with a 1D-convolutional network to predict the first character of each word in a sentence, while Attacut proposes using syllable boundaries instead of word boundaries.…”
Section: Tokenization Technique
mentioning
confidence: 94%
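
As a rough illustration of the approach this statement describes (character embeddings feeding a 1D-convolutional network that predicts, for each character, whether it begins a word), here is a minimal Keras sketch. It is not DeepCut's published architecture: the vocabulary size, embedding width, filter count, and kernel size below are illustrative placeholders.

import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative hyperparameters; DeepCut's actual values differ.
VOCAB_SIZE = 180  # distinct Thai characters plus padding/unknown symbols
EMBED_DIM = 32

model = models.Sequential([
    layers.Input(shape=(None,)),              # sequence of character ids
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # character embedding
    layers.Conv1D(100, 5, padding="same",
                  activation="relu"),         # 1D convolution over characters
    layers.Dense(1, activation="sigmoid"),    # per-character P(starts a word)
])
model.compile(optimizer="adam", loss="binary_crossentropy")

Thresholding the per-character probabilities (e.g. at 0.5) yields word-boundary positions, from which the token strings can be sliced out of the input sentence.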
“…We curated two corpora with 27M words/145M letters from Thai Wikipedia and 69M words/330M letters from Pantip (a Thai Q&A forum). For each corpus, we performed word tokenization using DeepCut (Kittinaradorn et al., 2019) and trained word-based n-gram models using KenLM (Heafield, 2011). The final LM is obtained by n-gram interpolation.…”
Section: Experimental Setups
mentioning
confidence: 99%
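
The tokenize-then-train pipeline in this statement can be sketched as follows, assuming the deepcut Python package and a local KenLM build whose lmplz binary is on the PATH. The file names and the n-gram order are illustrative, not taken from the cited paper, and the final interpolation step is toolchain-dependent and omitted.

import subprocess
import deepcut

def tokenize_corpus(src_path, dst_path):
    # Write one space-delimited, DeepCut-tokenized sentence per line,
    # the whitespace format KenLM's lmplz expects.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(deepcut.tokenize(line.strip())) + "\n")

tokenize_corpus("wikipedia_th.txt", "wikipedia_th.tok")

# Train a word-based n-gram model with KenLM (order 5 is arbitrary here).
with open("wikipedia_th.tok") as inp, open("wikipedia_th.arpa", "w") as out:
    subprocess.run(["lmplz", "-o", "5"], stdin=inp, stdout=out, check=True)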