2020
DOI: 10.1371/journal.pone.0234214

Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets

Abstract: Symbolic sequential data are produced in huge quantities in numerous contexts, such as text and speech data, biometrics, genomics, financial market indexes, music sheets, and online social media posts. In this paper, an unsupervised approach for the chunking of idiomatic units of sequential text data is presented. Text chunking refers to the task of splitting a string of textual information into non-overlapping groups of related units. This is a fundamental problem in numerous fields where understanding the re…

citations
Cited by 6 publications
(3 citation statements)
references
References 45 publications
“…This method is simple but can easily produce multiple useless words. The statistics-based word classification method works on the basis of n-grams (Dario et al, 2020). The method should first prepare a complete corpus for training and extract the features of the text in accordance with the number of occurrences between adjacent characters.…”
Section: Construct a Vocabulary Of Police Alarm Addresses (mentioning)
confidence: 99%
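
The statement above describes extracting features from "the number of occurrences between adjacent characters". As a rough illustration only, the sketch below counts adjacent-character pairs in a toy corpus and cuts a string wherever a pair is rare. The function names (`adjacent_pair_counts`, `chunk`), the `min_count` threshold, and the toy corpus are all assumptions made for this example; this is not the method of the cited paper or of the citing article.

```python
from collections import Counter


def adjacent_pair_counts(corpus):
    """Count how often each pair of adjacent characters occurs in the corpus."""
    counts = Counter()
    for text in corpus:
        # zip(text, text[1:]) yields every pair of neighbouring characters.
        counts.update(zip(text, text[1:]))
    return counts


def chunk(text, counts, min_count=2):
    """Split text into non-overlapping chunks.

    Assumed rule for this sketch: keep two neighbouring characters in the
    same chunk when their pair was seen at least `min_count` times,
    otherwise start a new chunk.
    """
    if not text:
        return []
    chunks = [text[0]]
    for prev, cur in zip(text, text[1:]):
        if counts[(prev, cur)] >= min_count:
            chunks[-1] += cur      # frequent pair: extend the current chunk
        else:
            chunks.append(cur)     # rare pair: cut here
    return chunks


# Toy corpus (hypothetical data, for illustration only).
corpus = ["abab", "abcd"]
counts = adjacent_pair_counts(corpus)
result = chunk("abcab", counts)
```

Because every character lands in exactly one chunk, joining the chunks always reproduces the input string, which matches the "non-overlapping groups" wording in the abstract. A realistic system would use longer n-grams and a principled association score rather than a fixed count threshold.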
“…It is not only one of the most representative algorithms of deep learning, but also a core algorithm in the image recognition field. It is widely used in computer vision, natural language processing, remote sensing, atmospheric science, and other fields (Dario et al, 2020).…”
Section: Model Training (mentioning)
confidence: 99%
“…The current literature shows the importance of disseminating information during a disaster [1, 2, 7, 8, 19, 20], but there is still research to be done in terms of the speed, accuracy and diffusion rates that can be measured during a disaster. This work addresses the following research questions: Which electronic media based information is useful and accurate during a disaster and how do we identify it?…”
Section: Introduction (mentioning)
confidence: 99%