2017
DOI: 10.1016/j.asoc.2017.05.065

Smoothed n-gram based models for tweet language identification: A case study of the Brazilian and European Portuguese national varieties

Citations: cited by 21 publications (10 citation statements)
References: 9 publications
“…An n-gram language model predicts the probability of a given n-gram within any sequence of words in the language. It is widely used in text mining [15,16], including in the legal domain [19]. An n-gram is a contiguous sequence of n items from a given sequence of text.…”
Section: Language Model | Citation type: mentioning
confidence: 99%
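
The definition quoted above can be made concrete with a minimal sketch. It assumes character-level n-grams and add-alpha (Laplace) smoothing, which is one common choice and not necessarily the smoothing used in the cited paper. The names `char_ngrams`, `SmoothedNgramModel`, and `classify`, as well as the toy training strings, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class SmoothedNgramModel:
    """Add-alpha smoothed frequency model over character n-grams (illustrative)."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.counts = Counter()
        self.total = 0

    def train(self, texts):
        # Count every n-gram observed in the training texts.
        for text in texts:
            grams = char_ngrams(text.lower(), self.n)
            self.counts.update(grams)
            self.total += len(grams)

    def log_prob(self, text):
        # Reserve one extra vocabulary slot so unseen n-grams get non-zero mass.
        vocab_size = len(self.counts) + 1
        denom = self.total + self.alpha * vocab_size
        return sum(
            math.log((self.counts[g] + self.alpha) / denom)
            for g in char_ngrams(text.lower(), self.n)
        )


def classify(text, models):
    """Pick the label whose model assigns the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy data, assumed for illustration only; real models need far more text.
models = {"pt-BR": SmoothedNgramModel(), "pt-PT": SmoothedNgramModel()}
models["pt-BR"].train(["você está fazendo o quê", "ônibus trem"])
models["pt-PT"].train(["estás a fazer o quê", "autocarro comboio"])

print(char_ngrams("tweet", 2))  # ['tw', 'we', 'ee', 'et']
print(classify("autocarro", models))
```

Smoothing matters here because short texts such as tweets routinely contain n-grams that never occurred in training; without it a single unseen n-gram would drive the probability to zero.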
“…Performance evaluation of all the five algorithms named GD, GDM, GDA, GDX, and LM is carried out using a confusion matrix as shown in Table 2 (Dhaoui et al, 2017; Castro et al, 2017; Moraes et al, 2013) for binary datasets. The effectiveness of all the five algorithms used for updating the parameters of ANN is measured using precision, recall, f-score, accuracy, training time, and MSE as performance metrics.…”
Section: Performance Measures | Citation type: mentioning
confidence: 99%
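
The metrics named in the statement above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the citing paper's Table 2.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-score and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```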
“…Further, statistical analysis of all the five algorithms is performed using Wilcoxon signed-rank test. (Castro et al, 2017) using the confusion matrix presented in Table 2.…”
Section: Performance Measures | Citation type: mentioning
confidence: 99%
“…In addition, there are now research efforts where the authors try to solve a specific language searching problem [6]-[13], but there is no complete software architecture easily customizable for different search applications. In [6] author's give one optimization of the method proposed in [2] where selection of the similarity measure is performed using the principles of redundancy and fault tolerance, in [7] is described one search engine using MySQL as one of cheap option, work [8] presents one architecture which uses different semantic web technologies and builds one prototype of semantic web mashup possibility, paper [9] proposes one novel Italian Sign Language Multi Word Net using process of integration the Multi Word Net lexical database and the Italian Sign Language, paper [10] describes a novel LInSTSS approach which is suitable for using to create a software tool which is capable to determine the semantic similarity of two presented no large texts, in paper [11], authors propose the use of smoothed n-gram language models to classify tweets as a typical short texts from Twitter in both Portuguese languages - Brazilian and European variants, paper [12] deals with the software architecture which establishing electronic services for searching and presentation in an information system on scientific activities of the Ministry of Education, Science and Technological Development of the Republic of Serbia and work [13] has objective to give a lexicon based algorithm which is able to perform different natural language identification using minimal training data in the obligatory process of machine learning because this step is often the first step in many natural language processing tasks which is normally necessary to make in the shortest possible time. Therefore, we have a strong motive for designing the SEFRA framework - hybrid solution based on existing Web services and technologies (framework source code is available at: https://bitbucket.org/mjovanov/pretraga/).…”
Section: Related Work | Citation type: mentioning
confidence: 99%