2018
DOI: 10.19113/sdufbed.15893

A Comparison of Different Approaches to Document Representation in Turkish Language

Abstract: Recently, deep learning methods have demonstrated state-of-the-art performance in numerous complex Natural Language Processing (NLP) problems. Easy accessibility of high-performance computing resources and open-source libraries makes Artificial Intelligence (AI) approaches more applicable for researchers. This sudden growth of available techniques shaped and improved standards in the field of NLP. Thus, we find an opportunity to compare different approaches to document representation, owing to various open-sou…

Cited by 5 publications (3 citation statements)
References 19 publications
“…For the CB data set, emoticons were encoded to keep them within the content and used as features [27]. For TTC, on the other hand, it was provided to prevent the underscore character, as this data set includes domain-specific multi-terms joined with the underscore character [43].…”
Section: Preprocessing (mentioning)
confidence: 99%
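
The preprocessing described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the citing authors' code: the emoticon table `EMOTICON_CODES`, the placeholder names, and the helper functions are all hypothetical, and the statement is read here as saying that underscore-joined multi-terms are kept intact during tokenization.

```python
import re

# Hypothetical emoticon table; the cited work does not list its actual encoding.
EMOTICON_CODES = {":)": "EMO_SMILE", ":(": "EMO_SAD", ":D": "EMO_LAUGH"}


def encode_emoticons(text):
    """Replace each known emoticon with a word-like placeholder token."""
    for emoticon, code in EMOTICON_CODES.items():
        text = text.replace(emoticon, f" {code} ")
    return text


def tokenize(text):
    """Split text into tokens, keeping underscore-joined multi-terms whole."""
    text = encode_emoticons(text)
    # \w matches Unicode letters, digits, and the underscore, so a term such
    # as "yapay_zeka" stays one token while other punctuation is dropped.
    return re.findall(r"\w+", text)


tokens = tokenize("harika :) yapay_zeka dersi, çok iyi!")
print(tokens)
```

Encoding emoticons before tokenization is what lets them survive punctuation stripping and act as features; a tokenizer that split on every non-letter character would lose both the emoticons and the multi-terms.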
“…This process yielded a balanced data set in which half of the messages were labeled "yes" and the remaining labeled "no". On the other hand, the second non-benchmark data set [43] included Turkish news texts from seven categories; namely, world, economy, culture-art, health, politics, sports, and technology. This data set contained 4900 documents, and each category included 700 documents.…”
Section: Data Sets (mentioning)
confidence: 99%
“…After converting unstructured data into structured data, we need an effective document representation model to build an efficient classification system [12]. Within this project, different commonly used document representation strategies are evaluated and applied, such as bag of words [18], topic modeling [34], embeddings [33], and BERT [29]. The goal of this line of research is to evaluate the different techniques applied to email classification.…”
Section: B. Email Representation (unclassified)
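
Of the representations named in the statement above, bag of words is the simplest to show concretely. The sketch below is an illustrative assumption, not code from either paper: the function names and the two toy documents are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct token in the corpus to a fixed column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(doc, vocab):
    """Return the term-count vector of one document over the vocabulary."""
    counts = Counter(doc.split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocab]


# Toy corpus (hypothetical): two short Turkish "documents".
docs = ["spor haber spor", "ekonomi haber"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)    # {'ekonomi': 0, 'haber': 1, 'spor': 2}
print(vectors)  # [[0, 1, 2], [1, 1, 0]]
```

Each document becomes a fixed-length count vector that ignores word order, which is the main limitation that topic models, embeddings, and BERT are usually compared against.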