Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)
DOI: 10.3115/v1/d14-1077

Analyzing Stemming Approaches for Turkish Multi-Document Summarization

Abstract: In this study, we analyzed the effects of applying different levels of stemming, such as fixed-length word truncation and morphological analysis, for multi-document summarization (MDS) on Turkish, which is an agglutinative and morphologically rich language. We constructed a manually annotated MDS data set and, to the best of our knowledge, reported the first results on Turkish MDS. Our results show that a simple fixed-length word truncation approach performs slightly better than no stemming, whereas applying…
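The fixed-length word truncation baseline mentioned in the abstract is simple enough to sketch directly. The snippet below is a minimal illustration in Python; the prefix length of 5 characters and the whitespace tokenization are assumptions for demonstration, not the exact configuration evaluated in the paper.

```python
# Minimal sketch of fixed-length word truncation, one of the stemming
# levels compared in the paper. The prefix length (5) and whitespace
# tokenization are illustrative assumptions.

def truncate_stem(token: str, prefix_len: int = 5) -> str:
    """Keep only the first prefix_len characters of a token."""
    return token[:prefix_len]

def stem_sentence(sentence: str, prefix_len: int = 5) -> list:
    """Lowercase, split on whitespace, and truncate every token."""
    return [truncate_stem(tok, prefix_len) for tok in sentence.lower().split()]

if __name__ == "__main__":
    # Inflected forms of the same Turkish stem collapse to a common prefix.
    print(stem_sentence("evlerimizden evde evler"))
    # ['evler', 'evde', 'evler']
```

Because truncation needs no morphological resources, it is a cheap way to reduce the vocabulary of an agglutinative language before downstream processing, which is consistent with the abstract's finding that it already improves over no stemming.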

Cited by 9 publications (17 citation statements)
References 14 publications (12 reference statements)
“…Given a query as a natural language statement, EvidenceMiner retrieves textual evidence at the sentence level from the CORD-19 corpus for life sciences. More recently, Raza et al (2022) present an Information Retrieval System that uses latent information to select relevant works related to specific concepts. Otegi et al (2022) develop a Question Answering system that receives a set of questions asked by experts about the disease COVID-19 and SARS-CoV-2 virus, and provides a ranked list of expert-level answers to each question.…”
Section: Related Work
confidence: 99%
“…The first component, responsible for extracting the latent concepts learned by a model is based on work done by Dalvi et al (2022), called Latent Concept Analysis. At a high level, feature vectors (contextualized representations) are first generated by performing a forward pass on the model.…”
Section: Concept Discovery
confidence: 99%
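For readers unfamiliar with the pipeline this excerpt refers to, a rough sketch of the "forward pass, then cluster the contextualized representations" step might look like the following. The model (bert-base-uncased), the layer used (last hidden state), and k-means clustering are assumptions made purely for illustration; they are not the exact setup of Dalvi et al. (2022).

```python
# Rough sketch: generate contextualized token representations with a
# forward pass, then group them into latent "concepts" by clustering.
# Model, layer, and clustering algorithm are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["The bank approved the loan.", "She sat on the river bank."]

token_vectors, token_labels = [], []
with torch.no_grad():
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]            # (seq_len, dim)
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        for tok, vec in zip(tokens, hidden):
            if tok not in ("[CLS]", "[SEP]"):
                token_vectors.append(vec.numpy())
                token_labels.append(tok)

# Cluster the token representations; each cluster is a candidate concept.
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(token_vectors)
for tok, c in zip(token_labels, clusters):
    print(f"{tok}\tconcept-{c}")
```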
“…It is important to observe that this complexity constrains the implementation of state-of-the-art models and algorithms developed, for example, for English. In order to overcome data sparsity in Turkish, pre-processing tasks such as stemming or lemmatization (possibly followed by a feature selection step) should be introduced before NLP pipelines [44]. Both stemming and lemmatization aim to reduce inflectional or derivational forms of words to a common base form.…”
Section: Turkish Language Modelling Challenges Based On Its Morphological Complexity
confidence: 99%
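To make the data-sparsity point in this excerpt concrete, the toy example below compares vocabulary size before and after a crude stemming step (fixed-length truncation, as studied in the cited paper). The three-sentence corpus and the prefix length are invented for illustration only.

```python
# Illustration of the data-sparsity argument: stemming (here, fixed-length
# truncation) shrinks the vocabulary a downstream model has to learn.
# The corpus and prefix length are toy assumptions.
from collections import Counter

corpus = [
    "evde kitap okudum",
    "evlerimizden kitaplarımı aldım",
    "evler kitapları sever",
]

def truncate(tok: str, n: int = 5) -> str:
    return tok[:n]

raw_vocab = Counter(tok for sent in corpus for tok in sent.split())
stemmed_vocab = Counter(truncate(tok) for sent in corpus for tok in sent.split())

print("surface vocabulary size:", len(raw_vocab))      # every inflected form counts
print("stemmed vocabulary size:", len(stemmed_vocab))  # inflections collapse
```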
“…In this context, Turkish words take numerous inflectional and derivational suffixes, and it is possible to derive a single Turkish word that corresponds to an entire English sentence (Oflazer, 2014): yap+abil+ecek+se+niz -> if you will be able to do (it). One of the main problems of Turkish morphology arises when building a vector space model for machine learning classifiers. More specifically, Turkish words are generally composed of many morphemes, which may lead to data sparsity and thus decrease classifier performance (Nuzumlalı & Özgür, 2014). This problem is typically handled with stemming and lemmatization, whose goal is to obtain the base forms of words by reducing inflectional forms.…”
Section: Introduction
confidence: 99%
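The vector-space-model issue described in this excerpt can be shown with a small bag-of-words example: without stemming, each inflected form of ev ("house") becomes its own feature dimension, while with truncation they share one. The documents, prefix length, and use of scikit-learn's CountVectorizer are assumptions for demonstration only.

```python
# Sketch of how stemming changes the bag-of-words feature space:
# inflected forms of the same stem either get separate dimensions
# (no stemming) or collapse into one (fixed-length truncation).
from sklearn.feature_extraction.text import CountVectorizer

docs = ["evde kaldım", "evlerimizden geldim", "evler güzeldi"]

def truncating_tokenizer(text: str, n: int = 5):
    return [tok[:n] for tok in text.split()]

plain = CountVectorizer(tokenizer=str.split)
stemmed = CountVectorizer(tokenizer=truncating_tokenizer)

plain.fit(docs)
stemmed.fit(docs)

print("features without stemming:", sorted(plain.get_feature_names_out()))
print("features with truncation: ", sorted(stemmed.get_feature_names_out()))
```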