Effect of Preprocessing on Extractive Summarization with Maximal Frequent Sequences

Ledeneva, Yulia

doi:10.1007/978-3-540-88636-5_11

Cited by 8 publications

(7 citation statements)

References 12 publications


“…Even the combination of AR, WSD and TE could not reach it. It can be concluded that for the TS systems based on the unigrams as opposed to the multiword descriptions [16] stopwords filtering is essential. The best result in this range of settings is 0.40629.…”

Section: Results (mentioning)

confidence: 99%


The role of statistical and semantic features in single-document extractive summarization

Vodolazova, Lloret, Muñoz et al. 2013. AIR


This paper reports further results of ongoing research analyzing the impact of a range of commonly used statistical and semantic features in the context of extractive text summarization. The features experimented with include word frequency, inverse sentence and term frequencies, stopwords filtering, word senses, resolved anaphora and textual entailment. The obtained results demonstrate the relative importance of each feature and the limitations of the tools available. It has been shown that the inverse sentence frequency combined with the term frequency yields almost the same results as the term frequency combined with stopwords filtering, which in turn proved to be a highly competitive baseline. To improve the suboptimal results of anaphora resolution, the system was extended with a second anaphora resolution module. The present paper also describes the first attempts at an internal document data representation.
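The abstract mentions combining term frequency with inverse sentence frequency. As a rough illustration only, the sketch below scores sentences under one common formulation, assuming whitespace tokenization and ISF(w) = log(N / n_w), where N is the number of sentences and n_w the number of sentences containing w. The function name `tf_isf_scores` and the exact weighting are assumptions made for this example, not the cited paper's implementation.

```python
import math
from collections import Counter

def tf_isf_scores(sentences):
    """Score each sentence by the summed TF x ISF weight of its words.

    Assumed formulation: TF is the word's count over the whole document,
    ISF(w) = log(N / n_w) with N sentences and n_w sentences containing w.
    """
    tokenized = [s.lower().split() for s in sentences]
    n_sentences = len(tokenized)
    # Document-level term frequency.
    tf = Counter(w for toks in tokenized for w in toks)
    # Sentence frequency: in how many sentences each word occurs.
    sf = Counter(w for toks in tokenized for w in set(toks))
    scores = []
    for toks in tokenized:
        scores.append(sum(tf[w] * math.log(n_sentences / sf[w]) for w in toks))
    return scores
```

Under this weighting, a word that occurs in every sentence has ISF equal to zero and so contributes nothing to any sentence score, which is one intuition for why ISF weighting can act somewhat like stopwords filtering.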


“…However, not all the extractive TS approached equally benefit from the stopwords filtering. Ledeneva et al [16] have shown that removing the stopwords yields worse results for TS systems based on the multiword descriptions.…”

Section: Stopwords Filtering (mentioning)

confidence: 99%


“…Case folding is the stage that serves to change the font and to convert all letters to lowercase [11]. Stopwords removal is the text preprocessing stage that removes the stopwords in a text [12]. Examples of stopwords in Indonesian are "yang", "dan", "di", and so on.…”

Section: B Text Mining (unclassified)

Klasifikasi Sentimen Ulasan Film Indonesia dengan Konversi Speech-to-Text (STT) Menggunakan Metode Convolutional Neural Network (CNN)

Shafirra, Irhamah 2020. JSSITS


Film reviews are subjective opinions. Film reviews come in a variety of media, such as text, audio, and video. Film reviews can be processed with sentiment classification so that a person's remarks about a film can be assigned a particular sentiment. Nowadays data takes many forms, and choosing a better type of data can also affect sentiment classification. Video data can be converted into text data with the help of Speech-to-Text (STT). Text data is used because words or sentences can be distinguished as negative or positive. The review data are grouped by film assessment aspect, and sentiment classification is performed on all review segments as a whole as well as on each aspect. Using the Convolutional Neural Network method, it was found that the per-aspect sentiment classification models have better AUC values than the sentiment classification model that uses all the data.
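The citation statement for this entry describes case folding and stopwords removal. A minimal sketch of those two steps follows, assuming whitespace tokenization; the function name `preprocess` is hypothetical, and the stopword set contains only the three Indonesian examples named in the excerpt, whereas a real system would use a much fuller list.

```python
# Illustrative stopword set: only the three examples given in the excerpt.
INDONESIAN_STOPWORDS = frozenset({"yang", "dan", "di"})

def preprocess(text, stopwords=INDONESIAN_STOPWORDS):
    """Apply case folding, then drop stopwords (whitespace tokenization assumed)."""
    lowered = text.lower()      # case folding: convert all letters to lowercase
    tokens = lowered.split()    # naive tokenization on whitespace
    return [tok for tok in tokens if tok not in stopwords]
```

For example, `preprocess("Film yang Bagus dan Menarik")` returns `["film", "bagus", "menarik"]`.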


“…Several NLP applications have been the subject of work on preprocessing; one such work is that of Ledeneva [27], which analyzes the importance of preprocessing in automatic summary generation using maximal frequent sequences. The preprocessing techniques used were lexical analysis, such as removal of punctuation marks and normalization of numbers, and some variants of stopwords and stemming.…”

Section: State of the Art (unclassified)

Efecto del pre-procesamiento en la detección automática de plagio para PAN 2014 y PAN 2015

García, Ledeneva, García-Hernández 2016. RCS. Self-citation


Within automatic plagiarism detection, [1] defines text alignment as the discovery of similar text fragments between two documents. It can be used in plagiarism detection, authorship identification, text reuse detection, and information retrieval, among many other tasks. Preprocessing comprises various techniques that are applied in most Natural Language Processing (NLP) tasks. In this case, the heuristics presented are taken from works [1] and [2], the best entries in the PAN 2014 and PAN 2015 international automatic plagiarism detection competitions in the monolingual text alignment subtask, in order to determine the effect on those heuristics of stopword removal and of using or not using stemming, both of which are preprocessing techniques.
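The citation statement for this entry lists punctuation removal and number normalization among the lexical preprocessing steps. The sketch below shows one plausible way to apply them; the function name `lexical_preprocess`, the placeholder token `NUM`, and the number pattern are assumptions chosen for illustration, not the procedure used in the cited works.

```python
import re
import string

# Deletes ASCII punctuation only; Spanish marks such as the inverted
# question and exclamation marks would need to be added for real Spanish text.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)
# Digit runs, optionally with "." or "," separators between digit groups.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def lexical_preprocess(text, number_token="NUM"):
    """Normalize numbers to a placeholder, strip punctuation, and tokenize."""
    # Normalize numbers first, so "3.5" is treated as one number
    # before its "." would be stripped as punctuation.
    text = _NUMBER.sub(f" {number_token} ", text)
    text = text.translate(_PUNCT_TABLE)
    return text.split()
```

For example, `lexical_preprocess("In 2014, PAN had 3.5 tasks!")` returns `["In", "NUM", "PAN", "had", "NUM", "tasks"]`. Stopword removal and stemming, the other variants mentioned, would be applied to the resulting token list.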
