Stop words in review summarization using TextRank

Manalu, Sonya Rapinta

doi:10.1109/ecticon.2017.8096371

Cited by 12 publications

(6 citation statements)

References 7 publications


“…Similarity scores as stores in the similarity matrix. This approach models the similarity matrix into graphs, where nodes of the graph represent the sentences present in the documents and the edges represent the semantic relation through which the sentences are connected (Manalu, 2017). The similarity between the nodes is equivalent to the weighted edges of the graph (Balcerzak et al, 2014).…”

Section: Proposed Methodology (mentioning)

confidence: 99%
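The graph model described in this statement can be sketched as a weighted PageRank over the similarity matrix, where each sentence is a node and each similarity score is an edge weight. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function name `rank_sentences`, the damping factor of 0.85, the fixed iteration count, and the example matrix values are all hypothetical.

```python
def rank_sentences(similarity, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over a similarity matrix.

    similarity[i][j] is the edge weight between sentences i and j.
    The damping factor and iteration count are assumed defaults.
    """
    n = len(similarity)
    # Total edge weight leaving each node, ignoring self-loops.
    out_weight = [
        sum(similarity[j][k] for k in range(n) if k != j) for j in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score
            # proportional to the weight of the edge j -> i.
            incoming = sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


# Hypothetical similarity matrix for three sentences (values made up).
similarity = [
    [0.0, 0.5, 0.2],
    [0.5, 0.0, 0.4],
    [0.2, 0.4, 0.0],
]
scores = rank_sentences(similarity)
# Sentence indices ordered from most to least central.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

An extractive summary would then keep the top-ranked sentences, usually restored to their original document order.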

“…TextRank uses certain ways to calculate the relation between sentences, cosine similarity is one of them as described in (Barrios, Lopez, Argerich, & Wachenchauzer, 2016). The meaningless words generally called stop words need to be removed for better summary production as in (Manalu, 2017), (Qaiser & Ali, 2018). TextRank also helps in determining the review assessment and credibility assessment as in (Manalu & Sundjaja, 2017) and (Balcerzak, Jaworski, & Wierzbicki, 2014) respectively.…”

Section: Related Study (mentioning)

confidence: 99%
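To make the role of stop words in the similarity step concrete, the sketch below computes cosine similarity between bag-of-words vectors with and without stop-word filtering. It is an illustration only: the stop-word list, the whitespace tokenizer, and the example review sentences are assumptions, not taken from the cited papers.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "of", "to"}


def bag_of_words(sentence, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = [t.strip(".,!?;:\"'") for t in sentence.lower().split()]
    return Counter(t for t in tokens if t and t not in stop_words)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(sentences, stop_words=STOP_WORDS):
    """Pairwise cosine similarities, with zeros on the diagonal."""
    bags = [bag_of_words(s, stop_words) for s in sentences]
    n = len(bags)
    return [
        [0.0 if i == j else cosine_similarity(bags[i], bags[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical review sentences.
sentences = [
    "The battery is great.",
    "Great battery and screen.",
    "It was the delivery.",
]
filtered = similarity_matrix(sentences)
unfiltered = similarity_matrix(sentences, stop_words=set())
```

In this toy example the first and third sentences share only the word "the", so they appear related when stop words are kept and unrelated once the words are removed.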

“…Broad classification of text summarization can be done in the following two ways: Extract and Abstract summarization (Hidayat, Firdausillah, Hastuti, Dewi, & Azhari, 2015). Extraction based summary choose the relevant words from the sentences and combine them to generate a meaningful summary while in an abstract form of summarization interpretation of the source document is presented in the form of shorter text by using rephrased words (Manalu, 2017).…”

Section: Introduction (mentioning)

confidence: 99%


Comparative Assessment of Extractive Summarization: TextRank, TF-IDF and LDA

Rani, Bidhan (2021). JSR


Automatically generating a shorter version of a text document is referred to as text summarization. It is an effective method of finding important details in documents. The volume of data worldwide has grown massively with the rapid growth of the internet, and it has become difficult for humans to summarize large documents manually. Automatic text summarization is an NLP approach that reduces the time and effort needed to produce a summary. There are various approaches to summarizing data. This paper provides a comparative study of three approaches: TF-IDF, TextRank, and Latent Dirichlet Allocation (LDA). The comparison is made using three different types of datasets, such as reviews of documents, news articles, and legal text. The results show the best-suited approach for complexity-oriented text inputs and are evaluated using ROUGE measures.


“…Tokenization-splitting the text into smaller units, called tokens, such as words or phrases [34]; 3. Removing stop-words-common words that lack significant meaning [35], such as "the" or "and", were eliminated to reduce the text size and improve the performance of the NLP model; 4. Stemming and lemmatization-leveraging the Sastrawi library, stemming for the Indonesian language (Bahasa) was performed to reduce words to their base form [36], known as the stem or lemma; 5.…”

Section: Pre-processing (mentioning)

confidence: 99%
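The pre-processing steps quoted in this statement (tokenization, stop-word removal, stemming) can be sketched as a small pipeline. This is a hedged illustration, not the cited paper's code: the stop-word list is a placeholder, and `toy_stem` is a hypothetical English suffix stripper standing in for a real stemmer. The stemmer is passed in as a callable, so a library stemmer for the target language could be substituted.

```python
import re

# Placeholder stop-word list for illustration only.
STOP_WORDS = {"the", "and", "a", "is", "are", "of"}


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters of the stem.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(text, stop_words=STOP_WORDS, stem=toy_stem):
    """Tokenize, remove stop words, then stem the remaining tokens."""
    # Tokenization: lowercase alphanumeric runs.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Stop-word removal.
    tokens = [t for t in tokens if t not in stop_words]
    # Stemming with the supplied callable.
    return [stem(t) for t in tokens]


tokens = preprocess("The reviewers praised the screens and disliked the charging.")
```

Removing stop words before stemming keeps the stop-word lookup on surface forms, which is the order given in the quoted list.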

Automated Text Annotation Using a Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Saifullah, Dreżewski, Dwiyanto, et al. (2024). Applied Sciences


Text annotation is an essential element of the natural language processing approaches. The manual annotation process performed by humans has various drawbacks, such as subjectivity, slowness, fatigue, and possibly carelessness. In addition, annotators may annotate ambiguous data. Therefore, we have developed the concept of automated annotation to get the best annotations using several machine-learning approaches. The proposed approach is based on an ensemble algorithm of meta-learners and meta-vectorizer techniques. The approach employs a semi-supervised learning technique for automated annotation to detect hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with Word2Vec and TF-IDF text extraction methods. The annotation process is performed using 13,169 Indonesian YouTube comments data. The proposed model used a Stemming approach using data from Sastrawi and new data of 2245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and the unlabeled data, which offers implicit insights. This hybrid approach enables the model to generalize and make informed predictions even when limited labeled data is available (based on self-learning). Ultimately, this enhances its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses a variety of thresholds for matching words labeled with hate speech ranging from 0.6, 0.7, 0.8, to 0.9. The experiments indicated that the DT-TF-IDF model has the best accuracy value of 97.1% with a scenario of 5%:80%:0.9. However, several other methods have accuracy above 90%, such as SVM (TF-IDF and Word2Vec) and KNN (Word2Vec), based on both text extraction methods in several test scenarios.


“…Removing stop-words-stop-words are common words that do not provide significant meaning [33], such as "the" or "and", and can be removed from the text to reduce its size and improve the performance of the NLP model.…”

mentioning

confidence: 99%

Automated Text Annotation Using Semi-Supervised Approach with Meta Vectorizer and Machine Learning Algorithms for Hate Speech Detection

Saifullah, Dreżewski, Dwiyanto, et al. (2023). Preprint


Text annotation is an essential element of the natural language processing approaches. The manual annotation process performed by humans has several drawbacks, such as subjectivity, slowness, fatigue, and possibly carelessness. In addition, annotators may annotate ambiguous data. So, we developed the concept of automated annotation to get the best annotations using several machine-learning approaches. The proposed approach is based on an ensemble algorithm of meta-learners and meta-vectorizer techniques. The approach employs a semi-supervised learning technique for automated annotation, aimed at detecting hate speech. This involves leveraging various machine learning algorithms, including Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (KNN), and Naive Bayes (NB), in conjunction with Word2Vec and TF-IDF text extraction methods. The annotation process is performed using 13,169 Indonesian YouTube comments data. The proposed model used a Stemming approach using data from Sastrawi and also new data of 2,245 words. Semi-supervised learning uses 5%, 10%, and 20% of labeled data as compared to performing labeling based on 80% of the datasets. In semi-supervised learning, the model learns from the labeled data, which provides explicit information, and the unlabeled data, which offers implicit insights. This hybrid approach enables the model to generalize and make informed predictions even when limited labeled data is available, ultimately enhancing its ability to handle real-world scenarios with scarce annotated information. In addition, the proposed method uses a variety of thresholds for matching words labeled with hate speech: 0.6, 0.7, 0.8, and 0.9. The experiment showed that the KNN-Word2Vec model has the best accuracy value of 96.9% with a scenario of 5%:80%:0.9. However, several other methods also have accuracy above 90%, such as SVM and DT based on both text extraction methods in several test scenarios.

