2021 5th International Conference on Trends in Electronics and Informatics (ICOEI) 2021
DOI: 10.1109/icoei51242.2021.9453071
Text Summarization using TF-IDF and Textrank algorithm

Cited by 10 publications (6 citation statements). References 18 publications.
“…TextRank is an unsupervised method that has been used in many previous works on summarization, either as the main or a baseline method, because of its proven effectiveness [8]-[11]. Many studies have also been conducted to enhance the performance of the TextRank algorithm for text summarization or keyword extraction, such as using word embeddings [12]-[18], term frequency-inverse document frequency (TF-IDF) [19], [20], a combination of 1-gram, 2-gram, and Hidden Markov models [21], knowledge-graph sentence embedding and K-means clustering [22], statistical and linguistic features for sentence weighting [23], variations of the sentence similarity function [24], and fine-tuning of the hyperparameters [13].…”
Section: Introduction
confidence: 99%
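The TextRank-plus-TF-IDF combination surveyed above can be sketched as follows. This is a minimal, self-contained illustration, not the cited paper's implementation: each sentence becomes a TF-IDF vector, pairwise cosine similarities form the edge weights of a sentence graph, and a simple power-iteration PageRank scores the sentences. The function names (`tfidf_vectors`, `cosine`, `textrank`) and the damping and iteration parameters are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tfidf_vectors(sentences):
    """Build a sparse TF-IDF vector (term -> weight) per sentence."""
    docs = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * math.log(n / df[t])
                     for t, c in tf.items()})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Score sentences with PageRank over a TF-IDF cosine-similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row = [sum(r) for r in sim]  # out-weight of each sentence node
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - d) / n + d * sum(scores[j] * sim[j][i] / row[j]
                                        for j in range(n) if row[j] > 0)
                  for i in range(n)]
    return scores
```

Sentences with no lexical overlap contribute nothing to the graph and keep only the teleportation mass `(1 - d) / n`, which is why off-topic sentences rank low in this scheme.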
“…Then, TF-IDF is used to weight the generated word embeddings to further improve the sentence representation; this is based on our intuition that more important words, as estimated from corpus statistics, should be valued more when generating the vector representation of a sentence. TF-IDF is chosen because it has been shown to perform well for term weighting in various natural language processing tasks [19], [20], [25]-[27], and it has also been proven to significantly outperform the bag-of-words (BoW) technique [28]. The combination of the word embedding and TF-IDF weighting components is then expected to improve the estimation of sentence relationships in the TextRank algorithm.…”
Section: Introduction
confidence: 99%
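The TF-IDF-weighted embedding idea in this statement amounts to a weighted average of word vectors, where the TF-IDF score of each word sets its contribution. Below is a minimal sketch under stated assumptions: the toy two-dimensional `embeddings` dict and precomputed `idf` weights stand in for a real embedding model and corpus statistics, and `sentence_vector` is an illustrative name, not the authors' code.

```python
import re
from collections import Counter

def sentence_vector(sentence, embeddings, idf):
    """TF-IDF-weighted average of word vectors: high-IDF words dominate."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tf = Counter(tokens)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for term, count in tf.items():
        if term in embeddings:
            weight = (count / len(tokens)) * idf.get(term, 0.0)
            total += weight
            for k in range(dim):
                vec[k] += weight * embeddings[term][k]
    # Normalize by the total weight so the result is a weighted mean.
    return [x / total for x in vec] if total else vec
```

Note how a frequent but uninformative word (IDF of zero, like a stopword) drops out of the representation entirely, which is the intuition the citing authors describe.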