Unsupervised Automatic Text Summarization of Konkani Texts using K-means with Elbow Method

D’Silva, Jovi; Sharma, Uzzal

doi:10.37624/ijert/13.9.2020.2380-2384

Cited by 16 publications

(10 citation statements)

References 14 publications


“…Tables 1, 2 and 3, outline the metrics of the assessment of the corresponding human summaries with the system-generated summaries using deep learning. These tables also provide the scores of automatic text summarization system built with k-means clustering with 3 clusters with the output summaries generated with the same Konkani folk tales dataset provided as input to the system [30]. The performance of ATS systems can be compared against baseline systems, like using leading sentences from the input text document [2].…”

Section: Results (mentioning)

confidence: 99%


Automatic text summarization of konkani texts using pre-trained word embeddings and deep learning

D’Silva

Sharma

2022

IJECE

Self Cite


Automatic text summarization has gained immense popularity in research. Previously, several methods have been explored for obtaining effective text summarization outcomes. However, most of the work pertains to the most popular languages spoken in the world. Through this paper, we explore the area of extractive automatic text summarization using deep learning approach and apply it to Konkani language, which is a low-resource language as there are limited resources, such as data, tools, speakers and/or experts in Konkani. In the proposed technique, Facebook’s fastText pre-trained word embeddings are used to get a vector representation for sentences. Thereafter, deep multi-layer perceptron technique is employed, as a supervised binary classification task for auto-generating summaries using the feature vectors. Using pre-trained fastText word embeddings eliminated the requirement of a large training set and reduced training time. The system generated summaries were evaluated against the ‘gold-standard’ human generated summaries with recall-oriented understudy for gisting evaluation (ROUGE) toolkit. The results thus obtained showed that performance of the proposed system matched closely to the performance of the human annotators in generating summaries.



“…In our experiment, we compute sentence embeddings, using fastText word embeddings, which act as feature vectors ideal to be used with MLPs [29]. Previous work using machine learning for text summarization in Konkani used k-means clustering on the same Konkani dataset [30]. We compare this system with the system presented in this paper.…”

Section: Related Work (mentioning)

confidence: 99%

Automatic text summarization of konkani texts using pre-trained word embeddings and deep learning

D’Silva

Sharma

2022

IJECE

Self Cite


“…In [12], we discussed the content recommendation system approaches based on grouping for similar articles that used TF-IDF to perform vector transformation of the document contents and, through cosine similarity, applied k-means [13] for clustering them. In [14], the authors automatically summarized texts using TF-IDF and k-means to determine the document's textual groups used to create the abstract. Then, TF-IDF is considered the primary technique for vectorizing textual content and k-means the most used algorithm for unsupervised machine learning.…”

Section: State-of-the-art Review (mentioning)

confidence: 99%

Clustering by Similarity of Brazilian Legal Documents Using Natural Language Processing Approaches

Oliveira

Nascimento

2022

Data Clustering


The Brazilian legal system postulates the expeditious resolution of judicial proceedings. However, legal courts are working under budgetary constraints and with reduced staff. As a way to face these restrictions, artificial intelligence (AI) has been tackling many complex problems in natural language processing (NLP). This work aims to detect the degree of similarity between judicial documents that can be achieved in the inference group using unsupervised learning, by applying three NLP techniques, namely term frequency-inverse document frequency (TF-IDF), Word2Vec CBoW, and Word2Vec Skip-gram, the last two being specialized with a Brazilian language corpus. We developed a template for grouping lawsuits, which is calculated based on the cosine distance between the elements of the group to its centroid. The Ordinary Appeal was chosen as a reference file since it triggers legal proceedings to follow to the higher court and because of the existence of a relevant contingent of lawsuits awaiting judgment. After the data-processing steps, documents had their content transformed into a vector representation, using the three NLP techniques. We notice that specialized word-embedding models—like Word2Vec—present better performance, making it possible to advance in the current state of the art in the area of NLP applied to the legal sector.
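The TF-IDF vectorization and cosine-similarity comparison described in the statement above can be sketched in plain Python. This is a minimal illustration, not the cited authors' implementation: the function names `tfidf_vectors` and `cosine`, the unsmoothed weighting `tf * log(N / df)`, and the toy documents are all assumptions made for the example. A clustering step such as k-means would then operate on these vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each non-empty tokenized document to a {term: tf-idf weight} dict.

    Assumed weighting: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, invented purely for illustration.
docs = [
    ["court", "appeal", "judge"],
    ["court", "appeal", "ruling"],
    ["credit", "card", "customer"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share terms score above zero, while documents with no terms in common score exactly zero, which is the property a similarity-based clustering step relies on.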


“…This method can be illustrated through a line plot between SSE (Sum of Squared error) compared to the total cluster and finding a point that represents "an elbow point" (the point after SSE or inertia starts decreasing in a linear fashion). Elbow method is often used in previous studies for determining the optimal number of clusters [14], [15], in addition to the silhouette coefficient method [16].…”

Section: )mentioning

confidence: 99%

Machine Learning Mini Batch K-means and Business Intelligence Utilization for Credit Card Customer Segmentation

Rachman

Santoso

Djajadi

2021

IJACSA
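The elbow method quoted in the statement above is usually applied by eye on a plot of SSE against the number of clusters. The sketch below shows one possible way to automate that reading, under stated assumptions: `inertia` computes the SSE of a given clustering, and `elbow_k` picks the k whose SSE lies farthest below the straight line joining the first and last points of the curve. Both function names, the chord heuristic, and the SSE values in the usage lines are hypothetical and are not taken from any of the cited papers.

```python
def inertia(clusters):
    """Sum of squared distances from each point to its cluster centroid (SSE).

    `clusters` is a list of non-empty lists of equal-length numeric tuples.
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in points for d in range(dim))
    return total


def elbow_k(sse_by_k):
    """Return the k whose SSE is farthest below the chord from first to last k.

    `sse_by_k` maps at least two distinct cluster counts to their SSE.
    This chord heuristic is an assumption; other elbow criteria exist.
    """
    ks = sorted(sse_by_k)
    k_first, k_last = ks[0], ks[-1]
    sse_first, sse_last = sse_by_k[k_first], sse_by_k[k_last]
    slope = (sse_last - sse_first) / (k_last - k_first)

    def gap(k):
        # Vertical distance between the chord and the SSE curve at k.
        chord_value = sse_first + slope * (k - k_first)
        return chord_value - sse_by_k[k]

    return max(ks, key=gap)


# Illustrative SSE curve, invented for this example only.
curve = {1: 100.0, 2: 40.0, 3: 10.0, 4: 8.0, 5: 6.0}
print(elbow_k(curve))
```

In practice the SSE for each k would come from running k-means on the sentence or document vectors and passing the resulting clusters to `inertia`; the silhouette coefficient mentioned in the same statement is an alternative criterion.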

