2019
DOI: 10.5815/ijisa.2019.04.04

Entailment and Spectral Clustering based Single and Multiple Document Summarization

Abstract: Text connectedness is an important feature for content selection in text summarization methods. Recently, Textual Entailment (TE) has been successfully employed to measure sentence connectedness and thereby determine sentence salience in single-document text summarization. In the literature, Analog Textual Entailment and Spectral Clustering (ATESC) is one such method, which uses TE to compute inter-sentence connectedness scores. These scores are used to compute the salience of sentences and are further utilized by …
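As a rough illustration of the pipeline the abstract outlines (pairwise entailment scores used as connectedness, spectral clustering over sentences, and salience-based selection), the sketch below uses scikit-learn's SpectralClustering with a precomputed affinity matrix. It is a sketch under stated assumptions, not the paper's method: entailment_score() is a hypothetical lexical-overlap stand-in for the analog entailment measure, and the cluster count and selection rule are illustrative.

```python
# Rough ATESC-style sketch: entailment scores form a connectedness matrix,
# spectral clustering groups sentences, and the most connected sentence of
# each cluster is kept. entailment_score() is a hypothetical placeholder.

import numpy as np
from sklearn.cluster import SpectralClustering

def entailment_score(s1, s2):
    """Hypothetical placeholder: crude lexical-overlap proxy in [0, 1]."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / max(len(w1 | w2), 1)

def atesc_like_summary(sentences, n_clusters=3):
    n = len(sentences)
    affinity = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                affinity[i, j] = entailment_score(sentences[i], sentences[j])
    affinity = (affinity + affinity.T) / 2  # spectral clustering expects a symmetric affinity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed",
                                random_state=0).fit_predict(affinity)
    salience = affinity.sum(axis=1)         # connectedness-based salience
    chosen = [max((i for i in range(n) if labels[i] == c), key=lambda i: salience[i])
              for c in range(n_clusters)]
    return [sentences[i] for i in sorted(chosen)]
```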

Cited by 4 publications (4 citation statements)
References 20 publications (24 reference statements)
“…The vast majority of extractive, non-neural summarization algorithms use four datasets for performance evaluation, exhibiting an interesting pattern. Unsupervised summarization methods mostly evaluate performance on DUC datasets (Fang et al, 2017; Huang, 2017; Li et al, 2019), graph-based methods for sentence selection (Zheng & Lapata, 2019; Gupta et al, 2019), multi-objective optimization for sentence selection (Saini et al, 2019; Mishra et al, 2021), BART for abstraction (Chaturvedi et al, 2020; Dou et al, 2021), embedding-based similarity for reducing redundancy (Hailu et al, 2020; Zhong et al, 2020b), etc.…”
Section: State-of-the-art in Document Summarization (mentioning)
confidence: 99%
“…Among non-neural models, a popular approach is to capture relations between sentences or word phrases via a weighted graph. Gupta et al. (2014, 2019) model the sentences of the document as nodes of a weighted directed graph and compute IDF-based entailment scores between sentence pairs. They use weighted minimum vertex cover to extract the most salient sentences.…”
Section: Background and Related Work (mentioning)
confidence: 99%
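The IDF-based pairwise scoring described in this excerpt can be made concrete with a short sketch. The formula below (IDF-weighted word overlap), the 0.4 threshold, and the helper names are illustrative assumptions, not the exact measure of Gupta et al.

```python
# Illustrative IDF-weighted entailment proxy between sentence pairs and the
# resulting weighted directed graph. Formula and threshold are assumptions.

import math
from collections import Counter

def idf_table(sentences):
    """IDF computed over the document's own sentences."""
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(s.lower().split()))
    return {w: math.log(n / df[w]) for w in df}

def idf_entailment(premise, hypothesis, idf):
    """Share of the hypothesis's IDF mass that the premise covers."""
    p = set(premise.lower().split())
    h = hypothesis.lower().split()
    covered = sum(idf.get(w, 0.0) for w in h if w in p)
    total = sum(idf.get(w, 0.0) for w in h) or 1.0
    return covered / total

def entailment_graph(sentences, threshold=0.4):
    """Weighted directed graph: edge (i, j) when sentence i sufficiently entails j."""
    idf = idf_table(sentences)
    scores = {}
    for i, si in enumerate(sentences):
        for j, sj in enumerate(sentences):
            if i != j:
                s = idf_entailment(si, sj, idf)
                if s >= threshold:
                    scores[(i, j)] = s
    return scores
```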
“…(ii) Unsupervised: Entailment-based Weighted Minimum Vertex Cover (wMVC) is an unsupervised, network-based approach proposed by Gupta et al. (2014). The sentences are modelled as vertices of a graph, and Inverse Document Frequency (IDF) based entailment is employed to link sentences (Gupta et al., 2019). The algorithm considers those sentences important which entail many other sentences.…”
Section: Extractive Summarization (mentioning)
confidence: 99%
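A greedy approximation of weighted minimum vertex cover gives a concrete picture of the selection step quoted above. This sketch assumes an entailment edge set such as the one built in the previous sketch; the vertex weighting (making sentences that entail many others cheap to pick) is an illustrative choice reflecting the quoted intuition, not the authors' exact objective.

```python
# Minimal greedy weighted-minimum-vertex-cover sketch for sentence selection.
# edges: set of directed (i, j) pairs meaning sentence i entails sentence j.

def greedy_wmvc(n_sentences, edges):
    """Return indices of sentences chosen to cover all entailment edges."""
    out_degree = [sum(1 for (i, _) in edges if i == v) for v in range(n_sentences)]
    weight = [1.0 / (1 + out_degree[v]) for v in range(n_sentences)]  # illustrative weights
    uncovered = set(edges)
    cover = []
    while uncovered:
        def touched(v):
            return sum(1 for e in uncovered if v in e)
        # pick the vertex with the best weight-to-covered-edges ratio
        v = min((u for u in range(n_sentences) if touched(u) > 0),
                key=lambda u: weight[u] / touched(u))
        cover.append(v)
        uncovered = {e for e in uncovered if v not in e}
    return sorted(cover)
```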
“…Clinical notes 22 and unstructured interviews [23][24][25] are also used and are often associated with a more precise diagnosis using the self-questionnaire PCL-5 based on the DSM-5 criteria 26 or the semi-structured interview SCID (American Psychiatric Association, 2013). To build NLP models, many kinds of linguistic features are extracted: statistical (number of words, number of words per sentence), morpho-syntactic (proportion of first-person pronouns, verb tense), topic modeling (LDA, LSA), word vector representations (Word2Vec, Doc2Vec, GloVe, fastText), contextual embedding vectors (BERT, RoBERTa), graph-based features 28, coherence 29 and readability features 30, external resources such as LIWC 31, sentiment analysis scores like LabMT 32, TextBlob (Loria, 2018) or FLAIR 34, and transfer learning methods like DLATK 35 that use models pre-trained on social media data. The models used for the classification task, which consists of separating people with and without PTSD, are mainly Random Forest (RF) 36, Logistic Regression (LR), CNN, LSTM, and transformers 16,17.…”
Section: Introduction (mentioning)
confidence: 99%
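As a point of reference for the classification setups surveyed in this excerpt, the sketch below wires simple lexical features into one of the listed classifiers (logistic regression) with scikit-learn. The TF-IDF feature choice and all variable names are illustrative assumptions; the cited studies use far richer feature sets (embeddings, LIWC, coherence, readability, etc.).

```python
# Illustrative text-classification baseline: TF-IDF word/bigram features
# feeding a logistic regression classifier.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_baseline_classifier():
    """TF-IDF features + logistic regression."""
    return make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000))

# hypothetical usage:
# clf = build_baseline_classifier()
# clf.fit(train_texts, train_labels)   # labels: 1 = PTSD, 0 = control
# predictions = clf.predict(test_texts)
```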