In this study we propose an automatic single-document text
summarization technique that combines Latent Semantic Analysis
(LSA) with a diversity constraint. The proposed technique uses
query-based sentence ranking. Because the setting is not IR
(Information Retrieval), no user query is given, so we generate
the query using TF-IDF (Term Frequency-Inverse Document
Frequency). To produce the
query vector, we identify the terms with high IDF values. LSA
uses vectorial semantics to analyze the relationships between
documents in a corpus, or between the sentences of a document and
the key terms they contain, by producing a set of concepts related
to those documents and terms. In this way LSA represents the
latent structure of documents. Latent Semantic Indexing (LSI) is
used to select sentences from the document: LSI ranks the
sentences by their scores.
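The query construction and LSI ranking described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the regex tokenizer, the IDF-weighted query over the `top_n` highest-IDF terms, the latent dimension `k`, and the helper names `build_tfidf`, `make_query` and `lsi_rank` are all hypothetical. Each sentence is treated as a document when computing IDF, the term-by-sentence matrix is factored with an SVD, and sentences are scored by cosine similarity to the query after both are projected onto the first `k` left singular vectors.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(sentence):
    """Lowercased alphabetic tokens (a deliberately simple, hypothetical tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def build_tfidf(sentences):
    """Term-by-sentence TF-IDF matrix; each sentence plays the role of a document."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n = len(sentences)
    df = Counter(t for toks in token_lists for t in set(toks))
    idf = np.array([math.log(n / df[t]) for t in vocab])
    A = np.zeros((len(vocab), n))
    for j, toks in enumerate(token_lists):
        for t, tf in Counter(toks).items():
            A[index[t], j] = tf * idf[index[t]]
    return A, idf


def make_query(idf, top_n):
    """Assumed query: IDF weights on the top_n highest-IDF terms, zero elsewhere.
    Ties are broken by alphabetical term order (stable sort)."""
    q = np.zeros_like(idf)
    top = np.argsort(-idf, kind="stable")[:top_n]
    q[top] = idf[top]
    return q


def lsi_rank(sentences, k=2, top_n=5):
    """Return (order, scores): sentence indices sorted by cosine similarity to the
    query in a rank-k latent space, and the per-sentence scores."""
    A, idf = build_tfidf(sentences)
    q = make_query(idf, top_n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    Uk = U[:, :max(1, min(k, len(s)))]
    # Project sentences (columns of A) and the query onto the same k latent concepts.
    sent_vecs = (Uk.T @ A).T
    q_hat = Uk.T @ q
    norms = np.linalg.norm(sent_vecs, axis=1) * np.linalg.norm(q_hat)
    scores = sent_vecs @ q_hat / np.maximum(norms, 1e-12)
    order = [int(i) for i in np.argsort(-scores, kind="stable")]
    return order, scores.tolist()


if __name__ == "__main__":
    reviews = [  # hypothetical review sentences, not taken from the dataset
        "The battery life of this phone is excellent.",
        "Battery life easily lasts two full days.",
        "The screen is bright and sharp.",
        "Customer service was slow to respond.",
    ]
    order, scores = lsi_rank(reviews, k=2, top_n=5)
    print(order, [round(x, 3) for x in scores])
```

Projecting both the sentences and the query with the same `Uk.T` keeps them in one coordinate system, so the sign ambiguity of the SVD does not affect the cosine scores.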
Traditionally, the highest-scoring sentences are chosen for the
summary; here, instead, we compute the diversity among the chosen
sentences and build the final summary from it, since a good
summary should be as diverse as possible. The proposed technique
is evaluated on OpinosisDataset1.0.
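The abstract does not state how diversity is measured, so the following sketch substitutes a greedy, Maximal Marginal Relevance (MMR) style selection as an assumption: at each step it picks the sentence with the best trade-off between its relevance score (for example an LSI score) and its maximum cosine similarity to the sentences already chosen. The function names `cosine` and `select_diverse`, the weight `lam`, and the input vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_diverse(relevance, vectors, budget, lam=0.7):
    """Greedily pick up to `budget` sentence indices.

    Assumption: MMR-style objective, not the paper's stated formula. Each step
    maximises  lam * relevance[i] - (1 - lam) * max_j cosine(v_i, v_j)  over the
    remaining candidates i, where j ranges over already-chosen sentences, so a
    sentence that repeats an earlier pick is penalised.
    """
    chosen = []
    candidates = list(range(len(relevance)))
    while candidates and len(chosen) < budget:
        def mmr(i):
            redundancy = max((cosine(vectors[i], vectors[j]) for j in chosen), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)  # ties go to the lower index
        chosen.append(best)
        candidates.remove(best)
    return chosen


if __name__ == "__main__":
    # Sentence 1 duplicates sentence 0, so the diverse pick skips it.
    print(select_diverse([0.9, 0.85, 0.5], [[1, 0], [1, 0], [0, 1]], budget=2))  # [0, 2]
```

With `lam=1.0` the diversity penalty vanishes and the function reduces to the traditional top-scoring selection (`[0, 1]` on the same input), which makes the effect of the diversity term easy to compare.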