With the rapid growth of big data and the internet, organizing the vast amount of information available online has become a primary concern. This volume of online text leads to information overload and redundancy. Multi-document summarization is one solution to this problem: it extracts the main ideas of a set of documents and condenses them into a short summary, without distorting the major concepts or the meaning of the original text. This paper proposes a new method for multi-document summarization. The method extracts six features from each sentence in the studied collection; these features must be language independent. The resulting feature vectors are fed to a Convolutional Neural Network (CNN), which classifies each sentence as a summary or non-summary sentence. A graph of the summary sentences is then generated, and the sentences are scored by the TextRank algorithm. The implemented system was evaluated on both the English and Arabic versions of the TAC-2011 MultiLing Pilot dataset using ROUGE metrics. The proposed method achieved average F-measures of 0.46079 and 0.20664 using ROUGE-1 and ROUGE-2, respectively, for English documents, and 0.45624 and 0.30725 for Arabic documents.
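The graph-scoring step can be illustrated with a minimal sketch. The abstract does not say how edges between summary sentences are weighted, so the sketch assumes the word-overlap similarity commonly used with TextRank; the function names (`tokenize`, `similarity`, `textrank`), the damping factor, and the iteration count are all illustrative assumptions rather than details of the proposed system.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """Assumed edge weight: shared words normalized by log sentence lengths."""
    denom = math.log(len(tokens_a) or 1) + math.log(len(tokens_b) or 1)
    if denom <= 0:
        # Both sentences have at most one token; treat them as unconnected.
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence; higher means more central in the graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weight matrix with no self-loops.
    weights = [
        [0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each neighbor j passes on a share of its
        # score proportional to the weight of the edge from j to i.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

In a pipeline like the one described, `textrank` would be called only on the sentences the classifier labeled as summary sentences, and the highest-scoring ones would be kept. Because the tokenizer relies on Unicode word characters rather than language-specific rules, the same sketch runs on both English and Arabic text, although real systems typically add stemming or stop-word handling.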