Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2015
DOI: 10.18653/v1/w15-4637

ExB Text Summarizer

Abstract: We present our state-of-the-art multilingual text summarizer, capable of single- as well as multi-document text summarization. The algorithm is based on repeated application of TextRank on a sentence similarity graph, a bag-of-words model for sentence similarity, and a number of linguistic pre- and post-processing steps using standard NLP tools. We submitted this algorithm for two different tasks of the MultiLing 2015 summarization challenge: Multilingual Single-document Summarization and Multilingual Multi-documen…
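
The core idea named in the abstract (TextRank over a sentence similarity graph with bag-of-words similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: it runs TextRank once rather than repeatedly, it replaces the linguistic pre- and post-processing with a plain regex tokenizer, and cosine similarity over word counts is an assumed choice of similarity measure. The function names (`bag_of_words`, `cosine`, `textrank`, `summarize`) and the damping value are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; an assumed stand-in for real NLP preprocessing."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank on the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [bag_of_words(s) for s in sentences]
    # Edge weights: pairwise similarity, no self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(["The cat sat on the mat.", "The cat lay on the mat.", "Stocks fell sharply today."], k=2)` keeps the two mutually similar sentences, because the third shares no words with the others and receives only the baseline score.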

Cited by 8 publications (3 citation statements)
References 13 publications

“…The rich and intricate morphological and syntactic flexibility of Arabic is widely known [32]. The preprocessing stage is essentially the same for all languages and often entails normalization, tokenization, POS tagging, stemming/lemmatization, and stop-word removal [33][34][35]. Since most texts produced in Arabic and saved in electronic form do not have diacritical marks at first, the system deals with Arabic texts without them.…”
Section: Data Pre-processing (citation type: mentioning)
confidence: 99%
“…The following Architecture for QA corpus is shown in Figure 1. In paper [18], [19] & [20] discuss on query is pre-processed using tokenization, stop words removal and stemming to extract keywords. Question type considered are what, when, why, which are trained using question classifier.…”
Section: Proposed System Architecture (citation type: mentioning)
confidence: 99%
“…Then the score of each sentence is assigned in respect to its distance from the clusters' representatives. For example, Thomas et al(2015) used a graph-based procedure where each node of the graph represents a sentence and the edges' weights reflect the similarity between the connected nodes. Next, a PageRank/TextRank algorithm is applied 2015) Principal Component Analysis (PCA) was used to project the sentences into a lower-dimension space.…”
Section: Sentence-based Summarization (citation type: mentioning)
confidence: 99%