Corpus-based web document summarization using statistical and linguistic approach

Shams, Rushdi; Hashem, M. M. A.; Hossain, A. B. M. Awolad; Akter, Suraiya Rumana; Gope, Monika

doi:10.1109/iccce.2010.5556854

Cited by 5 publications

(5 citation statements)

References 10 publications


“…Suanmali et al proposed feature based sentence and knowledge extraction technique in [16], where representativeness of sentences and knowledge depended on fused features like normalization, term frequencies and number of proper nouns. We did not fuse these features rather we used them independently as [9] showed that independent features perform better on domain-specific information retrieval. Cao et al [2] weighed adjectives to extract concepts from commonsense knowledge.…”

Section: Related Work (mentioning)

confidence: 99%

“…For example, a list of Term Frequencies [9] is given in Table II. When considering domain-specific knowledge, it is usual that some commonsense will be present in more than one knowledge.…”

Section: A Development Of Commonsense Knowledge-base (mentioning)

confidence: 99%

“…This denotes the statistical distance of a knowledge from others due to the variation of commonsense present in them. Moreover, if is the sentence whose textual commonsense concepts are to be acquired, we calculated the sentence weight , which is the function of number of terms ( ), number of words ( ) and term frequencies ( ) and normalized it to get [9]. Then, we selected the relevant commonsense knowledge using statistical analysis on and and selected the proper nouns in them.…”

Section: Introduction (mentioning)

confidence: 99%


Domain-specific textual commonsense concept acquisition using a corpus

Shams

Chowdhury

Shawon

2011

2011 International Conference on Communications, Computing and Control Applications (CCCA)

Self Cite


In this paper, we present a textual commonsense concept acquisition system named SenCept. It works on text about DC electrical circuits and provides the commonsense concepts associated with that text for better contextualization. SenCept uses a manually developed commonsense knowledge-base that is built upon linguistic information from a domain-specific corpus. We selected representative commonsense knowledge using several parameters, such as knowledge weight, average commonsensical distances among knowledge, and normalized mean. To identify commonsense concepts for any sentence, SenCept relies on the mean of the distances between the normalized weights of representative sentences and the average commonsensical distances among knowledge. We fed 100 sentences to five human subjects and to SenCept to evaluate its performance. Results showed that the concepts produced by SenCept originate from textual commonsense, in contrast to human analysis, which produces concepts from domain knowledge. Moreover, SenCept's Common Concept Rate (CCR) is 43 percent, which is better than that of human analysis.


“…Since then, a large number of techniques and approaches have been developed. Interestingly, the large volumes of information created on the web have triggered much of this development (Nenkova & McKeown, 2011; Shams, Hashem, Hossain, Akter, & Gope, 2010). Bhargava, Sharma, and Sharma (2016) posit that text summarization tools have now become a necessity to navigate the information on the web because they help eliminate dispensable or superfluous content.…”

Section: Overview Of Automated Text Summarization and Evaluation (mentioning)

confidence: 99%

“…Text units with a concentration of high-score words are often likely contenders for extraction (Liu & Liu, 2009). Extraction-based summarization, then, is essentially concerned with evaluating the salience or the indicative power of each sentence in a given document (Shams et al, 2010). Figure 1 maps out the process flow for extraction-based systems.…”

Section: Extraction-based Text Summarization (mentioning)

confidence: 99%
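
The idea in the statement above, that extraction scores each sentence by how many high-scoring words it concentrates, can be sketched as follows. This is only a minimal illustration under stated assumptions: the stopword list, the average-frequency scoring rule, and the function names are hypothetical and are not taken from any of the cited papers.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "are"}

def content_words(sentence):
    """Lowercase the sentence and keep alphanumeric tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]

def score_sentences(sentences):
    """Score each sentence by the average document-wide frequency of its content words."""
    freq = Counter(w for s in sentences for w in content_words(s))
    scores = []
    for s in sentences:
        words = content_words(s)
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores

def extract_summary(sentences, k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Averaging rather than summing the word frequencies is one possible design choice; it keeps long sentences from winning on length alone.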

Artificial Intelligence in Business Communication: A Snapshot

Naidoo

Dulek

2018

International Journal of Business Communication


Despite artificial intelligence's far-reaching influence in the financial reporting and other business domains, there is a surprising dearth of accessible descriptions about the assumptions underlying the software's development along with an absence of empirical evidence assessing the viability and usefulness of this communication tool. With these observations in mind, the purposes of this study are to explain how automated text summarization applications work from an overarching, semitechnical, modestly theoretical perspective and, using ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation-1) evaluation metrics, assess how effective the summarization software is when summarizing complex business reports. The results of this study show that the extraction-based summarization system produced moderately satisfactory results in terms of extracting relevant instances of the text from the business reports. Much work still needs to be accomplished in the area of precision and recall in extraction-based systems before the software can match a human's ability to capture the gist of a body of text.
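
For readers unfamiliar with the ROUGE-1 metric named in this abstract, it compares the unigrams of a system summary with those of a reference summary. The sketch below is a simplified illustration, assuming a single reference, lowercase word tokens, and no stemming or stopword removal; the function name `rouge_1` is hypothetical, and this is not the evaluation code used in the study.

```python
import re
from collections import Counter

def rouge_1(candidate, reference):
    """Compute unigram-overlap recall, precision, and F1 between two texts."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}
```

Recall is measured against the reference and precision against the candidate, which is why the abstract treats precision and recall as separate areas for improvement.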


An efficient framework of utilizing the latent semantic analysis in text extraction

Ababneh

2019

Int J Speech Technol


The use of Latent Semantic Analysis (LSA) in text mining demands large amounts of space and time. This paper proposes a new text extraction method that sets out a framework for employing statistical semantic analysis in text extraction efficiently. The method uses the centrality feature and omits the segments of the text that have a high verbatim, statistical, or semantic similarity with previously processed segments. Similarity is identified with a new multi-layer similarity method that computes the similarity in three statistical layers: it uses the Jaccard similarity and the Vector Space Model (VSM) in the first and second layers respectively, and LSA in the third layer. The multi-layer similarity restricts the use of the third layer to the segments whose similarities the first and second layers failed to estimate. The ROUGE tool is used in the evaluation, but because ROUGE does not consider the extract's size, we supplemented it with a new evaluation strategy based on the compression rate and the ratio of sentence intersections between the automatic and the reference extracts. Our comparisons with classical LSA and traditional statistical extractions showed that we reduced the use of the LSA procedure by 52% and obtained a 65% reduction in the original matrix dimensions; we also obtained remarkable accuracy results. It is concluded that employing the centrality feature with the proposed multi-layer framework yields a significant solution in terms of efficiency and accuracy in the field of text extraction.
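
The layered similarity check described in this abstract might look roughly like the sketch below. It covers only the first two layers (Jaccard on word sets, then cosine similarity on raw term-frequency vectors as a stand-in for the VSM layer) and merely marks where the third LSA layer would take over. The thresholds, the raw-count weighting, and every function name are assumptions made for illustration, not details from the paper.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def jaccard(a, b):
    """Layer 1: overlap of the two word sets divided by their union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(a, b):
    """Layer 2: cosine similarity of raw term-frequency vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * vb[term] for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def is_redundant(a, b, jaccard_cut=0.8, cosine_cut=0.8):
    """Return (decision, layer). Both cut-off values are hypothetical."""
    if jaccard(a, b) >= jaccard_cut:
        return True, "jaccard"
    if cosine(a, b) >= cosine_cut:
        return True, "vsm"
    # The method in the abstract would fall back to LSA here; this sketch does not.
    return False, "undecided"
```

Running the cheap checks first means the expensive layer is reached only by segment pairs that the earlier layers could not settle, which is the efficiency argument the abstract makes.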

