2007
DOI: 10.1145/1324185.1324190

Overview and semantic issues of text mining

Abstract: Text mining refers to the discovery of previously unknown knowledge that can be found in text collections. In recent years, the text mining field has received great attention due to the abundance of textual data. A researcher in this area has to cope with issues that originate from the particularities of natural language. This survey discusses such semantic issues along with the approaches and methodologies proposed in the existing literature. It covers syntactic matters, tokenization concerns and it focuse…
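The abstract lists tokenization concerns among the issues a text miner must handle. As a rough illustration (not the survey's own method), the sketch below contrasts whitespace splitting with a small regular-expression tokenizer. The pattern `TOKEN_RE` and both function names are hypothetical choices made for this example.

```python
import re

# Hypothetical pattern: keeps hyphenated words and simple contractions together,
# treats decimal numbers as one token, and splits any other punctuation off
# as a single-character token.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*|\d+(?:\.\d+)?|[^\sA-Za-z\d]")


def naive_tokenize(text: str) -> list[str]:
    """Split on whitespace only; punctuation stays glued to neighbouring words."""
    return text.split()


def regex_tokenize(text: str) -> list[str]:
    """Split into words, numbers, and single punctuation marks."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    sentence = "State-of-the-art systems don't ignore punctuation, e.g. 3.5%."
    print(naive_tokenize(sentence))
    print(regex_tokenize(sentence))
```

Even this small pattern shows why tokenization is a real design decision: "punctuation," is separated into a word and a comma, but the abbreviation "e.g." is broken into four tokens. Handling abbreviations would need an exception list or a trained tokenizer.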

Cited by 123 publications (83 citation statements)
References: 45 publications
“…Concepts (sometimes also referred [to] as compound terms or phrases) are important features used in Text Mining [23]. Compound terms processing is a technique aiming at improving accuracy of search engines by indexing documents according to compound terms, i.e.…”
Section: Related Work (mentioning, confidence: 99%)
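The citation statement above describes compound-term processing: indexing documents by multi-word terms rather than by isolated words. A minimal sketch follows, assuming a small hand-made lexicon (`COMPOUND_TERMS` is hypothetical) and greedy longest-match segmentation; it is one simple way to do this, not the technique of any specific cited system.

```python
from collections import defaultdict

# Hypothetical compound-term lexicon; a real system would curate or learn it.
COMPOUND_TERMS = {("text", "mining"), ("search", "engine")}
MAX_LEN = max(len(term) for term in COMPOUND_TERMS)


def index_terms(tokens: list[str]) -> list[str]:
    """Greedy longest match: emit a compound term where one starts, else the single token."""
    terms: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible compound first, down to two-word compounds.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in COMPOUND_TERMS:
                terms.append(" ".join(candidate))
                i += n
                break
        else:
            # No compound term starts here, so index the single word.
            terms.append(tokens[i])
            i += 1
    return terms


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each index term (single word or compound term) to the ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_terms(text.lower().split()):
            index[term].add(doc_id)
    return index


if __name__ == "__main__":
    idx = build_index({
        "d1": "text mining finds patterns",
        "d2": "mining text files for errors",
    })
    print(sorted(idx["text mining"]))  # only d1 contains the phrase
```

A word-level index would report both documents as containing "text" and "mining". The compound entry separates the phrase from incidental co-occurrence. This sketch indexes a compound in place of its constituent words; indexing both is an equally plausible design.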
“…Text mining can be defined as the process of extracting meaningful information through the analysis of such large volumes of text (Hearst, 1999; Sebastiani, 2002) (Mooney and Bunescu, 2006; Stavrianou et al., 2007).…”
Section: Text Mining (unclassified)
“…According to Stavrianou et al (2007), additional issues that affect text similarity performance and must be considered during text preprocessing are: stopwords and noisy data (e.g. misspelled words) removal, stemming, part of speech (POS) tagging, multi-word terms (collocations), tokenization and text representation.…”
Section: Introduction (mentioning, confidence: 99%)
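The preprocessing issues quoted above (stopword and noise removal, stemming, tokenization, text representation) are often chained into one pipeline. The sketch below is a minimal, standard-library-only illustration under clearly simplified assumptions. The stopword list is a tiny illustrative sample. `stem` is a hypothetical crude suffix stripper, not the Porter stemmer. "Noise removal" here only means dropping digits and punctuation; it does not correct misspellings. POS tagging and multi-word term detection are omitted because they would normally need a tagger or collocation model.

```python
import re
from collections import Counter

# Illustrative stopword sample only; real systems use a larger curated list.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to"}

# Hypothetical crude suffix list, checked in order (NOT the Porter stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphabetic runs; digits and punctuation are dropped as noise."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word: str) -> str:
    """Strip the first matching suffix, provided at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> Counter:
    """Tokenize, remove stopwords, stem, and return a term-frequency (bag-of-words) vector."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return Counter(stem(t) for t in tokens)


if __name__ == "__main__":
    print(preprocess("The clusters are clustered by clustering 3 documents!"))
```

On the example sentence the three inflected forms of "cluster" collapse into one feature with frequency 3, which is the point of stemming for similarity measures. The same crude rule would also over-stem words such as "mining" to "min", one reason practical systems rely on established stemmers or lemmatizers.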