Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001
DOI: 10.1145/383952.384065

Generic topic segmentation of document texts

Authors: Marie-Francine Moens and Rik De Busser, Katholieke Universiteit Leuven, Belgium, Interdisciplinary Centre for Law & IT, Tiensestraat, Leuven, Belgium

Abstract: Topic segmentation is an important initial step in many text-based tasks. A hierarchical representation of a text's topics is useful in retrieval and allows judging relevancy at different levels of detail. This short paper describes research on generic algorithms for topic detection and segme…

Cited by 14 publications (10 citation statements)
References 4 publications
“…Linguistic methods still use external semantic information resources such as thesauri and ontologies. Resulting information from the association rules and from external semantic sources may then be combined through statistical techniques [38], which are highly dependent on available resources. Caillet proposes an automatic segmentation method based on term clustering [5].…”
Section: Related Work (mentioning)
confidence: 99%
“…The other category of techniques is based on Natural Language Processing techniques. Linguistic methods introduce a set of specific rules based on the corpus and use external semantic information such as thesauri and ontologies, possibly combined with one or more statistical methods [23]. This is the main drawback of this type of identification techniques: the results are dependent on the semantic resources available for a specific text [35] and therefore the setup is limited to the text.…”
Section: Related Work (mentioning)
confidence: 99%
“…Different approaches of topic identification have been reported in literature (Choi, 2000; Clifton, Cooley, & Rennie, 2004; Hearst, 1997; Moens & De Busser, 2001). The typical method for topic identification in single document is text segmentation, which is to segment text based on the similarity of adjacent sentences and detect the boundary of subtopics (Choi, 2000; Hearst, 1997; Moens & De Busser, 2001; Ponte & Croft, 1997).…”
Section: Macro-level Information: Topical Information Of a Document Set (mentioning)
confidence: 99%
“…The typical method for topic identification in single document is text segmentation, which is to segment text based on the similarity of adjacent sentences and detect the boundary of subtopics (Choi, 2000; Hearst, 1997; Moens & De Busser, 2001; Ponte & Croft, 1997). A popular method for topic identification in multiple documents is text clustering, i.e.…”
Section: Macro-level Information: Topical Information Of a Document Set (mentioning)
confidence: 99%
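
The citation statements above describe text segmentation as splitting a text where the similarity of adjacent sentences drops. The following is a minimal sketch of that general idea only, not the method of Moens and De Busser or of any cited paper. The function names, the tiny stopword list, the bag-of-words cosine measure, and the threshold value are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "was", "and"}


def bag_of_words(sentence):
    """Return lowercased word counts for one sentence, ignoring stopwords."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment(sentences, threshold=0.1):
    """Group sentences into segments.

    A new segment starts whenever the similarity between a sentence and
    the one before it falls below `threshold` (an assumed cut-off).
    """
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    segments = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            segments.append([])  # similarity dropped: open a new segment
        segments[-1].append(sentences[i])
    return segments


sentences = [
    "The court reviewed the contract dispute.",
    "The contract dispute involved two companies.",
    "Rainfall was heavy across the coast.",
    "Heavy rainfall flooded the coast roads.",
]
segments = segment(sentences)
for number, group in enumerate(segments, start=1):
    print(number, group)
```

In this toy input the second and third sentences share no content words, so the sketch places a single boundary between them. Real systems typically compare larger blocks of text and smooth the similarity curve rather than thresholding single sentence pairs.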