Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008
DOI: 10.1145/1390334.1390528

On document splitting in passage detection

Abstract: Passages can be hidden within a text to circumvent their disallowed transfer. Such release of compartmentalized information is of concern to all corporate and governmental organizations. We explore a methodology to detect such hidden passages within a document. A document is divided into passages using various document splitting techniques, and a text classifier is used to categorize such passages. We present a novel document splitting technique called dynamic windowing, which significantly improves precision…

Cited by 4 publications
(4 citation statements)
References 3 publications
“…We apply three document splitting approaches, namely KDP, NWP, and OWP as explained in the methodology section. Prior work (Goharian & Mengle, 2008; Mengle & Goharian, 2009b) showed that KDP outperforms the other two methods in detecting passages.…”
Section: Results and Analysis (mentioning)
confidence: 99%
“…There is no shared area between two adjacent windows; hence, these windows are called nonoverlapping windows (Hearst, 1994). In the overlapping window passage (OWP; Callan, 1994) approach, a document is divided into n‐word passages; the overlapping windows are defined from n/2 terms of the prior passage to n/2 terms of the next passage. In the keyword‐based dynamic passage approach (KDP; Goharian & Mengle, 2008), passages are defined around the high weight terms. The probability of detecting the correct category of a passage is higher when the passage contains at least one term with a high term weight. Step 3: Classifying passages and generating confusion matrix.…”
Section: Methods (mentioning)
confidence: 99%
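
The two fixed-size splitting schemes described in the statement above can be illustrated with a minimal Python sketch. It assumes the document is already tokenized into a list of words, and it reads the overlapping scheme as n-word windows that start every n/2 words. The function names and that reading are assumptions made for illustration; they are not taken from the cited papers.

```python
def nonoverlapping_windows(tokens, n):
    """Split tokens into consecutive n-word passages that share no terms."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tokens[i:i + n] for i in range(0, len(tokens), n)]


def overlapping_windows(tokens, n):
    """Split tokens into n-word passages that start every n // 2 words.

    Assumed reading: each window shares n // 2 terms with the window
    before it and n // 2 terms with the window after it.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    step = n // 2
    passages = []
    for start in range(0, len(tokens), step):
        passages.append(tokens[start:start + n])
        # Stop once a window reaches the end of the document.
        if start + n >= len(tokens):
            break
    return passages


words = "a b c d e f g h".split()
print(nonoverlapping_windows(words, 4))
print(overlapping_windows(words, 4))
```

With eight words and n = 4, the nonoverlapping split yields two passages and the overlapping split yields three, the middle one straddling the boundary between the other two.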
“…In the keyword‐based dynamic passage (KDP) approach (Goharian & Mengle, 2008), passages are defined around terms with higher weights. We assume that the probability of detecting the correct category of a passage is higher when the passage contains a term with a higher weight.…”
Section: Methods (mentioning)
confidence: 99%
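
The citation statements say only that KDP passages are defined around high-weight terms; they do not give the weighting scheme or the passage size. The sketch below is therefore one possible reading rather than the authors' method: it assumes a precomputed term-weight dictionary, a hypothetical `threshold` that decides which terms count as high-weight, and a hypothetical `radius` of context words kept on each side of such a term.

```python
def keyword_passages(tokens, weights, threshold, radius):
    """Build one passage around each token whose weight meets the threshold.

    weights, threshold, and radius are illustrative assumptions; the
    source does not specify how terms are weighted or how wide a
    passage is.
    """
    passages = []
    for i, token in enumerate(tokens):
        if weights.get(token, 0.0) >= threshold:
            start = max(0, i - radius)
            end = min(len(tokens), i + radius + 1)
            passages.append(tokens[start:end])
    return passages


tokens = "the secret merger closes on friday".split()
weights = {"merger": 0.9, "friday": 0.2}  # made-up weights for the example
print(keyword_passages(tokens, weights, threshold=0.5, radius=1))
```

Every passage produced this way contains at least one high-weight term, which matches the stated assumption that such passages are more likely to be classified correctly.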