2007
DOI: 10.1016/j.ipm.2006.07.001

Using structural contexts to compress semistructured text collections

Abstract: We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the structure of the text. The idea is to use a separate model to compress the text that lies inside each different structure type (e.g., different XML tag). The intuition behind SCM is that the distribution of all the texts that belong to a given structure type should be similar, and different from that of other structure types. We mainly foc…
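
The core idea in the abstract, one model per structure type, can be illustrated with a minimal Python sketch. This is not the paper's implementation: zlib stands in for whatever compressor SCM actually uses, the helper names (texts_by_tag, compress_per_tag, decompress_per_tag) and the sample document are hypothetical, only text directly inside an element is collected, and the text is assumed to contain no newlines. The sketch also does not store the document structure itself, so it shows only the per-tag separation.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def texts_by_tag(xml_source):
    """Group the text found directly inside each element by its tag name."""
    groups = defaultdict(list)
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            groups[elem.tag].append(text)
    return dict(groups)


def compress_per_tag(groups):
    """Compress each tag's text with its own independent zlib stream.

    Assumption: zlib is only a stand-in for a per-structure-type model.
    """
    return {
        tag: zlib.compress("\n".join(texts).encode("utf-8"), 9)
        for tag, texts in groups.items()
    }


def decompress_per_tag(blobs):
    """Invert compress_per_tag (assumes no newlines inside the texts)."""
    return {
        tag: zlib.decompress(blob).decode("utf-8").split("\n")
        for tag, blob in blobs.items()
    }


# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<library>"
    "<book><title>Dune</title><year>1965</year></book>"
    "<book><title>Emma</title><year>1815</year></book>"
    "</library>"
)

groups = texts_by_tag(SAMPLE)
blobs = compress_per_tag(groups)
restored = decompress_per_tag(blobs)
```

Here every title is compressed together and every year is compressed together, so each stream sees text with a similar distribution, which is the intuition the abstract states.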

Cited by 16 publications (3 citation statements)
References 29 publications
“…Compression of XML has been studied from a variety of perspectives. Some researchers aim to achieve minimal size [3], [4], [5], others focus on efficient streaming [6], [7], [8] - a balance between bandwidth and encode/decode times - and still others answer XML queries directly from compressed representations [9]. Representations that support queries are necessarily larger than those that do not; Ferragina et al [10] report increases of 25 to 96% compared to opaque representations.…”
Section: Motivation (mentioning)
confidence: 99%
“…They provide benefits similar to those of XMill's containers. Adiego et al [5] augment this technique with a heuristic to combine certain context models; this yields better results on some large data sets. In our experiments, the performance of xmlppm was nearly always competitive.…”
Section: A General Xml Compression (mentioning)
confidence: 99%