Proceedings of the 28th ACM International Conference on Information and Knowledge Management 2019
DOI: 10.1145/3357384.3357972

Improved Compressed String Dictionaries

Abstract: We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and application…
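
For readers unfamiliar with Front-coding, the following is a minimal sketch of plain (non-hierarchical) Front-coding over a sorted string list. It is not the paper's hierarchical variant, which the truncated abstract does not describe in detail. The function names and the sample URLs are assumptions made up for illustration.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings: List[str]) -> List[Tuple[int, str]]:
    """Encode each string as (shared prefix length with previous, remaining suffix)."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = common_prefix_len(prev, s)
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original strings by reusing the prefix of the previous string."""
    decoded = []
    prev = ""
    for k, suffix in encoded:
        s = prev[:k] + suffix
        decoded.append(s)
        prev = s
    return decoded


# Hypothetical sample data: sorted URLs share long prefixes, so suffixes are short.
urls = sorted([
    "http://example.org/a",
    "http://example.org/ab",
    "http://example.org/b",
])
pairs = front_encode(urls)
```

Sorted URL collections tend to share long prefixes between neighbours, which is why this kind of encoding suits the domains the abstract mentions.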

Cited by 6 publications
(2 citation statements)
References 19 publications
“…Our current implementation is designed to handle integer-based triples, so it requires an external dictionary to handle the mapping. As future work, we plan to integrate RDFCSA with some compressed dictionary [18], [19], [40] in order to provide efficient mappings between strings and ids. Another choice is to integrate it in the HDT library (http://rdfhdt.org), which already provides the needed string dictionaries.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…Note that any of the other variants, including RDFCSA, could be complemented with a compact string dictionary that follows the encoding proposed for HDT. Solutions like HashDAC-RP [19] can answer string-to-id and id-to-string translations in a few microseconds per operation (typically 1-4 in URI and literal dictionaries such as those required in DBpedia [19], [40]), and would move solutions based on ids an extra 60% to the right in our plots. Since each triple-pattern requires at most 3 string-to-id translations per query, and at most 3 id-to-string operations per returned result (i.e., at most 2 translations if we omit the (?s, ?p, ?o) triple-pattern), query times would be increased by a few microseconds per result.…”
Section: Comparison With Other RDF Representations (citation type: mentioning)
confidence: 99%
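
The overhead argument in the last citation statement can be sketched as simple arithmetic. The function below is a hypothetical helper, not code from the citing paper; it takes the quoted bounds (at most 3 string-to-id translations per query, at most 3 id-to-string translations per returned result) and a per-operation cost in microseconds, and returns a loose upper bound on the added time.

```python
def translation_overhead_us(
    num_results: int,
    per_op_us: float,
    to_id_per_query: int = 3,      # upper bound quoted in the statement
    to_string_per_result: int = 3  # upper bound quoted in the statement
) -> float:
    """Loose upper bound on dictionary translation time for one triple-pattern query."""
    operations = to_id_per_query + to_string_per_result * num_results
    return operations * per_op_us


# Assumed inputs for illustration: 1000 results at the slow end of the quoted 1-4 microsecond range.
worst_case = translation_overhead_us(num_results=1000, per_op_us=4)
```

Both bounds cannot hold at once for a single triple-pattern, since a bound term needs no id-to-string translation, so the figure overestimates the real cost.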