Improved Compressed String Dictionaries

Brisaboa, Nieves R.; Cerdeira-Pena, Ana; Bernardo, Guillermo de; Navarro, Gonzalo

doi:10.1145/3357384.3357972

Cited by 6 publications

(2 citation statements)

References 19 publications


“…Our current implementation is designed to handle integer-based triples, so it requires an external dictionary to handle the mapping. As future work, we plan to integrate RDFCSA with some compressed dictionary [18], [19], [40] in order to provide efficient mappings between strings and ids. Another choice is to integrate it in the HDT library (http://rdfhdt.org), which already provides the needed string dictionaries.…”

Section: Discussion (mentioning)

confidence: 99%

“…Note that any of the other variants, including RDFCSA, could be complemented with a compact string dictionary that follows the encoding proposed for HDT. Solutions like HashDAC-RP [19] can answer string-to-id and id-to-string translations in a few microseconds per operation (typically 1-4 in URI and literal dictionaries such as those required in DBpedia [19], [40]), and would move solutions based on ids an extra 60% to the right in our plots. Since each triple-pattern requires at most 3 string-to-id translations per query, and at most 3 id-to-string operations per returned result (i.e., at most 2 translations if we omit the (?s, ?p, ?o) triple-pattern), query times would be increased by a few microseconds per result.…”

Section: Comparison With Other RDF Representations (mentioning)

confidence: 99%
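The overhead arithmetic in the statement above can be sketched as a small estimate. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, and the per-translation cost is taken as a single constant (the statement quotes 1-4 microseconds per operation), with one string-to-id translation per bound term of the query and one id-to-string translation per unbound term of each returned result.

```python
def translation_overhead_us(num_results, bound_terms, unbound_terms, us_per_translation):
    """Estimate dictionary-translation overhead for one triple-pattern query.

    Hypothetical helper, not taken from the cited papers.
    bound_terms:   terms fixed in the pattern (at most 3), translated once per query.
    unbound_terms: variables in the pattern (at most 3), translated once per result.
    """
    if bound_terms + unbound_terms != 3:
        raise ValueError("a triple pattern has exactly 3 positions")
    string_to_id = bound_terms                  # paid once per query
    id_to_string = unbound_terms * num_results  # paid for every returned result
    return (string_to_id + id_to_string) * us_per_translation


# Pattern (s, ?p, ?o) returning 1000 results, assuming 4 microseconds per translation.
overhead = translation_overhead_us(1000, bound_terms=1, unbound_terms=2, us_per_translation=4.0)
```

Under these assumptions the cost grows linearly with the number of results, which matches the statement's point that query times increase by a few microseconds per result.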


Space/time-efficient RDF stores based on circular suffix sorting

Brisaboa, Cerdeira-Pena, Bernardo et al. 2020

Preprint

Self Cite


In recent years, RDF has gained popularity as a format for the standardized publication and exchange of information in the Web of Data. In this paper we introduce RDFCSA, a data structure that is able to self-index an RDF dataset in small space and supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting on those strings, so that triple-pattern queries reduce to prefix searching on the string set. The RDF store is then represented compactly using a Compressed Suffix Array (CSA), a proved technology in text indexing that efficiently supports prefix searches. Our experimental evaluation shows that RDFCSA is able to answer triple-pattern queries in a few microseconds per result while using less than 60% of the space required by the raw original data. We also support join queries, which provide the basis for full SPARQL query support. Even though smaller-space solutions exist, as well as faster ones, RDFCSA is shown to provide an excellent space/time tradeoff, with fast and consistent query times within much less space than alternatives that compete in time.
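The idea in the abstract, that treating each triple as a circular string lets triple-pattern queries reduce to prefix searches, can be illustrated with a naive sketch. It is not RDFCSA itself: a plain sorted Python list with binary search stands in for the Compressed Suffix Array, nothing is compressed, and the role tags ("s", "p", "o") and function names are assumptions made for this example.

```python
from bisect import bisect_left

ROLES = ("s", "p", "o")


def build_index(triples):
    """Sort all three rotations of every (s, p, o) triple of integer ids.

    Each id is tagged with its role so that rotations starting at a subject,
    a predicate, or an object fall into separate ranges of the sorted list.
    """
    rotations = []
    for triple in triples:
        tagged = tuple(zip(ROLES, triple))
        for k in range(3):
            rotations.append(tagged[k:] + tagged[:k])
    rotations.sort()
    return rotations


def match(index, pattern):
    """Answer a triple pattern; None marks an unbound position."""
    bound = {i for i, value in enumerate(pattern) if value is not None}
    if not bound:
        # Fully unbound pattern: list each triple once via its subject-first rotation.
        return [_restore(rot) for rot in index if rot[0][0] == "s"]
    # Pick the rotation in which the bound terms form a prefix of the circular string.
    start = next((k for k in sorted(bound) if (k - 1) % 3 not in bound), 0)
    prefix = tuple(
        (ROLES[(start + j) % 3], pattern[(start + j) % 3]) for j in range(len(bound))
    )
    results = []
    pos = bisect_left(index, prefix)  # first rotation not smaller than the prefix
    while pos < len(index) and index[pos][: len(prefix)] == prefix:
        results.append(_restore(index[pos]))
        pos += 1
    return results


def _restore(rotation):
    """Turn a tagged rotation back into an (s, p, o) tuple."""
    by_role = dict(rotation)
    return tuple(by_role[role] for role in ROLES)


index = build_index([(1, 10, 100), (1, 11, 101), (2, 10, 100)])
subject_hits = match(index, (1, None, None))   # pattern (s, ?p, ?o)
wrap_hits = match(index, (1, None, 100))       # pattern (s, ?p, o), prefix is (o, s)
```

The pattern (s, ?p, o) shows why the strings are circular: the bound terms are not adjacent in (s, p, o) order, but they are a prefix of the rotation that starts at the object.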
