Mihaela A. Bornea scite author profile

Efficient storage and querying of RDF data is of increasing importance, due to the increased popularity and widespread acceptance of RDF on the web and in the enterprise. In this paper, we describe a novel storage and query mechanism for RDF which works on top of existing relational representations. Reliance on relational representations of RDF means that one can take advantage of 35+ years of research on efficient storage and querying, industrial-strength transaction support, locking, security, etc. However, there are significant challenges in storing RDF in relational, which include data sparsity and schema variability. We describe novel mechanisms to shred RDF into relational, and novel query translation techniques to maximize the advantages of this shredded representation. We show that these mechanisms result in consistently good performance across multiple RDF benchmarks, even when compared with current state-of-the-art stores. This work provides the basis for RDF support in DB2 v.10.1.

One-copy serializability with snapshot isolation under the hood

Bornea, Hodson, Elnikety, et al. 2011

This paper presents a method that allows a replicated database system to provide a global isolation level stronger than the isolation level provided on each individual database replica. We propose a new multi-version concurrency control algorithm called serializable generalized snapshot isolation (SGSI), which targets middleware replicated database systems. Each replica runs snapshot isolation locally and the replication middleware guarantees global one-copy serializability. We introduce novel techniques to provide a stronger global isolation level, namely readset extraction and enhanced certification that prevents read-write and write-write conflicts in a replicated setting. We prove the correctness of the proposed algorithm, and build a prototype replicated database system to evaluate SGSI performance experimentally. Extensive experiments with an 8-replica database system under the TPC-W workload mixes demonstrate the practicality and low overhead of the algorithm.

I. INTRODUCTION

In many server systems replication is used to achieve higher performance and availability than a centralized server. Replication in database systems is, however, particularly challenging because the transactional semantics have to be maintained. The effects of an update transaction at one replica have to be efficiently propagated and synchronized at all other replicas, while maintaining consistency for all update and read-only transactions. This challenge has long been recognized [18], leading to several replication protocols that explicitly trade off consistency to achieve higher performance: a replicated database system may provide a lower isolation level than a centralized database system.

We show, contrary to common belief, that a replicated database system can efficiently provide a global isolation level stronger than the isolation level provided by the constituent replicas. We focus here on snapshot isolated database systems and introduce a concurrency control algorithm that guarantees global one-copy serializability (1SR), while each replica guarantees snapshot isolation (SI), which is weaker than serializability. We support this claim by proposing an algorithm, proving its correctness, implementing it, and building a prototype replicated database to evaluate it experimentally.

Database engines such as PostgreSQL and Oracle support SI as it provides attractive performance for an important class of transactional workloads that have certain properties, e.g., dominance of read-only transactions, short updates and absence of write hot-spots. To take advantage of this performance gain, some database engines support SI in addition to traditional locking schemes. For example, Microsoft SQL Server supports both SI and 2PL. The performance gain of using SI comes at a correctness cost: SI is not serializable.
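
The claim that SI is not serializable can be illustrated with the well-known write-skew anomaly. The following is a minimal single-node sketch, not the SGSI algorithm from the paper: `ToySnapshotDB`, `Txn`, `go_off_call`, `run`, and the `certify_reads` flag are all hypothetical names introduced for illustration. With only write-write conflict checking (first committer wins), two transactions with disjoint write sets both commit and break an invariant that any serial order would preserve. Also checking each transaction's read set at commit time, loosely in the spirit of the readset certification mentioned in the abstract, aborts the second one.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    snapshot: dict            # private copy of the data taken at begin()
    start_version: int        # database version the snapshot reflects
    reads: set = field(default_factory=set)
    writes: dict = field(default_factory=dict)

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value


class ToySnapshotDB:
    """Hypothetical in-memory store; certify_reads is an illustrative switch."""

    def __init__(self, data, certify_reads=False):
        self.data = dict(data)
        self.version = 0
        self.last_write = {key: 0 for key in data}  # key -> last commit version
        self.certify_reads = certify_reads

    def begin(self):
        return Txn(snapshot=dict(self.data), start_version=self.version)

    def commit(self, txn):
        # Plain SI checks only the write set (first committer wins).
        checked = set(txn.writes)
        if self.certify_reads:
            # Assumption: also reject if anything the transaction read
            # was overwritten after its snapshot was taken.
            checked |= txn.reads
        if any(self.last_write[k] > txn.start_version for k in checked):
            return False
        self.version += 1
        for key, value in txn.writes.items():
            self.data[key] = value
            self.last_write[key] = self.version
        return True


def go_off_call(txn, me):
    # Invariant the application wants: at least one doctor stays on call.
    if sum(txn.read(k) for k in ("alice", "bob")) == 2:
        txn.write(me, False)


def run(certify_reads):
    db = ToySnapshotDB({"alice": True, "bob": True}, certify_reads)
    t1, t2 = db.begin(), db.begin()   # both see the same snapshot
    go_off_call(t1, "alice")
    go_off_call(t2, "bob")
    outcomes = (db.commit(t1), db.commit(t2))
    return outcomes, db.data
```

Calling `run(False)` lets both transactions commit and leaves nobody on call, whereas `run(True)` aborts the second transaction. The read-set check in this toy is deliberately coarse; it would also abort read-only transactions that observed stale data, which a real design would want to avoid.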

Semi-Streamed Index Join for near-real time execution of ETL transformations

Bornea, Deligiannakis, Kotidis, et al. 2011

Combining Unsupervised Pre-training and Annotator Rationales to Improve Low-shot Text Classification

Melamud, Bornea, Barker. 2019

Supervised learning models often perform poorly at low-shot tasks, i.e. tasks for which little labeled data is available for training. One prominent approach for improving low-shot learning is to use unsupervised pre-trained neural models. Another approach is to obtain richer supervision by collecting annotator rationales (explanations supporting label annotations). In this work, we combine these two approaches to improve low-shot text classification with two novel methods: a simple bag-of-words embedding approach; and a more complex context-aware method, based on the BERT model. In experiments with two English text classification datasets, we demonstrate substantial performance gains from combining pre-training with rationales. Furthermore, our investigation of a range of train-set sizes reveals that the simple bag-of-words approach is the clear top performer when there are only a few dozen training instances or fewer, while more complex models, such as BERT or CNN, require more training data to shine.

Double Index NEsted-Loop Reactive Join for Result Rate Optimization

Bornea, Vassalos, Kotidis, et al. 2009

Adaptive Join Operators for Result Rate Optimization on Streaming Inputs

Bornea, Vassalos, Kotidis, et al. 2010. IEEE Trans. Knowl. Data Eng.

Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data are provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join techniques is that they can start producing join results as soon as the first input tuples are available, thus improving pipelining by smoothing join result production and by masking source or network delays. In this paper, we first propose Double Index NEsted-loops Reactive join (DINER), a new adaptive two-way join algorithm for result rate maximization. DINER combines two key elements: an intuitive flushing policy that aims to increase the productivity of in-memory tuples in producing results during the online phase of the join, and a novel reentrant join technique that allows the algorithm to rapidly switch between processing in-memory and disk-resident tuples, thus better exploiting temporary delays when new data are not available. We then extend the applicability of the proposed technique for a more challenging setup: handling more than two inputs. Multiple Index NEsted-loop Reactive join (MINER) is a multiway join operator that inherits its principles from DINER. Our experiments using real and synthetic data sets demonstrate that DINER outperforms previous adaptive join algorithms in producing result tuples at a significantly higher rate, while making better use of the available memory. Our experiments also show that in the presence of multiple inputs, MINER manages to produce a high percentage of early results, outperforming existing techniques for adaptive multiway join.
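
The property that a join can emit results as soon as the first input tuples arrive can be sketched with a classic symmetric hash join. This is not DINER or MINER: it keeps everything in memory and has no flushing policy or disk phase. The function name `symmetric_hash_join` and the `(side, key, payload)` arrival format are assumptions made for this illustration.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals):
    """Yield (key, left_payload, right_payload) as tuples arrive.

    arrivals: iterable of (side, key, payload) with side "L" or "R",
    in the order the tuples reach the operator.
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, payload in arrivals:
        # Insert into this side's hash table so later tuples can find it.
        tables[side][key].append(payload)
        # Probe the opposite table immediately; no input has to finish first.
        other = "R" if side == "L" else "L"
        for match in tables[other].get(key, ()):
            if side == "L":
                yield (key, payload, match)
            else:
                yield (key, match, payload)
```

Because the function is a generator, each result becomes available to the consumer right after the arrival that completes it, for example when iterating over an interleaved stream such as `[("L", 1, "a"), ("R", 1, "x"), ...]`.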

Generative Relation Linking for Question Answering over Knowledge Bases

Rossiello, Mihindukulasooriya, Abdelaziz, et al. 2021

Problem-oriented patient record summary: An early report on a Watson application

Devarakonda, Zhang, Tsou, et al. 2014

Building an efficient RDF store over a relational database
