Michael Leybovich scite author profile

Michael Leybovich

2 Publications

0 Citation Statements Received

43 Citation Statements Given

Affiliations

Technion – Israel Institute of Technology

Publications

Towards practical approximate lineage

Leybovich, Shmueli (2022)

Traditionally, provenance and lineage mainly referred to query results. We take a more holistic approach. We consider a system in which tuples (records) that are produced by a query may affect other tuple insertions into the DB, as part of a normal workflow. Therefore, we consider both direct lineage (dependence of a query result on database tuples directly used in solving the query) and distant lineage (dependence on older tuples that caused the existence

Efficient approximate search for sets of lineage vectors

Leybovich, Shmueli (2022)

One can approximate the lineage of a database (DB) tuple using a small set of low-dimensional vectors. To identify actual lineage tuples using these vector sets, given the set of vectors of the target tuple, one needs to locate "close" sets of vectors associated with the lineage tuples. We first consider a similarity measure between two sets of vectors, $A$ and $B$, that balances the average and maximum cosine distance between pairs of vectors, one from $A$ and one from $B$. The proposed similarity measure is intuitive and permutation invariant. To realize this measure in practice, we need an approximate search algorithm that, given a set of vectors $A$ and sets of vectors $B_1, \ldots, B_n$, quickly locates the $k$ closest sets $B_{i_1}, \ldots, B_{i_k}$, that is, those that maximize the similarity measure. For the case where all sets are singletons, so that each is essentially a single vector, efficient approximate search algorithms are known, e.g., approximate versions of tree search algorithms, locality-sensitive hashing (LSH), vector quantization (VQ), and proximity graph algorithms. We utilize the mathematical properties of the cosine distance measure to transform the set-set search problem into a vector-vector search problem. However, this transformation cannot handle the Euclidean-based version of the similarity measure. For that version, we devise a more elaborate transformation. For this latter transformation, we present algorithms for the general case, with sets of differing cardinalities. The underlying idea in both transformations is to encode a set of vectors $A$ via $|A|$ "long" independent representative vectors. We are then able to transform the set-set search problem into the well-studied approximate (ordinary) vector search problem. For both the cosine-based and Euclidean-based similarity measures, the proposed approximate search achieves significant performance gains over an optimized, exact search on vector sets.
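To make the set-set search problem in this abstract concrete, here is a minimal sketch of the exact (brute-force) baseline. The abstract does not give the precise formula, so the blend used here, a weighted sum of the maximum and the average pairwise cosine similarity with a hypothetical weight `alpha`, is an assumption for illustration only. All function names are likewise hypothetical. The last function shows a standard identity, not necessarily the paper's construction: for unit-normalized vectors, the average pairwise cosine similarity equals the dot product of the two sets' mean vectors, which hints at why the average term can be reduced to a single vector-vector comparison.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def set_similarity(A: Sequence[Vector], B: Sequence[Vector], alpha: float = 0.5) -> float:
    """Assumed blend: alpha * max + (1 - alpha) * average of pairwise cosine similarities.

    The result does not depend on the order of vectors within A or B.
    """
    sims = [cosine(a, b) for a in A for b in B]
    return alpha * max(sims) + (1 - alpha) * sum(sims) / len(sims)


def k_closest_sets(A: Sequence[Vector], candidates: Sequence[Sequence[Vector]],
                   k: int, alpha: float = 0.5) -> List[int]:
    """Exact search: indices of the k candidate sets most similar to A."""
    scored = [(set_similarity(A, B, alpha), i) for i, B in enumerate(candidates)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def normalize(v: Vector) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def average_term_via_means(A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """Average pairwise cosine similarity, computed as one dot product of mean unit vectors."""
    mean_a = mean_vector([normalize(a) for a in A])
    mean_b = mean_vector([normalize(b) for b in B])
    return sum(x * y for x, y in zip(mean_a, mean_b))
```

The brute-force search compares the query set against every candidate set, which is the cost an approximate method aims to avoid. The maximum term has no equally simple reduction, which is presumably where a more involved encoding becomes necessary.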
