Hemant Saxena scite author profile

We analyze the problem of discovering dependencies from distributed big data. Existing (non-distributed) algorithms focus on minimizing computation by pruning the search space of possible dependencies. However, distributed algorithms must also optimize communication costs, especially in shared-nothing settings, leading to a more complex optimization space. To understand this space, we introduce six primitives shared by existing dependency discovery algorithms, corresponding to data processing steps separated by communication barriers. Through case studies, we show how the primitives allow us to analyze the design space and develop communication-optimized implementations. Finally, we support our analysis with an experimental evaluation on real datasets.

Design and performance analysis of generalised integrator-based controller for grid connected PV system

Saxena

Singh

2018

International Journal of Electronics

A Semi-Supervised Framework of Clustering Selection for De-Duplication

Kushagra

Saxena

Ilyas

et al. 2019

Data de-duplication is the task of detecting multiple records that correspond to the same real-world entity in a database. In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and records corresponding to different physical entities in different clusters. We introduce a framework which we call promise correlation clustering. Given a complete graph G with the edges labelled 0 and 1, the goal is to find a clustering that minimizes the number of 0 edges within a cluster plus the number of 1 edges across different clusters (or correlation loss). The optimal clustering can also be viewed as a complete graph G* with edges corresponding to points in the same cluster being labelled 0 and other edges being labelled 1. Under the promise that the edge difference between G and G* is "small", we prove that finding the optimal clustering (or G*) is still NP-Hard. [Ashtiani et al., 2016] introduced the framework of semi-supervised clustering, where the learning algorithm has access to an oracle, which answers whether two points belong to the same or different clusters. We further prove that even with access to a same-cluster oracle, the promise version is NP-Hard as long as the number of queries to the oracle is not too large (o(n), where n is the number of vertices). Given these negative results, we consider a restricted version of correlation clustering. As before, the goal is to find a clustering that minimizes the correlation loss. However, we restrict ourselves to a given class F of clusterings. We offer a semi-supervised algorithmic approach to solve the restricted variant with success guarantees.
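The correlation loss defined in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `correlation_loss`, the edge-label dictionary, and the four-vertex example are hypothetical, chosen only to show how the two error terms (0 edges inside a cluster, 1 edges across clusters) are counted.

```python
from itertools import combinations


def correlation_loss(labels, clustering):
    """Count disagreements between edge labels and a clustering.

    labels:     dict mapping frozenset({u, v}) -> 0 or 1 for every vertex pair
                (hypothetical representation of the complete labelled graph G).
    clustering: dict mapping each vertex -> cluster id.
    """
    loss = 0
    for u, v in combinations(sorted(clustering), 2):
        same_cluster = clustering[u] == clustering[v]
        label = labels[frozenset((u, v))]
        if same_cluster and label == 0:
            loss += 1  # a 0 edge placed within a cluster
        elif not same_cluster and label == 1:
            loss += 1  # a 1 edge cut across different clusters
    return loss


# Illustrative four-vertex graph (made-up labels, not data from the paper).
EXAMPLE_LABELS = {
    frozenset(("a", "b")): 1,
    frozenset(("c", "d")): 1,
    frozenset(("a", "c")): 0,
    frozenset(("a", "d")): 0,
    frozenset(("b", "c")): 0,
    frozenset(("b", "d")): 1,
}

if __name__ == "__main__":
    two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(correlation_loss(EXAMPLE_LABELS, two_clusters))
```

In this toy graph, the clustering {a, b}, {c, d} disagrees with the labels only on the edge between b and d, so it incurs less loss than either placing all vertices in one cluster or keeping every vertex separate.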

Triterpenoids of Adenanthera pavonina Root

Verma

Saxena

1982

International Journal of Crude Drug Research

Hemant Saxena

Adaptive spline‐based PLL for synchronisation and power quality improvement in distribution system

Distributed implementations of dependency discovery algorithms
