We study the problem of discovering functional dependencies (FDs) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy data set is equivalent to learning the structure of a model over binary random variables, where each random variable corresponds to a functional of the data set attributes. We build upon this observation to introduce FDX, a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that FDX can recover true functional dependencies across a diverse array of real-world and synthetic data sets, even in the presence of noisy or missing data. We find that FDX scales to large data instances with millions of tuples and hundreds of attributes, while yielding an average F1 improvement of 2× over state-of-the-art FD discovery methods.
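The reduction above can be illustrated on a toy relation. In the sketch below, each pair of tuples is mapped to a vector of binary agreement variables (one per attribute), and a candidate FD `lhs -> rhs` is scored by how often agreement on `lhs` implies agreement on `rhs`. This is a minimal sketch of the binary-variable construction only: the data, attribute names, and the simple agreement-rate score are illustrative stand-ins, not the actual FDX sparse-regression estimator.

```python
from itertools import combinations

# Toy relation with a (slightly noisy) FD: city -> zip.
rows = [
    ("NYC", "10001", "A"), ("NYC", "10001", "B"), ("NYC", "10001", "C"),
    ("LA",  "90001", "A"), ("LA",  "90001", "B"),
    ("SF",  "94103", "A"),
    ("NYC", "10002", "D"),  # one noisy tuple violating city -> zip
]

def agreement_vectors(rows):
    """For every pair of tuples, emit one binary value per attribute:
    1 if the pair agrees on that attribute, else 0. These are the
    binary random variables of the structure-learning view."""
    vecs = {a: [] for a in range(len(rows[0]))}
    for r, s in combinations(rows, 2):
        for a in range(len(r)):
            vecs[a].append(1 if r[a] == s[a] else 0)
    return vecs

def fd_score(vecs, lhs, rhs):
    """Fraction of tuple pairs agreeing on lhs that also agree on rhs.
    A score near 1 suggests an (approximate) FD lhs -> rhs."""
    hits = [y for x, y in zip(vecs[lhs], vecs[rhs]) if x == 1]
    return sum(hits) / len(hits) if hits else 0.0

vecs = agreement_vectors(rows)
print(f"city  -> zip: {fd_score(vecs, 0, 1):.2f}")  # high despite noise
print(f"other -> zip: {fd_score(vecs, 2, 1):.2f}")  # near zero
```

FDX replaces this pairwise score with a sparse regression over the same binary variables, which is what lets it scale and handle many attributes jointly.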
The exploratory and confirmatory approaches of factor analysis lie at two ends of a continuum of substantive input for scale development. Recent advancements in Bayesian regularization methods enable more flexibility in covering a wide range of this substantive continuum. Based on the Bayesian Lasso (least absolute shrinkage and selection operator) methods for the regression model and the covariance matrix, this research proposes a partially confirmatory approach that addresses the loading and residual structures at the same time. With at least one specified loading per item, a one-step procedure can be applied to recover both structures simultaneously. With a few specified loadings per factor, a two-step procedure is preferred to capture the model configuration correctly. In both cases, the Bayesian hierarchical formulation is implemented using Markov chain Monte Carlo estimation with different Lasso or regular priors. Both simulated and real data sets were analyzed to evaluate the validity, robustness, and practical usefulness of the proposed approach across different situations.
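The shrinkage that drives the partially confirmatory approach can be illustrated with the simplest case of Lasso estimation: under a Laplace (Lasso) prior and an orthonormal design, the MAP estimate of a coefficient is the soft-thresholding of its least-squares estimate. This is a minimal sketch of that shrinkage principle, not the paper's MCMC procedure; the numeric loading estimates and threshold below are illustrative.

```python
def soft_threshold(z, lam):
    """MAP estimate of a coefficient under a Laplace (Lasso) prior
    with an orthonormal design: shrink toward zero, and set small
    values exactly to zero (pruning unspecified loadings)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Small cross-loadings are pruned; substantial loadings survive, shrunken.
estimates = [0.05, -0.08, 0.70, -0.65]
print([round(soft_threshold(z, 0.1), 2) for z in estimates])
# → [0.0, 0.0, 0.6, -0.55]
```

In the partially confirmatory setting, the handful of analyst-specified loadings anchor each factor, while the Lasso prior decides which of the remaining (unspecified) loadings and residual covariances survive this kind of shrinkage.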
Condor is used extensively in the HEP environment. It is the batch system of choice for many compute farms, including several WLCG Tier 1s, Tier 2s, and Tier 3s. It is also the building block of one of the Grid pilot infrastructures, namely glideinWMS. As with any software, Condor does not scale indefinitely with the number of users and/or the number of resources being handled. In this paper we present the currently observed scalability limits of both the latest production and the latest development release of Condor, and compare them with the limits reported in previous publications. We also describe the changes that were introduced to remove the previous scalability limits.
Hotspots, a small set of tuples frequently read or written by a large number of transactions, cause contention in a concurrency control protocol. While a hotspot may comprise only a small fraction of a transaction's execution time, conventional strict two-phase locking allows a transaction to release its locks only after the transaction completes, which leaves significant parallelism unexploited. Ideally, a concurrency control protocol serializes transactions only for the duration of the hotspots, rather than the duration of the transactions. We observe that exploiting such parallelism requires violating two-phase locking. In this paper, we propose Bamboo, a new concurrency control protocol that enables such parallelism by modifying conventional two-phase locking while maintaining the same correctness guarantees. We thoroughly analyze the effect of the cascading aborts involved in reading uncommitted data and discuss optimizations that further improve performance. Our evaluation on TPC-C shows a performance improvement of up to 4× over the best of the pessimistic and optimistic baseline protocols. On synthetic workloads that contain a single hotspot, Bamboo achieves a speedup of up to 19× over the baselines.
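The gap between the two serialization strategies can be seen with a back-of-envelope model. The sketch below compares the makespan of n conflicting transactions when the hotspot lock is held until commit (strict 2PL) versus when only the hotspot section serializes and the remaining work overlaps. The timing parameters are hypothetical and this is an illustration of the idea, not of Bamboo's actual protocol (which must also handle the cascading aborts that early release introduces).

```python
# Hypothetical cost model: each of n transactions spends `hotspot`
# time units on the contended tuple and `rest` units elsewhere.

def strict_2pl_makespan(n_txns, hotspot, rest):
    """Locks are held until commit, so transactions touching the
    hotspot serialize over their *entire* execution."""
    return n_txns * (hotspot + rest)

def early_release_makespan(n_txns, hotspot, rest):
    """Bamboo-style idea: only the hotspot sections serialize; the
    non-hotspot work of different transactions overlaps fully."""
    return n_txns * hotspot + rest

# The advantage grows with contention (more conflicting transactions).
for n in (2, 8, 32):
    print(n, strict_2pl_makespan(n, 1, 9), early_release_makespan(n, 1, 9))
```

With hotspot = 1 and rest = 9, eight conflicting transactions take 80 units under strict 2PL but only 17 when just the hotspot serializes, which mirrors why the speedup in the single-hotspot experiments is so large.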
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.