Hai Ying Wang scite author profile

Optimal Subsampling for Large Sample Logistic Regression

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and has a significant reduction in computing time compared to the full data approach. Consistency and asymptotic normality of the estimator from a two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
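
The two-step procedure described in this abstract can be illustrated with a minimal sketch. It assumes subsampling probabilities proportional to |y_i - p_i| times the norm of x_i, evaluated at a pilot estimate, as one plausible form of the cheaper criterion; the abstract itself does not state the formula. The function names `fit_weighted_logistic` and `two_step_subsample` are hypothetical, and the code is not the authors' implementation.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-8):
    """Weighted logistic MLE by Newton's method (X must already hold any intercept column)."""
    beta = np.zeros(X.shape[1])
    w = w / w.sum()  # rescaling the weights does not change the maximizer
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def two_step_subsample(X, y, r0, r, rng):
    """Pilot fit on a uniform subsample, then a second subsample drawn with
    data-dependent probabilities; the final fit uses inverse-probability weights."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample and pilot estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_weighted_logistic(X[idx0], y[idx0], np.ones(r0))
    # Step 2: ASSUMED probabilities, proportional to |y_i - p_i(beta0)| * ||x_i||.
    p = 1.0 / (1.0 + np.exp(-(X @ beta0)))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    # Combine both subsamples; each point is weighted by 1 / (its sampling probability).
    Xs = np.vstack([X[idx0], X[idx1]])
    ys = np.concatenate([y[idx0], y[idx1]])
    ws = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    return fit_weighted_logistic(Xs, ys, ws)
```

Calling `two_step_subsample(X, y, r0, r, np.random.default_rng(0))` on a large design matrix `X` and a 0/1 response `y` returns a coefficient vector computed from only `r0 + r` sampled rows, which is where the saving in computing time comes from.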

The expression of beclin 1 is associated with favorable prognosis in stage IIIB colon cancers

Li¹,

Li²,

Peng³

et al. 2009

Autophagy

135

128

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Wang

Yang

Stufken

2018

Journal of the American Statistical Association

162

140

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large data sets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, i.e., the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data.
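
The abstract does not spell out how IBOSS picks its subdata, so the sketch below rests on an assumption: a deterministic rule that, for each covariate in turn, keeps the rows with the smallest and largest values of that covariate among the rows not yet chosen. The names `iboss_select` and `ols_with_intercept` are hypothetical, and a full sort is used for brevity where a partial selection would be faster.

```python
import numpy as np

def iboss_select(X, k):
    """Return indices of roughly k rows chosen deterministically from X (n x p).

    ASSUMED rule: for each column j, take the r = k // (2p) smallest and the
    r largest values of column j among rows that are still available.
    """
    n, p = X.shape
    r = k // (2 * p)
    if r < 1 or 2 * r * p > n:
        raise ValueError("k must satisfy 2p <= k and 2 * (k // (2p)) * p <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        idx = np.flatnonzero(available)
        order = idx[np.argsort(X[idx, j])]              # available rows sorted by column j
        picks = np.concatenate([order[:r], order[-r:]])  # r lowest and r highest
        chosen.append(picks)
        available[picks] = False                         # never pick the same row twice
    return np.concatenate(chosen)

def ols_with_intercept(X, y):
    """Ordinary least squares on the selected subdata; returns [intercept, slopes...]."""
    Z = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef
```

Because the rule involves no random draws, `ols_with_intercept(X[idx], y[idx])` with `idx = iboss_select(X, k)` is an ordinary regression on a fixed subset, in line with the abstract's remark that analysis of IBOSS subdata is straightforward.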

Redox-switchable breathing behavior in tetrathiafulvalene-based metal–organic frameworks

et al. 2017

Metal–organic frameworks (MOFs) that respond to external stimuli such as guest molecules, temperature, or redox conditions are highly desirable. Herein, we coupled redox-switchable properties with breathing behavior induced by guest molecules in a single framework. Guided by topology, two flexible isomeric MOFs, compounds 1 and 2, with a formula of In(Me2NH2)(TTFTB), were constructed via a combination of [In(COO)4]− metal nodes and tetratopic tetrathiafulvalene-based linkers (TTFTB). The two compounds show different breathing behaviors upon the introduction of N2. Single-crystal X-ray diffraction, accompanied by molecular simulations, reveals that the breathing mechanism of 1 involves the bending of metal–ligand bonds and the sliding of interpenetrated frameworks, while 2 undergoes simple distortion of linkers. Reversible oxidation and reduction of TTF moieties changes the linker flexibility, which in turn switches the breathing behavior of 2. The redox-switchable breathing behavior can potentially be applied to the design of stimuli-responsive MOFs.

Agrobacterium tumefaciens-Mediated Transformation of the Lichen Fungus, Umbilicaria muehlenbergii

et al. 2013

Transformation-mediated mutagenesis in both targeted and random manners has been widely applied to decipher gene function in diverse fungi. However, a transformation system has not yet been established for lichen fungi, severely limiting our ability to study their biology and mechanism underpinning symbiosis via gene manipulation. Here, we report the first successful transformation of the lichen fungus, Umbilicaria muehlenbergii, via the use of Agrobacterium tumefaciens. We generated a total of 918 transformants employing a binary vector that carries the hygromycin B phosphotransferase gene as a selection marker and the enhanced green fluorescent protein gene for labeling transformants. Randomly selected transformants appeared mitotically stable, based on their maintenance of hygromycin B resistance after five generations of growth without selection. Genomic Southern blot showed that 88% of 784 transformants contained a single T-DNA insert in their genome. A number of putative mutants affected in colony color, size, and/or morphology were found among these transformants, supporting the utility of Agrobacterium tumefaciens-mediated transformation (ATMT) for random insertional mutagenesis of U. muehlenbergii. This ATMT approach potentially offers a systematic gene functional study with genome sequences of U. muehlenbergii that is currently underway.

Optimal subsampling for quantile regression in big data

Wang

2020

We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the asymptotic variance-covariance matrix for a linearly transformed parameter estimator and the other minimizes that of the original parameter estimator. The former does not depend on the densities of the responses given covariates and is easy to implement. Algorithms based on optimal subsampling probabilities are proposed, and the asymptotic distributions and asymptotic optimality of the resulting estimators are established. Furthermore, we propose an iterative subsampling procedure based on the optimal subsampling probabilities in the linearly transformed parameter estimation, which has great scalability to utilize available computational resources. In addition, this procedure yields standard errors for parameter estimators without estimating the densities of the responses given the covariates. We provide numerical examples based on both simulated and real data to illustrate the proposed method.

Multidrug-Resistant Tuberculosis, People’s Republic of China, 2007–2009

He¹,

Wang²,

Borgdorff³

et al. 2011

Emerg. Infect. Dis.

Preparation and characterization of porous TiO2/ZnO composite nanofibers via electrospinning

Wang

Yang

et al. 2010

Chinese Chemical Letters

