Hai Ying Wang scite author profile

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and has a significant reduction in computing time compared to the full data approach. Consistency and asymptotic normality of the estimator from a two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.
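The two-step idea in this abstract can be sketched in Python with NumPy. The abstract does not state the probability formula, so the form used here (residual magnitude times covariate norm, normalized) is an assumption standing in for the cheaper alternative criterion, and the names `fit_logistic`, `two_step_subsample`, `r0`, and `r` are hypothetical. This is a minimal illustration under those assumptions, not the authors' implementation.

```python
import numpy as np

def fit_logistic(X, y, w=None, iters=25):
    """Weighted logistic-regression MLE via plain Newton steps."""
    n, d = X.shape
    w = np.ones(n) if w is None else w
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def two_step_subsample(X, y, r0, r, rng):
    """Pilot estimate from r0 uniform draws, then r draws with
    data-dependent probabilities, then a weighted fit on both."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample gives a rough estimate.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = fit_logistic(X[idx0], y[idx0])
    # Assumed probability form: |y_i - p_i| * ||x_i||, normalized to sum to 1.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    score = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    # Step 2: sample with replacement according to pi.
    idx1 = rng.choice(n, size=r, replace=True, p=pi)
    # Inverse-probability weights; pilot draws had probability 1/n each.
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    return fit_logistic(X[idx], y[idx], w / w.mean())
```

Only the pilot fit and the final fit touch a regression solver, and both run on at most `r0 + r` rows; the single pass over the full data computes the sampling probabilities.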

The expression of beclin 1 is associated with favorable prognosis in stage IIIB colon cancers

Li

Li

Peng

et al. 2009

Autophagy

136

128

Information-Based Optimal Subdata Selection for Big Data Linear Regression

Wang

Yang

Stufken

2018

Journal of the American Statistical Association

175

140

Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinary large data sets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, i.e., the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data.
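A deterministic subdata selection of this kind can be sketched in Python with NumPy. The abstract does not spell out the selection rule, so the rule below (for each covariate in turn, keep the rows with the smallest and largest values among those not yet chosen) is an assumption, and the names `iboss_select` and `ols` are hypothetical. A full sort is used for clarity; a partial selection would be faster.

```python
import numpy as np

def iboss_select(Z, k):
    """Return indices of k rows picked by extreme covariate values.

    Z is an (n, p) covariate matrix without an intercept column.
    Assumes n >= k and that k is a positive multiple of 2 * p.
    """
    n, p = Z.shape
    r = k // (2 * p)
    if r < 1 or n < k:
        raise ValueError("need 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        cand = np.flatnonzero(available)
        order = cand[np.argsort(Z[cand, j])]
        # r smallest and r largest values of covariate j among unused rows.
        picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False
        chosen.append(picked)
    return np.concatenate(chosen)

def ols(Z, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(Z)), Z])
    return np.linalg.lstsq(A, y, rcond=None)[0]
```

Fitting `ols(Z[idx], y[idx])` with `idx = iboss_select(Z, k)` then uses only `k` rows. Because the selection involves no random draw, repeated runs on the same data return the same subdata.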

Redox-switchable breathing behavior in tetrathiafulvalene-based metal–organic frameworks

et al. 2017

Metal–organic frameworks (MOFs) that respond to external stimuli such as guest molecules, temperature, or redox conditions are highly desirable. Herein, we coupled redox-switchable properties with breathing behavior induced by guest molecules in a single framework. Guided by topology, two flexible isomeric MOFs, compounds 1 and 2, with a formula of In(Me2NH2)(TTFTB), were constructed via a combination of [In(COO)4]− metal nodes and tetratopic tetrathiafulvalene-based linkers (TTFTB). The two compounds show different breathing behaviors upon the introduction of N2. Single-crystal X-ray diffraction, accompanied by molecular simulations, reveals that the breathing mechanism of 1 involves the bending of metal–ligand bonds and the sliding of interpenetrated frameworks, while 2 undergoes simple distortion of linkers. Reversible oxidation and reduction of TTF moieties changes the linker flexibility, which in turn switches the breathing behavior of 2. The redox-switchable breathing behavior can potentially be applied to the design of stimuli-responsive MOFs.

Optimal Subsampling for Large Sample Logistic Regression
