Haoyin Xu scite author profile

Haoyin Xu

5 Publications

6 Citation Statements Received

42 Citation Statements Given

Affiliations

University of California, San Diego; Johns Hopkins University

Publications

When are Deep Networks really better than Decision Forests at small sample sizes, and how?

Xu, Kinfu, LeVine et al. 2021

Preprint

Random forests (RF) and deep networks (DN) are two of the most popular machine learning methods in the current scientific literature and yield differing levels of performance on different data modalities. We wish to further explore and establish the conditions and domains in which each approach excels, particularly in the context of sample size and feature dimension. To address these issues, we tested the performance of these approaches across tabular, image, and audio settings using varying model parameters and architectures. Our focus is on datasets with at most 10,000 samples, which represent a large fraction of scientific and biomedical datasets. In general, we found RF to excel at tabular and structured data (image and audio) with small sample sizes, whereas DN performed better on structured data with larger sample sizes. Although we plan to continue updating this technical report in the coming months, we believe the current preliminary results may be of interest to others.

All datasets with over 10,000 samples were randomly downsampled to 10,000 samples. Next, for each dataset, the training data were indexed into eight subsets with evenly spaced sample sizes on a logarithmic scale, thus producing eight training sets with different sample sizes. The smallest
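
The subsampling scheme described in the excerpt above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the smallest size `n_min=10` (the excerpt is cut off before stating it), the fixed seed, and the choice to make the eight subsets nested are all hypothetical.

```python
import math
import random


def log_spaced_sizes(n_max, n_min, n_sets=8):
    """Return n_sets sample sizes evenly spaced on a log scale from n_min to n_max."""
    if not 0 < n_min < n_max or n_sets < 2:
        raise ValueError("need 0 < n_min < n_max and at least two sets")
    step = (math.log(n_max) - math.log(n_min)) / (n_sets - 1)
    sizes = [round(math.exp(math.log(n_min) + i * step)) for i in range(n_sets)]
    # Pin the endpoints so floating-point rounding cannot shift them.
    sizes[0], sizes[-1] = n_min, n_max
    return sizes


def nested_training_subsets(n_samples, n_min=10, cap=10_000, n_sets=8, seed=0):
    """Index a dataset into n_sets training subsets of log-spaced sizes.

    n_min is an assumed value; the source text does not give the smallest size.
    Nesting (each subset is a prefix of the next) is also an assumption.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    if n_samples > cap:
        # Randomly downsample datasets larger than the cap.
        indices = rng.sample(indices, cap)
    else:
        rng.shuffle(indices)
    sizes = log_spaced_sizes(len(indices), n_min, n_sets)
    return [indices[:size] for size in sizes]


if __name__ == "__main__":
    for subset in nested_training_subsets(25_000):
        print(len(subset))
```

Drawing every subset from one shuffled index list keeps the smaller training sets inside the larger ones, which is one simple way to make results comparable across sample sizes; independent draws per size would be an equally valid reading of the text.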

Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity

Vogelstein, Dey, Helm et al. 2020

Preprint

Deep discriminative to kernel generative modeling

Dey, LeVine, Silva et al. 2022

Preprint

Simplest Streaming Trees

Xu, Dey, Panda et al. 2021

Preprint

Representation Ensembling for Synergistic Lifelong Learning with Quasilinear Complexity

Dey, Vogelstein, Helm et al. 2022

Preprint

In biological learning, data are used to improve performance not only on the current task, but also on previously encountered and as yet unencountered tasks. In contrast, classical machine learning starts from a blank slate, or tabula rasa, using data only for the single task at hand. While typical transfer learning algorithms can improve performance on future tasks, their performance on prior tasks degrades upon learning new tasks (called catastrophic forgetting). Many recent approaches for continual or lifelong learning have attempted to maintain performance given new tasks. But striving to avoid forgetting sets the goal unnecessarily low: the goal of lifelong learning, whether biological or artificial, should be to improve performance on both past tasks (backward transfer) and future tasks (forward transfer) with any new data. Our key insight is that even though learners trained on other tasks often cannot make useful decisions on the current task (the two tasks may have non-overlapping classes, for example), they may have learned representations that are useful for this task. Thus, although ensembling decisions is not possible, ensembling representations can be beneficial whenever the distributions across tasks are sufficiently similar. Moreover, we can ensemble representations learned independently across tasks in quasilinear space and time. We therefore propose two algorithms: representation ensembles of (1) trees and (2) networks. Both algorithms demonstrate both forward and backward transfer in a variety of simulated and real data scenarios, including tabular, image, spoken, and adversarial tasks. This is in stark contrast to the reference algorithms we compared to, all of which failed to transfer either forward or backward, or both, even though many of them require quadratic space or time complexity.
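
The distinction between ensembling decisions and ensembling representations in the abstract above can be illustrated with a toy sketch. This is not the authors' tree or network algorithm: a one-dimensional threshold stands in for a learned representation, and all function names, the Laplace smoothing, and the example data are assumptions made for illustration. Each task learns its own partition of the input space; the target task then estimates its own class posteriors inside every partition (including those learned on the other task) and averages them.

```python
from collections import Counter


def fit_representer(xs, ys):
    """Learn a 1-D partition (a single threshold) from one task's data.

    Assumes xs holds at least two distinct values.
    """
    best_t, best_err = None, None
    points = sorted(set(xs))
    for a, b in zip(points, points[1:]):
        t = (a + b) / 2
        err = 0
        for side in (False, True):
            labels = [y for x, y in zip(xs, ys) if (x > t) == side]
            if labels:
                # Errors made by predicting the majority label in this cell.
                err += len(labels) - Counter(labels).most_common(1)[0][1]
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_voter(threshold, xs, ys, classes):
    """Estimate the target task's class posteriors in each cell of a partition."""
    counts = {False: Counter(), True: Counter()}
    for x, y in zip(xs, ys):
        counts[x > threshold][y] += 1
    posteriors = {}
    for cell, c in counts.items():
        total = sum(c.values()) + len(classes)  # Laplace smoothing
        posteriors[cell] = {k: (c[k] + 1) / total for k in classes}
    return posteriors


def predict(x, representers, voters, classes):
    """Average the posteriors from every representation, then take the argmax."""
    avg = {k: 0.0 for k in classes}
    for t, post in zip(representers, voters):
        for k in classes:
            avg[k] += post[x > t][k] / len(representers)
    return max(avg, key=avg.get)


# Two tasks with non-overlapping label sets but a similar decision boundary.
task_a_x, task_a_y = [-3, -2, -1, 1, 2, 3], ["a0", "a0", "a0", "a1", "a1", "a1"]
task_b_x, task_b_y = [-2.5, -0.5, 0.5, 2.5], ["b0", "b0", "b1", "b1"]
classes_b = ["b0", "b1"]

# Representations are learned independently, one per task.
representers = [fit_representer(task_a_x, task_a_y),
                fit_representer(task_b_x, task_b_y)]
# Voters for task B relabel every representation with task B's data.
voters_b = [fit_voter(t, task_b_x, task_b_y, classes_b) for t in representers]
```

Task A's classifier could never output a task B label, so its decisions are unusable for task B, but its partition of the input space can still be relabeled with task B's data and pooled with task B's own partition.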

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Part of the Research Solutions Family.
