The incidence of self-reported severe hypoglycaemia in insulin-treated Type 2 diabetes is lower than in Type 1 diabetes but does occur more often than previously reported and with sufficient frequency to cause significant morbidity. Duration of insulin treatment is a key predictor of hypoglycaemia in insulin-treated Type 2 diabetes.
Abstract. We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permit a direct and much more detailed analysis than the generic tools used in previous work. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore, we show that the randomly projected ensemble is equivalent to applying a sophisticated regularization scheme to the linear discriminant learned in the original data space, which prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high dimensional data; with this approach, data can therefore be collected, stored and processed in such a compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm. A preliminary version of this work received the best paper award at the 5th Asian Conference on Machine Learning.
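The averaging ensemble described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Gaussian projection matrices, the maximum-likelihood pooled covariance, and the use of NumPy are all assumptions made for illustration. Each member fits an FLD in a k-dimensional random projection of the data, and the member discriminants are averaged into a single weight vector in the original space.

```python
import numpy as np


def fit_rp_fld_ensemble(X, y, k, n_members, rng):
    """Fit an averaged ensemble of randomly projected FLD classifiers.

    Hypothetical helper: X is (n, d), y holds labels 0/1, and k must not
    exceed the rank of the pooled sample covariance (at most n - 2).
    Returns the two class means and the averaged weight vector.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centred = np.vstack([X0 - mu0, X1 - mu1])
    diff = mu1 - mu0
    d = X.shape[1]

    w = np.zeros(d)
    for _ in range(n_members):
        # Assumed choice: i.i.d. standard Gaussian projection matrix.
        R = rng.standard_normal((k, d))
        Z = centred @ R.T                      # projected, centred data (n, k)
        proj_cov = Z.T @ Z / len(X)            # k x k covariance estimate
        w_low = np.linalg.solve(proj_cov, R @ diff)
        w += R.T @ w_low                       # map discriminant back to d dims
    return mu0, mu1, w / n_members


def predict_rp_fld_ensemble(X, mu0, mu1, w):
    """Label 1 when the averaged discriminant is positive, else 0."""
    midpoint = (mu0 + mu1) / 2
    return ((X - midpoint) @ w > 0).astype(int)
```

Because every member only needs the projected data, the k x k systems stay well conditioned even when the number of observations is far below the data dimension, which is the setting the abstract targets.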
Estimation of distribution algorithms (EDAs) are a major branch of evolutionary algorithms (EAs) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs may become less attractive in large-scale problems because of the associated large computational requirements. Large-scale continuous global optimisation is key to many modern-day real-world problems. Scaling up EAs to large-scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective and efficient EDA-type algorithms for large-scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections to low dimensions of the set of fittest search points as a basis for developing a new and generic divide-and-conquer methodology. Our ideas are rooted in the theory of random projections developed in theoretical computer science, and in developing and analysing our framework we exploit some recent results in nonasymptotic random matrix theory.
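One sampling step of such a random-projection ensemble might look like the sketch below. This is a hedged illustration rather than the paper's algorithm: the function name, the Gaussian projections with entry variance 1/d, the Gaussian search distribution, and the final rescaling factor are assumptions introduced here, and NumPy is assumed to be available. The point is that only k x k covariance matrices are ever estimated, never a d x d one.

```python
import numpy as np


def rp_ensemble_sample(fittest, n_samples, k, n_proj, rng):
    """Draw new search points from an ensemble of low-dimensional models.

    Hypothetical helper: `fittest` is an (n, d) array of selected points,
    with k >= 2 and n > k so each projected covariance is well defined.
    """
    d = fittest.shape[1]
    mean = fittest.mean(axis=0)
    centred = fittest - mean

    total = np.zeros((n_samples, d))
    for _ in range(n_proj):
        # Assumed choice: Gaussian projection with entries of variance 1/d.
        R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(k, d))
        low = centred @ R.T                    # fittest points in k dims
        cov = np.cov(low, rowvar=False)        # cheap k x k estimate
        z = rng.multivariate_normal(np.zeros(k), cov, size=n_samples)
        total += z @ R                         # map samples back to d dims

    # Assumed rescaling: averaging n_proj back-projected samples shrinks
    # their spread, so the average is scaled by sqrt(d * n_proj / k).
    return mean + total * np.sqrt(d / (k * n_proj))
```

In a full optimiser this step would sit inside the usual loop of evaluating the population, selecting the fittest points, and resampling.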
Abstract. Beyer et al. gave a sufficient condition for the high dimensional phenomenon known as the concentration of distances. Their work has pinpointed serious problems due to nearest neighbours not being meaningful in high dimensions. Here we establish the converse of their result, in order to answer the question as to when nearest neighbour is still meaningful in arbitrarily high dimensions. We then show for a class of realistic data distributions having non-i.i.d. dimensions, namely the family of linear latent variable models, that the Euclidean distance will not concentrate as long as the amount of 'relevant' dimensions grows no slower than the overall data dimensions. This condition is, of course, often not met in practice. After numerically validating our findings, we examine real data situations in two different areas (text-based document collections and gene expression arrays), which suggest that the presence or absence of distance concentration in high dimensional problems plays a role in making the data hard or easy to work with.
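The contrast drawn in this abstract can be checked numerically with a small simulation. The sketch below is illustrative only: the function names, the uniform i.i.d. model, and the one-factor latent model in which every dimension loads on a single latent variable are assumptions chosen here, not the paper's experiments. It measures the relative variance of the Euclidean norm, Var(||x||) / E[||x||]^2, which shrinks towards zero when distances concentrate.

```python
import math
import random


def relative_variance(norms):
    """Var(norm) / E[norm]^2, a simple measure of distance concentration."""
    mean = sum(norms) / len(norms)
    var = sum((v - mean) ** 2 for v in norms) / len(norms)
    return var / mean ** 2


def iid_norms(d, n, rng):
    """Norms of n points whose d coordinates are i.i.d. uniform on [0, 1]."""
    return [math.sqrt(sum(rng.random() ** 2 for _ in range(d)))
            for _ in range(n)]


def latent_norms(d, n, rng, noise=0.1):
    """Norms of n points from an assumed one-factor latent variable model.

    Every coordinate equals a shared latent value z plus small noise, so
    all d dimensions are 'relevant' no matter how large d becomes.
    """
    norms = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        norms.append(math.sqrt(sum((z + noise * rng.gauss(0.0, 1.0)) ** 2
                                   for _ in range(d))))
    return norms


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (2, 100, 1000):
        print(d,
              relative_variance(iid_norms(d, 300, rng)),
              relative_variance(latent_norms(d, 300, rng)))
```

Under these assumptions the first column of values should fall steadily as d grows, while the second should stay roughly constant, because the norm in the latent model is dominated by the single shared factor.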
Neonatal sepsis causes significant mortality and morbidity worldwide. Diagnosis is usually confirmed via blood culture results. Blood culture sepsis confirmation can take days and suffer from contamination and false negatives. Empiric therapy with antibiotics is common. This study aims to retrospectively describe and compare treatments of blood culture-confirmed and unconfirmed, but suspected, sepsis within the University of Utah Hospital system. Electronic health records were obtained from 1,248 neonates from January 1, 2006, to December 31, 2017. Sepsis was categorized into early-onset (≤3 days of birth, EOS) and late-onset (>3 and ≤28 days of birth, LOS) and categorized as culture-confirmed sepsis if a pathogen was cultured from the blood and unconfirmed if all blood cultures were negative with no potentially contaminated blood cultures. Of 1,010 neonates in the EOS cohort, 23 (2.3%) were culture-confirmed, most with Escherichia coli (42%). Treatment for unconfirmed EOS lasted an average of 6.1 days with primarily gentamicin and ampicillin, while confirmed patients were treated for an average of 12.3 days with increased administration of cefotaxime. Of 311 neonates in the LOS cohort, 62 (20%) were culture-confirmed, most culturing coagulase-negative staphylococci (46%). Treatment courses for unconfirmed LOS lasted an average of 7.8 days, while confirmed patients were treated for an average of 11.4 days; these patients were primarily treated with vancomycin and gentamicin. The use of cefotaxime for unconfirmed EOS and LOS increased throughout the study period. Cefotaxime administration was associated with an increase in neonatal mortality, even when potential confounding factors were added to the logistic regression model (adjusted odds ratio 2.8, 95% CI [1.21, 6.88], p = 0.02). These results may not be generalizable to all hospitals, and the use of cefotaxime may be a surrogate for other factors.
Given the low rate of blood culture positive diagnosis and the high exposure rate of empiric antibiotics, this patient population might benefit from improved diagnostics with reevaluation of antibiotic use guidelines.