Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants’ answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and k-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the “key” features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.
There is considerable uncertainty regarding treatment of Lyme disease patients who do not respond fully to initial short-term antibiotic therapy. Choosing the best treatment approach and duration remains challenging because treatment response among these patients varies: some patients improve with treatment while others do not. A previous study examined treatment response variation in a sample of over 3500 patients enrolled in the MyLymeData patient registry developed by LymeDisease.org (San Ramon, CA, USA). That study used a validated Global Rating of Change (GROC) scale to identify three treatment response subgroups among Lyme disease patients who remained ill: nonresponders, low responders, and high responders. The present study first characterizes the health status, symptom severity, and percentage of treatment response across these three patient subgroups together with a fourth subgroup, patients who identify as well. We then employed machine learning techniques across these subgroups to determine features most closely associated with improved patient outcomes, and we used traditional statistical techniques to examine how these features relate to treatment response of the four groups. High treatment response was most closely associated with (1) the use of antibiotics or a combination of antibiotics and alternative treatments, (2) longer duration of treatment, and (3) oversight by a clinician whose practice focused on the treatment of tick-borne diseases.
Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-than-meaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates user-designed seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.
Fully unsupervised topic models have found fantastic success in document clustering and classification. However, these models often suffer from the tendency to learn less-thanmeaningful or even redundant topics when the data is biased towards a set of features. For this reason, we propose an approach based upon the nonnegative matrix factorization (NMF) model, deemed Guided NMF, that incorporates userdesigned seed word supervision. Our experimental results demonstrate the promise of this model and illustrate that it is competitive with other methods of this ilk with only very little supervision information.
Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem by viewing it as a classification problem based on the fact that any given system will eventually synchronize or converge to a non-synchronizing limit cycle. By only using some basic statistics of the underlying graphs such as edge density and diameter, our method can achieve perfect accuracy when there is a significant difference in the topology of the underlying graphs between the synchronizing and the non-synchronizing examples. However, in the problem setting where these graph statistics cannot distinguish the two classes very well (e.g., when the graphs are generated from the same random graph model), we find that pairing a few iterations of the initial dynamics along with the graph statistics as the input to our classification algorithms can lead to significant improvement in accuracy; far exceeding what is known by the classical oscillator theory. More surprisingly, we find that in almost all such settings, dropping out the basic graph statistics and training our algorithms with only initial dynamics achieves nearly the same accuracy. We demonstrate our method on three models of continuous and discrete coupled oscillators—the Kuramoto model, Firefly Cellular Automata, and Greenberg-Hastings model. Finally, we also propose an “ensemble prediction” algorithm that successfully scales our method to large graphs by training on dynamics observed from multiple random subgraphs.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.