In biomedical research many different types of patient data can be collected, including various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the accuracy of medical classification models compared with single-view procedures. However, the collection of biomedical data can be expensive and taxing on patients, so that superfluous data collection should be avoided. It is therefore necessary to develop multi-view learning methods which can accurately identify the views most important for prediction. In recent years, several biomedical studies have used an approach known as multi-view stacking (MVS), where a model is trained on each view separately and the resulting predictions are combined through stacking. In these studies, MVS has been shown to increase classification accuracy. However, the MVS framework can also be used for selecting a subset of important views. To study the view selection potential of MVS, we develop a special case called stacked penalized logistic regression (StaPLR). Compared with existing view-selection methods, StaPLR can make use of faster optimization algorithms and is easily parallelized. We show that nonnegativity constraints on the parameters of the function which combines the views are important for preventing unimportant views from entering the model. We investigate the performance of StaPLR through simulations, and consider two real data examples. We compare the performance of StaPLR with an existing view selection method called the group lasso and observe that, in terms of view selection, StaPLR has a consistently lower false positive rate.
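The stacking idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`fit_logistic`, `fit_mvs`), the ridge penalty on the base learners, the use of 5-fold cross-validated predictions as meta-features, the penalty values, and the synthetic data are all assumptions made for illustration. What it takes from the abstract is the structure: one penalized logistic regression per view, with the view-level predictions combined by a meta-learner whose weights are constrained to be nonnegative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, X):
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def fit_logistic(X, y, l2=0.0, l1=0.0, nonneg=False, lr=0.5, n_iter=200):
    """Penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        errs = [pi - yi for pi, yi in zip(predict_proba((b, w), X), y)]
        b -= lr * sum(errs) / n
        for j in range(p):
            grad = sum(e * xi[j] for e, xi in zip(errs, X)) / n + l2 * w[j]
            wj = w[j] - lr * grad
            # Soft-thresholding implements the L1 (lasso) penalty.
            wj = math.copysign(max(abs(wj) - lr * l1, 0.0), wj)
            # Optional projection enforcing the nonnegativity constraint.
            w[j] = max(wj, 0.0) if nonneg else wj
    return b, w


def fit_mvs(views, y, k=5, seed=0):
    """Multi-view stacking: one base learner per view, nonnegative meta-learner."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    # Z[i][v] holds the cross-validated prediction of view v for observation i.
    Z = [[0.0] * len(views) for _ in range(n)]
    for v, X in enumerate(views):
        for test in folds:
            held = set(test)
            train = [i for i in idx if i not in held]
            model = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=0.01)
            for i, p in zip(test, predict_proba(model, [X[i] for i in test])):
                Z[i][v] = p
    # Base learners refitted on all data (used to predict new observations).
    base_models = [fit_logistic(X, y, l2=0.01) for X in views]
    # Meta-learner: L1-penalized, nonnegative weights, one weight per view.
    meta_model = fit_logistic(Z, y, l1=0.02, nonneg=True)
    return base_models, meta_model


# Synthetic example (assumption): only view 0 carries signal; views 1 and 2 are noise.
rng = random.Random(1)
n = 120
views = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)] for _ in range(3)]
y = [int(views[0][i][0] + views[0][i][1] + rng.gauss(0, 0.5) > 0) for i in range(n)]

base_models, (meta_intercept, meta_weights) = fit_mvs(views, y)
selected_views = [v for v, wv in enumerate(meta_weights) if wv > 0]
print("meta-learner weights per view:", meta_weights)
print("selected views:", selected_views)
```

In this sketch a view counts as selected when its meta-learner weight is strictly positive. Because the base learners for different views do not depend on one another, the loop over views could be run in parallel.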
Multi-view data refers to a setting where features are divided into feature sets, for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classification and for automatically selecting the views that are most important for prediction. We introduce an extension of this method to a setting where the data has a hierarchical multi-view structure. We also introduce a new view importance measure for StaPLR, which allows us to compare the importance of views at any level of the hierarchy. We apply our extended StaPLR algorithm to Alzheimer's disease classification where different MRI measures have been calculated from three scan types: structural MRI, diffusion-weighted MRI, and resting-state fMRI. StaPLR can identify which scan types and which derived MRI measures are most important for classification, and it outperforms elastic net regression in classification performance.
Bibliometric‐enhanced information retrieval uses bibliometrics (e.g., citations) to improve ranking algorithms. Using a data‐driven approach, this article describes the development of a bibliometric‐enhanced ranking algorithm for legal information retrieval, and the evaluation thereof. We statistically analyze the correlation between usage of documents and citations over time, using data from a commercial legal search engine. We then propose a bibliometric boost function that combines usage of documents with citation counts. The core of this function is an impact variable based on usage and citations that increases in influence as citations and usage counts become more reliable over time. We evaluate our ranking function by comparing search sessions before and after the introduction of the new ranking in the search engine. Using a cost model applied to 129,571 sessions before and 143,864 sessions after the intervention, we show that our bibliometric‐enhanced ranking algorithm reduces the time of a search session of legal professionals by 2 to 3% on average for use cases other than known‐item retrieval or updating behavior. Given the high hourly tariff of legal professionals and the limited time they can spend on research, this is expected to lead to increased efficiency, especially for users with extremely long search sessions.
Background: Increasingly, social media is being recognized as a potential resource for patient-generated health data, for example, for pharmacovigilance. Although the representativeness of the web-based patient population is often noted as a concern, studies in this field are limited. Objective: This study aimed to investigate the sample bias of patient-centered social media in Dutch patients with gastrointestinal stromal tumor (GIST). Methods: A population-based survey was conducted in the Netherlands among 328 patients with GIST diagnosed 2-13 years ago to investigate their digital communication use with fellow patients. A logistic regression analysis was used to analyze clinical and demographic differences between forum users and nonusers. Results: Overall, 17.9% (59/328) of survey respondents reported having contact with fellow patients via social media. Moreover, 78% (46/59) of forum users made use of GIST patient forums. We found no statistically significant differences for age, sex, socioeconomic status, and time since diagnosis between forum users (n=46) and nonusers (n=273). Patient forum users did differ significantly in (self-reported) treatment phase from nonusers (P=.001). Of the 46 forum users, only 2 (4%) were cured and not being monitored; 3 (7%) were on adjuvant, curative treatment; 19 (41%) were being monitored after adjuvant treatment; and 22 (48%) were on palliative treatment. In contrast, of the 273 patients who did not use disease-specific forums to communicate with fellow patients, 56 (20.5%) were cured and not being monitored, 31 (11.3%) were on curative treatment, 139 (50.9%) were being monitored after treatment, and 42 (15.3%) were on palliative treatment. The odds of being on a patient forum were 2.8 times as high for a patient who is being monitored compared with a patient who is considered cured. The odds of being on a patient forum were 1.9 times as high for patients who were on curative (adjuvant) treatment and 10 times as high for patients who were in the palliative phase compared with patients who were considered cured. Forum users also reported a lower level of social functioning (84.8 out of 100) than nonusers (93.8 out of 100; P=.008). Conclusions: Forum users showed no particular bias on the most important demographic variables of age, sex, socioeconomic status, and time since diagnosis. This may reflect the narrowing digital divide. Overrepresentation and underrepresentation of patients with GIST in different treatment phases on social media should be taken into account when sourcing patient forums for patient-generated health data. A further investigation of the sample bias in other web-based patient populations is warranted.