Background: Tumor markers are used to screen tens of millions of individuals worldwide at annual health check-ups, especially in East Asia. Machine learning (ML)-based algorithms that improve the diagnostic accuracy and clinical utility of these tests can have substantial impact leading to the early diagnosis of cancer. Methods: ML-based algorithms, including a cancer screening algorithm and a secondary organ of origin algorithm, were developed and validated using a large real world dataset (RWD) from asymptomatic individuals undergoing routine cancer screening at a Taiwanese medical center between May 2001 and April 2015. External validation was performed using data from the same period from a separate medical center. The data set included tumor marker values, age, and gender from 27,938 individuals, including 342 subsequently confirmed cancer cases. Results: Separate gender-specific cancer screening algorithms were developed. For men, a logistic regression-based algorithm outperformed single-marker and other ML-based algorithms, with a mean area under the receiver operating characteristic curve (AUROC) of 0.7654 in internal and 0.8736 in external cross validation. For women, a random forest-based algorithm attained a mean AUROC of 0.6665 in internal and 0.6938 in external cross validation. The median time to cancer diagnosis (TTD) in men was 451.5, 204.5, and 28 days for the mild, moderate, and high-risk groups, respectively; for women, the median TTD was 229, 132, and 125 days for the mild, moderate, and high-risk groups. A second algorithm was developed to predict the most likely affected organ systems for at-risk individuals. The algorithm yielded 0.8120 sensitivity and 0.6490 specificity for men, and 0.8170 sensitivity and 0.6750 specificity for women. Conclusions: ML-derived algorithms, trained and validated by using a RWD, can significantly improve tumor marker-based screening for multiple types of early stage cancers, suggest the tissue of origin, and provide guidance for patient follow-up.
Although timely access to information is becoming increasingly important and gaining such access is no longer a problem, the capacity for humans to assimilate such huge amounts of information is limited. Topic Detection(TD) is then a promising research area that addresses speedy access of desired information. However, ironically, the time complexity of existing TD algorithms themselves is usually O(n 3 ) or up to the x-th power of e. Linear performance requirement of real world topic detection has not been significantly addressed. This paper reveals a new patented topic detection algorithm called RMIR that combines relevance model with information retrieval technique to improve on time efficiency. Relevance Model(RM) is a theoretical extension of statistical language modeling that was developed for the task of document retrieval. To reduce the costs of fetching RM, we reduce the number of comparisons for stories by a query-based approach that makes similar stories exist in the top-k query results. We also build our query based on inverted indices, which have the complexity close to linear. The time cost of rest of operations in the RMIR topic detection process is a constant. Hence, the total complexity of RMIR topic detection algorithm should be close to linear as shown in experimental results. In addition, RMIR also gains better detection rates and robustness by relative entropy based topic model design.
Abstract-To keep pace with the time, learning from printed medium alone is no longer a comprehensive approach. Fresh digital contents can definitely be the complement of printed education medium. Although timely access to fresh contents is becoming increasingly important for education and gaining such access is no longer a problem, the capacity for human teachers to assimilate such huge amounts of contents is limited. Topic Detection (TD) is then a promising research area that addresses speedy access of desired contents based on topic or subject. On the other hand, personalized education is getting more attention because it facilitates the improvement of creativity and subject learning of the students. This paper reveals a patented Personalized Subject Learning (PSL) system that caters for the need of personalized education and efficiently provides subject based contents. An efficient topic detection algorithm for providing subject content is presented. Moreover, since education contents are multimedia based ones with multimodal, PSL introduces Canonical Correlation Analysis (CCA) method to detect multimodal correlations across different types of media. Due to its novelty, PSL has been used as the key engine in a real world application of personalized education system as the smart education module sponsored by a Smart City project.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
334 Leonard St
Brooklyn, NY 11211
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.