Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
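To make the sparsity-aware idea concrete, here is a minimal sketch, assuming the usual second-order boosting setup in which every example carries a gradient and a hessian. For one candidate threshold it tries sending all missing entries left, then right, and keeps the direction with the larger gain. The function name `best_default_direction`, the use of `None` for missing values, and the simplified gain (without the 1/2 factor and the complexity penalty) are illustrative assumptions, not XGBoost's actual implementation.

```python
def best_default_direction(values, grads, hessians, threshold, lam=1.0):
    """Pick the default direction for missing entries at one split.

    values: feature values, with None marking a missing entry (assumption).
    lam: L2 regularisation on leaf weights (hypothetical default).
    Returns (direction, gain) where direction is "left" or "right".
    """
    def leaf_score(g, h):
        # Structure score of a leaf with summed gradient g and hessian h.
        return g * g / (h + lam)

    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for x, g, h in zip(values, grads, hessians):
        if x is None:
            g_miss += g
            h_miss += h
        elif x < threshold:
            g_left += g
            h_left += h
        else:
            g_right += g
            h_right += h

    parent = leaf_score(g_left + g_right + g_miss, h_left + h_right + h_miss)
    # Gain if every missing entry follows the left branch.
    gain_left = (leaf_score(g_left + g_miss, h_left + h_miss)
                 + leaf_score(g_right, h_right) - parent)
    # Gain if every missing entry follows the right branch.
    gain_right = (leaf_score(g_left, h_left)
                  + leaf_score(g_right + g_miss, h_right + h_miss) - parent)

    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


direction, gain = best_default_direction(
    values=[1.0, 2.0, None, None],
    grads=[-1.0, 1.0, -1.0, -1.0],
    hessians=[1.0, 1.0, 1.0, 1.0],
    threshold=1.5,
)
```

Only the non-missing entries are bucketed by value, so the cost of evaluating a split in this sketch grows with the number of present entries plus one pass to total the missing statistics.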
Twitter has rapidly grown into a popular social network in recent years and provides a large number of real-time messages for users. Tweets are presented in chronological order, and users scan their followees' timelines to find what interests them. However, information overload troubles many users, especially those with many followees and thousands of tweets arriving every day. In this paper, we focus on recommending tweets that users are personally interested in, to reduce the effort needed to find useful information. Many kinds of information on Twitter are available to help recommendation, including the user's own tweet history, retweet history and social relations between users. We propose a method of making tweet recommendations based on collaborative ranking to capture personal interests. It can also conveniently integrate other useful contextual information. Our final method considers three major elements on Twitter: tweet topic level factors, user social relation factors and explicit features such as authority of the publisher and quality of the tweet. The experiments show that all the proposed elements are important and that our method greatly outperforms several baseline methods.
Collaborative filtering techniques rely on aggregated user preference data to make personalized predictions. In many cases, users are reluctant to explicitly express their preferences and many recommender systems have to infer them from implicit user behaviors, such as clicking a link in a webpage or playing a music track. The clicks and the plays are good for indicating the items a user liked (i.e., positive training examples), but the items a user did not like (negative training examples) are not directly observed. Previous approaches either randomly pick negative training samples from unseen items or incorporate some heuristics into the learning model, leading to a biased solution and a prolonged training period. In this paper, we propose to dynamically choose negative training samples from the ranked list produced by the current prediction model and iteratively update our model. The experiments conducted on three large-scale datasets show that our approach not only reduces the training time, but also leads to significant performance gains.
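The dynamic selection step described above can be sketched as follows, assuming a dot-product latent factor model. A few unseen items are drawn as candidates, ranked with the current model, and the top-ranked one is used as the negative example. The names `score` and `pick_negative`, the candidate-set size, and the dot-product scorer are illustrative assumptions rather than the authors' exact procedure.

```python
import random


def score(user_vec, item_vec):
    # Current model's predicted preference: a plain dot product (assumption).
    return sum(u * v for u, v in zip(user_vec, item_vec))


def pick_negative(user_vec, item_vecs, seen, n_candidates, rng):
    """Choose a negative item for one user from the model's own ranking.

    seen: set of item indices the user has interacted with.
    n_candidates: how many unseen items to draw before ranking (hypothetical knob).
    """
    unseen = [i for i in range(len(item_vecs)) if i not in seen]
    candidates = rng.sample(unseen, min(n_candidates, len(unseen)))
    # The highest-scored unseen candidate is the one the model currently
    # ranks wrongly high, so it is the most informative negative.
    return max(candidates, key=lambda i: score(user_vec, item_vecs[i]))


rng = random.Random(0)
user_vec = [1.0, 0.0]
item_vecs = [[0.9, 0.0], [0.1, 0.0], [0.5, 0.0], [0.7, 0.0]]
negative = pick_negative(user_vec, item_vecs, seen={0}, n_candidates=3, rng=rng)
```

Because the ranking is recomputed from the current parameters at each call, the chosen negatives change as the model is updated, in contrast to a fixed random draw from the unseen items.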
Significance: Identifying predictive biomarkers of therapeutic response for melanoma patients treated with immune checkpoint inhibitors is a major challenge. By combining microfluidic enrichment for melanoma circulating tumor cells (CTCs) together with RNA-based droplet digital PCR quantitation, we have established a highly sensitive and robust platform for noninvasive, blood-based monitoring of tumor burden. Serial monitoring of melanoma patients treated with immune checkpoint inhibitors shows rapid changes in CTC score, which precede standard clinical assessment and are highly predictive of long-term clinical outcome. Early on-treatment digital monitoring of CTC dynamics may thus help identify patients likely to benefit from immune checkpoint inhibition therapy.
We sought a regimen that incorporates optimal novel agents in transplant-ineligible patients and balances efficacy with toxicity. Our study evaluated modified lenalidomide-bortezomib-dexamethasone (RVD lite) in this population. RVD lite was administered over a 35-day cycle. Lenalidomide 15 mg was given orally on days 1–21; bortezomib 1.3 mg/m² weekly subcutaneously (SC) on days 1, 8, 15, and 22; and dexamethasone 20 mg orally on the day of and the day after bortezomib, for 9 cycles, followed by 6 cycles of consolidation with lenalidomide and bortezomib. The primary objective was to evaluate overall response rate (ORR). Secondary objectives included safety, progression-free survival (PFS), and overall survival (OS). Fifty-three eligible patients were screened between 4/17/13 and 5/20/15; 50 received at least one dose of therapy. Median age at study entry was 73 years (range 65–91). The ORR was 86%, and 66% of patients achieved a very good partial response (VGPR) or better. Median PFS was 35.1 months (95% CI, 30.9 - ∞) and median OS was not reached at a median follow-up of 30 months. Peripheral neuropathy was reported in 31 (62%) patients, with only 1 patient experiencing grade 3 symptoms. RVD lite is a well-tolerated and highly effective regimen in the transplant-ineligible population with robust PFS and OS.
Background: Merkel cell carcinoma (MCC) is a highly aggressive neuroendocrine carcinoma of the skin caused by either the integration of Merkel cell polyomavirus (MCPyV) and expression of viral T antigens or by ultraviolet-induced damage to the tumor genome from excessive sunlight exposure. An increasing number of deep sequencing studies of MCC have identified significant differences between the number and types of point mutations, copy number alterations, and structural variants between virus-positive and virus-negative tumors. However, it has been challenging to reliably distinguish between virus-positive and UV-damaged MCC. Methods: In this study, we assembled a cohort of 71 MCC patients and performed deep sequencing with OncoPanel, a clinically implemented, next-generation sequencing assay targeting over 400 cancer-associated genes. To improve the accuracy and sensitivity for virus detection compared to traditional PCR and IHC methods, we developed a hybrid capture baitset against the entire MCPyV genome and software to detect integration sites and structure. Results: Sequencing from this approach revealed distinct integration junctions in the tumor genome and generated assemblies that strongly support a model of microhomology-initiated hybrid, virus-host, circular DNA intermediate that promotes focal amplification of host and viral DNA. Using the clear delineation between virus-positive and virus-negative tumors from this method, we identified recurrent somatic alterations common across MCC and alterations specific to each class of tumor, associated with differences in overall survival. Finally, comparing the molecular and clinical data from these patients revealed a surprising association of immunosuppression with virus-negative MCC and significantly shortened overall survival. Conclusions: These results demonstrate the value of high-confidence virus detection for identifying molecular mechanisms of UV and viral oncogenesis in MCC. Furthermore, integrating these data with clinical data revealed features that could impact patient outcome and improve our understanding of MCC risk factors.
Blood-based biomarkers are critical in metastatic prostate cancer, where characteristic bone metastases are not readily sampled, and they may enable risk stratification in localized disease. We established a sensitive and high-throughput strategy for analyzing prostate circulating tumor cells (CTC) using microfluidic cell enrichment followed by digital quantitation of prostate-derived transcripts. In a prospective study of 27 patients with metastatic castration-resistant prostate cancer treated with first-line abiraterone, pretreatment elevation of the digital CTC score identifies a high-risk population with poor overall survival (HR = 6.0; P = 0.01) and short radiographic progression-free survival (HR = 3.2; P = 0.046). Expression of in CTCs identifies 6 of 6 patients with ≤12-month survival, with a subset also expressing the splice variant. In a second cohort of 34 men with localized prostate cancer, an elevated preoperative CTC score predicts microscopic dissemination to seminal vesicles and/or lymph nodes (P < 0.001). Thus, digital quantitation of CTC-specific transcripts enables noninvasive monitoring that may guide treatment selection in both metastatic and localized prostate cancer. There is an unmet need for biomarkers to guide prostate cancer therapies, for curative treatment of localized cancer and for application of molecularly targeted agents in metastatic disease. Digital quantitation of prostate CTC-derived transcripts in blood specimens is predictive of abiraterone response in metastatic cancer and of early dissemination in localized cancer.