The increasing availability of open statistical data resources is providing novel opportunities for research and citizen science. Efficient algorithmic tools are needed to realize the full potential of the new information resources. We introduce the eurostat R package that provides a collection of custom tools for the Eurostat open data service, including functions to query, download, manipulate, and visualize these data sets in a smooth, automated and reproducible manner. The online documentation provides detailed examples on the analysis of these spatio-temporal data collections. This work provides substantial improvements over the previously available tools, and has been extensively tested by an active user community. The eurostat R package contributes to the growing open source ecosystem dedicated to reproducible research in computational social science and digital humanities.
Using The Cancer Genome Atlas (TCGA) and KM plotter databases, we identified six heat shock proteins (HSPs) associated with survival of breast cancer patients. Survival curves of samples with high and low expression of heat shock genes were compared by the log-rank (Mantel-Haenszel) test. Interestingly, patients overexpressing two of the identified HSPs (HSPA2 and DNAJC20) exhibited longer survival, whereas overexpression of the other four (HSP90AA1, CCT1, CCT2 and CCT6A) was associated with an unfavorable prognosis. We explored correlations between HSP expression levels and clinicopathological features, including tumor grade, tumor size, number of lymph nodes involved and hormone receptor status. Additionally, we identified a novel signature with the potential to serve as a prognostic model for breast cancer. Using univariate followed by multivariate Cox regression analysis, we built a risk score formula comprising prognostic HSPs (HSPA2, DNAJC20, HSP90AA1, CCT1, CCT2) and tumor stage to identify high-risk and low-risk cases. Finally, we analyzed the association between expression of the six prognostic HSPs and survival of patients with cancer types other than breast cancer. We found that, depending on cancer type, each of the six analyzed HSPs can act as either a positive or a negative regulator of cancer development. Our study demonstrates a novel HSP signature for outcome prediction in breast cancer patients and provides new insight into the ambiguous role of these proteins in cancer development.
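A risk score of the kind described above is typically the linear predictor of a fitted Cox model, that is, a weighted sum of the covariates. The sketch below illustrates that idea only. The coefficient values, the example patients, and the median cutoff are hypothetical assumptions, not figures or choices reported by the study; only the signs of the coefficients follow the abstract (HSPA2 and DNAJC20 protective, the others adverse).

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios). These are NOT the
# study's fitted values; they are placeholders chosen for illustration.
COEFFS = {
    "HSPA2": -0.20,
    "DNAJC20": -0.15,
    "HSP90AA1": 0.30,
    "CCT1": 0.25,
    "CCT2": 0.10,
    "stage": 0.50,
}


def risk_score(patient):
    """Cox linear predictor: sum of coefficient * covariate value."""
    return sum(COEFFS[name] * patient[name] for name in COEFFS)


def split_by_median(patients):
    """Label each patient 'high' or 'low' risk using the median score.

    The median cutoff is an assumption; the abstract does not state one.
    """
    scores = [risk_score(p) for p in patients]
    cutoff = median(scores)
    labels = ["high" if s > cutoff else "low" for s in scores]
    return labels, cutoff


# Two made-up patients: expression values and tumor stage are illustrative.
patients = [
    {"HSPA2": 2.0, "DNAJC20": 1.0, "HSP90AA1": 0.5,
     "CCT1": 0.5, "CCT2": 0.5, "stage": 1},
    {"HSPA2": 0.5, "DNAJC20": 0.5, "HSP90AA1": 2.0,
     "CCT1": 2.0, "CCT2": 1.0, "stage": 3},
]
labels, cutoff = split_by_median(patients)
```

With real data, the coefficients would come from the multivariate Cox fit, and the two resulting groups would then be compared with a log-rank test.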
Disease modelling has had considerable policy impact during the ongoing COVID-19 pandemic, and it is increasingly acknowledged that combining multiple models can improve the reliability of outputs. Here we report insights from ten weeks of collaborative short-term forecasting of COVID-19 in Germany and Poland (12 October–19 December 2020). The study period covers the onset of the second wave in both countries, with tightening non-pharmaceutical interventions (NPIs) and subsequently a decay (Poland) or plateau and renewed increase (Germany) in reported cases. Thirteen independent teams provided probabilistic real-time forecasts of COVID-19 cases and deaths. These were reported for lead times of one to four weeks, with evaluation focused on one- and two-week horizons, which are less affected by changing NPIs. Heterogeneity between forecasts was considerable both in terms of point predictions and forecast spread. Ensemble forecasts showed good relative performance, in particular in terms of coverage, but did not clearly dominate single-model predictions. The study was preregistered and will be followed up in future phases of the pandemic.
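To make the notions of an ensemble forecast and interval coverage concrete, here is a minimal sketch. It assumes that each model reports predictive quantiles and that the ensemble takes the median across models at each quantile level; the abstract does not specify the combination rule, so this is one common choice rather than the study's method. All numbers and function names are hypothetical.

```python
from statistics import median


def median_ensemble(forecasts):
    """Combine quantile forecasts by a per-level median across models.

    forecasts: list of dicts mapping quantile level -> predicted value.
    All models are assumed to report the same quantile levels.
    """
    levels = sorted(forecasts[0])
    return {q: median(f[q] for f in forecasts) for q in levels}


def interval_coverage(intervals, observations):
    """Share of observations that fall inside their (lower, upper) interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, observations))
    return hits / len(observations)


# Three made-up models forecasting weekly cases (illustrative values only).
models = [
    {0.025: 80, 0.5: 100, 0.975: 130},
    {0.025: 90, 0.5: 120, 0.975: 160},
    {0.025: 60, 0.5: 110, 0.975: 150},
]
ensemble = median_ensemble(models)

# Empirical coverage of the ensemble's 95% interval over four made-up targets.
interval = (ensemble[0.025], ensemble[0.975])
coverage = interval_coverage([interval] * 4, [100, 155, 120, 90])
```

A well-calibrated 95% interval should cover roughly 95% of observations over many forecasts; comparing empirical coverage with the nominal level is one way to judge forecast spread.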
Complex models are commonly used in predictive modeling. In this paper we present R packages that can be used for explaining predictions from complex black box models and attributing parts of these predictions to input features. We introduce two new approaches and corresponding packages for such attribution, namely live and breakDown. We also compare their results with existing implementations of state-of-the-art solutions, namely lime that implements Locally Interpretable Model-agnostic Explanations and ShapleyR that implements Shapley values.
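As an illustration of what attributing a prediction to input features means, the following sketch computes exact Shapley values for a single instance by averaging marginal contributions over all feature orderings. It is a generic textbook computation and does not reproduce the API of ShapleyR, lime, live, or breakDown. Replacing absent features with a fixed baseline is an assumed value function, and the toy model is hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for instance x relative to a baseline.

    Features not yet added to the coalition keep their baseline value
    (an assumption; other value functions are possible). The cost grows
    factorially with the number of features, so this suits toy cases only.
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # add feature i to the coalition
            new = predict(current)
            phi[i] += new - prev       # marginal contribution of feature i
            prev = new
    return [p / len(orders) for p in phi]


# Hypothetical black-box model with an interaction between features 0 and 2.
def toy_model(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features involved.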