Background
Analyzing high throughput genomics data is a complex and compute intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation. A computational platform enabling best practice genomics analysis ideally meets a number of requirements, including: a wide range of analysis and visualisation tools, closely linked to large user and reference data sets; workflow platform(s) enabling accessible, reproducible, portable analyses, through a flexible set of interfaces; highly available, scalable computational resources; and flexibility and versatility in the use of these resources to meet demands and expertise of a variety of users. Access to an appropriate computational platform can be a significant barrier to researchers, as establishing such a platform requires a large upfront investment in hardware, experience, and expertise.
Results
We designed and implemented the Genomics Virtual Laboratory (GVL) as a middleware layer of machine images, cloud management tools, and online services that enable researchers to build arbitrarily sized compute clusters on demand, pre-populated with fully configured bioinformatics tools, reference datasets and workflow and visualisation options. The platform is flexible in that users can conduct analyses through web-based (Galaxy, RStudio, IPython Notebook) or command-line interfaces, and add/remove compute nodes and data resources as required. Best-practice tutorials and protocols provide a path from introductory training to practice. The GVL is available on the OpenStack-based Australian Research Cloud (http://nectar.org.au) and the Amazon Web Services cloud. The principles, implementation and build process are designed to be cloud-agnostic.
Conclusions
This paper provides a blueprint for the design and implementation of a cloud-based Genomics Virtual Laboratory. We discuss scope, design considerations and technical and logistical constraints, and explore the value added to the research community through the suite of services and resources provided by our implementation.
The increasing suicide rate in the United States has amplified the need to ensure that regions with high suicide risk receive adequate funding and related resources for prevention. The way in which organizations dedicated to preventing suicide distribute funding could be improved by developing predictive models of the suicide rate. In this study, a multiple linear regression model at the national level was developed to identify relevant factors associated with suicide. The national-level model was developed in two phases: the first used response variable data and explanatory variable data from the same time period, and the second shifted the response variable data by one time period to create a model better suited to prediction. The models had k-fold R-squared values of 0.676 and 0.675. The national model identified four variables to include in a predictive state-level model: Foreclosure Rates, Violent Crime Rates, Gini ratio, and Consumption Volume. In the second part of this study, the use of Twitter data in a state-level model was evaluated. Tweet terms relating to suicide were identified in fifteen states over a thirty-one-day period and used to calculate three variables: Tweet rate, Favorite rate, and Retweet rate. Each of these three variables for the terms "suicide" and "suicidal" underwent an Analysis of Variance (ANOVA) test to check for differences between states. Each ANOVA test resulted in a p-value less than 0.0001, providing strong evidence that Tweet rate, Favorite rate, and Retweet rate differed among the states for the two search phrases analyzed. Next, a Pearson Product-Moment correlation coefficient and Pearson Rho correlation coefficient were evaluated for each Twitter variable and the states' historical suicide rates. 
All computed correlation coefficients were between -0.15 and 0.3, suggesting that there is, at best, a weak correlation between the Twitter variables and a state's historical suicide rate. The results from the Twitter data analysis suggest that it is too early to accurately incorporate such data into a state-level multiple linear regression model. The results of this study should help in the further development of a state-level model that allows organizations dedicated to reducing suicides to allocate related resources more efficiently.
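The two statistics named above can be sketched in a few lines of plain Python. This is only an illustration of a one-way ANOVA F statistic and a Pearson product-moment correlation, not the authors' analysis code: the function names `one_way_anova_f` and `pearson_r` are hypothetical, and every number in the demo is invented for illustration and is not taken from the study. The sketch stops at the F statistic; turning it into a p-value would additionally require the F distribution, which is omitted here.

```python
from math import sqrt


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)                      # number of groups (e.g. states)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented daily tweet rates for three hypothetical states (not study data).
tweet_rates_by_state = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
print("F statistic:", one_way_anova_f(tweet_rates_by_state))

# Invented per-state mean tweet rate vs. invented historical suicide rate.
mean_tweet_rate = [2.0, 3.0, 6.0, 4.0]
historical_rate = [12.1, 15.3, 13.0, 14.2]
print("Pearson r:", pearson_r(mean_tweet_rate, historical_rate))
```

A large F statistic indicates that the between-state variation is large relative to the within-state variation, which is what the small p-values reported above reflect; a Pearson coefficient near zero corresponds to the weak association the study found.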