This investigation developed models to estimate aspects of physical activity and sedentary behavior from three-axis, high-frequency, wrist-worn accelerometer data. The models were developed and tested on 20 participants (10 males, 10 females; mean age = 24.1 years; mean body mass index = 23.9 kg/m²), who wore an ActiGraph GT3X+ accelerometer on the dominant wrist and an ActiGraph GT3X on the hip while performing a variety of scripted activities. Energy expenditure was measured concurrently with a portable indirect calorimetry system. These calibration data were then used to develop and assess both machine-learning models and simpler models with fewer unknown parameters (linear regression and decision trees) for estimating metabolic equivalent (MET) scores and for classifying activity intensity, sedentary time, and locomotion time. Applied to 15-s windows, the wrist models estimated METs [random forest: root mean squared error (rMSE) = 1.21 METs; hip model: rMSE = 1.67 METs] and classified activity intensity (random forest: 75% correct; hip model: 60% correct) better than a previously developed model that used counts per minute measured at the hip. In a separate set of comparisons, the simpler decision trees classified activity intensity (random forest: 75% correct; decision tree: 74% correct), sedentary time (random forest: 96% correct; decision tree: 97% correct), and locomotion time (random forest: 99% correct; decision tree: 96% correct) nearly as well as, or better than, the machine-learning approaches. A preliminary investigation of the models' performance in two free-living participants suggests that they may also work well outside controlled conditions.
The understanding of bacterial gene function has been greatly enhanced by recent advances in deep sequencing of microbial genomes. Transposon insertion sequencing methods combine next-generation sequencing with transposon mutagenesis to explore the essentiality of genes under different environmental conditions. We propose a model-based method that uses regularized negative binomial regression to estimate the change in transposon insertions attributable to gene-environment changes in this genetic interaction study, without data transformations or uniform normalization. An empirical Bayes model for estimating the local false discovery rate combines unique and total count information to test for genes that show a statistically significant change in transposon counts. When applied to RB-TnSeq (randomized barcode transposon sequencing) and Tn-seq (transposon sequencing) libraries made in strains of Caulobacter crescentus, using both total and unique count data, the model identified a set of conditionally beneficial or conditionally detrimental genes for each target condition, shedding light on their functions and roles under various stress conditions.
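To make the regression idea concrete, the following is a minimal Python sketch of a ridge-penalized negative binomial regression for a single gene, fitted by Fisher scoring. It is a simplified stand-in, not the authors' method: it assumes one gene, a 0/1 condition indicator, a log link, a fixed and known negative binomial size parameter `size` (variance = mu + mu²/size), and an L2 penalty `lam` on the condition effect only; all of these choices and names are hypothetical. It uses a single count type and does not combine unique and total counts, and the empirical Bayes local false discovery rate step is not sketched.

```python
import math
from typing import Sequence, Tuple


def fit_ridge_nb(counts: Sequence[int],
                 condition: Sequence[int],
                 size: float = 10.0,
                 lam: float = 1.0,
                 max_iter: int = 100,
                 tol: float = 1e-10) -> Tuple[float, float]:
    """Fit log(mu_i) = b0 + b1 * condition_i by Fisher scoring.

    Hypothetical simplification: `size` is a fixed, known NB size parameter and
    `lam` is a ridge penalty (lam / 2) * b1**2 on the condition effect only.
    Both conditions must be present in `condition`.
    Returns (b0, b1); b1 is the estimated log fold change in insertion counts.
    """
    mean = sum(counts) / len(counts)
    b0, b1 = math.log(max(mean, 0.5)), 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for y, x in zip(counts, condition):
            mu = math.exp(b0 + b1 * x)
            score = size * (y - mu) / (mu + size)  # d loglik / d linear predictor
            w = mu * size / (mu + size)            # expected information for it
            u0 += score
            u1 += score * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        u1 -= lam * b1  # gradient of the ridge penalty
        i11 += lam      # curvature of the ridge penalty
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1
```

With `lam=0` and counts `[10, 12, 8]` in the control condition and `[40, 44, 36]` in the target condition, the fit recovers the group means, so `b1` is close to log(40/10) = log 4; a positive `lam` shrinks `b1` toward zero, which is the stabilizing effect regularization is meant to provide for genes with few insertions.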
Background: It has recently become possible to collect next-generation DNA sequencing data sets composed of multiple samples from multiple biological units, where each sample may come from a single cell or from bulk tissue. Yet there is no tool for simulating DNA sequencing data from such a nested arrangement of single-cell and bulk samples, which would let developers of analysis methods assess the accuracy and precision of their methods.
Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools.
Conclusions: The DNA sequencing data generated by our simulator is representative of real data and integrates seamlessly with standard downstream analysis tools.
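The nested sampling arrangement described in this abstract can be illustrated with a small Python sketch. Everything in it is hypothetical: the configuration layout, the parameter names, and the generative model (heterozygous variants drawn once per biological unit and shared by all of that unit's samples, binomial allele sampling at a fixed depth, random allelic dropout for single-cell samples, and symmetric sequencing error) are assumptions chosen for illustration, not the tool's actual configuration format or simulation model. The sketch also stops at per-site allele counts rather than producing reads.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical nested configuration: biological unit -> (sample name, type).
# Illustrative only; not the simulator's real configuration-file format.
CONFIG: Dict[str, List[Tuple[str, str]]] = {
    "patient_1": [("tumor_bulk", "bulk"), ("cell_01", "single-cell"), ("cell_02", "single-cell")],
    "patient_2": [("normal_bulk", "bulk"), ("cell_03", "single-cell")],
}


def simulate(config: Dict[str, List[Tuple[str, str]]],
             n_sites: int = 100,
             variant_rate: float = 0.05,
             depth: int = 30,
             dropout: float = 0.2,
             error_rate: float = 0.01,
             seed: int = 0) -> Dict[str, dict]:
    """Return, per unit, its shared variant sites and, per sample,
    a list of (depth, alt_read_count) tuples, one per site."""
    rng = random.Random(seed)

    def binomial(n: int, p: float) -> int:
        # Sum of Bernoulli draws (random.binomialvariate needs Python 3.12+).
        return sum(rng.random() < p for _ in range(n))

    out: Dict[str, dict] = {}
    for unit, samples in config.items():
        # Variants drawn once per unit are shared by all of its samples,
        # which is what correlates samples within a unit.
        variants = {s for s in range(n_sites) if rng.random() < variant_rate}
        per_sample = {}
        for name, kind in samples:
            if kind not in ("bulk", "single-cell"):
                raise ValueError(f"unknown sample type: {kind}")
            counts = []
            for site in range(n_sites):
                alt_frac = 0.5 if site in variants else 0.0
                # Single-cell samples may lose one allele (allelic dropout).
                if site in variants and kind == "single-cell" and rng.random() < dropout:
                    alt_frac = rng.choice([0.0, 1.0])
                # Symmetric sequencing error flips the observed allele.
                p_alt = alt_frac * (1 - error_rate) + (1 - alt_frac) * error_rate
                counts.append((depth, binomial(depth, p_alt)))
            per_sample[name] = counts
        out[unit] = {"variants": variants, "samples": per_sample}
    return out
```

Seeding a single `random.Random` instance makes each simulated data set reproducible, which is useful when the output feeds a test pipeline for a variant caller.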