Individuals (and their family members) share (partial) genomic data on public platforms. However, using special characteristics of genomic data, background knowledge that can be obtained from the Web, and family relationships between the individuals, it is possible to infer the hidden parts of shared (and unshared) genomes. Existing work in this field considers simple correlations in the genome (as well as Mendel's law and partial genomes of a victim and his family members). In this paper, we improve the existing work on inference attacks on genomic privacy. We mainly consider complex correlations in the genome by using an observable Markov model and recombination model between the haplotypes. We also utilize the phenotype information about the victims. We propose an efficient message passing algorithm to consider all aforementioned background information for the inference. We show that the proposed framework improves inference with significantly less information compared to existing work.
Motivation: Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target-specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of the reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase-specific predictions, yet for a large body of kinases, only a few or no target sites are reported. Results: We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase-specific positional amino acid preferences are learned using a bidirectional recurrent neural network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model and other methods available. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas. Availability and implementation: The source code is available at https://github.com/Tastanlab/DeepKinZero. Supplementary information: Supplementary data are available at Bioinformatics online.
The COVID-19 pandemic has significantly impacted academic life in the United States and beyond. To gain a better understanding of its impact on the academic community, we conducted a large-scale survey at the University of Massachusetts Amherst. We collected multifaceted data from students, staff, and faculty on several aspects of their lives, such as mental and physical health, productivity, and finances. All our respondents expressed mental and physical issues and concerns, such as increased stress and depression levels. Financial difficulties seem to have the most considerable toll on staff and undergraduate students, while productivity challenges were mostly expressed by faculty and graduate students. As universities face many important decisions with respect to mitigating the effects of this pandemic, we present our findings with the intent of shedding light on the challenges faced by various academic groups in the face of the pandemic, calling attention to the differences between groups. We also contribute a discussion highlighting how the results translate to policies for the effective and timely support of the categories of respondents who need them most. Finally, the survey itself, which includes conditional logic allowing for personalized questions, serves as a template for further data collection, facilitating a comparison of the impact on campuses across the United States.
Protein phosphorylation is a key regulator of protein function in signal transduction pathways. Kinases are the enzymes that catalyze the phosphorylation of other proteins in a target specific manner. The dysregulation of phosphorylation is associated with many diseases including cancer. Although the advances in phosphoproteomics enable the identification of phosphosites at the proteome level, most of the phosphoproteome is still in the dark: more than 95% of reported human phosphosites have no known kinases. Determining which kinase is responsible for phosphorylating a site remains an experimental challenge. Existing computational methods require several examples of known targets of a kinase to make accurate kinase specific predictions, yet for a large body of kinases, only a few or no target sites are reported. We present DeepKinZero, the first zero-shot learning approach to predict the kinase acting on a phosphosite for kinases with no known phosphosite information. DeepKinZero transfers knowledge from kinases with many known target phosphosites to those kinases with no known sites through a zero-shot learning model. The kinase specific positional amino acid preferences are learned using a bidirectional recurrent network. We show that DeepKinZero achieves significant improvement in accuracy for kinases with no known phosphosites in comparison to the baseline model. By expanding our knowledge on understudied kinases, DeepKinZero can help to chart the phosphoproteome atlas.
In intensive care units (ICUs), patient health is monitored through (1) continuous vital signals from various medical devices, and (2) clinical notes consisting of opinions and summaries from doctors which are recorded in electronic health records (EHR). It is difficult to jointly model these two sources of information because clinical notes, unlike vital signals, are collected at irregular intervals and their contents are relatively unstructured. In this paper, we present a model that combines both sources of information about ICU patients to make accurate in-hospital mortality predictions. We apply a fine-tuned BERT model to each of the patient's clinical notes. The resulting embeddings are then combined to obtain the overall embedding for the entire text part of the data. This is then combined with the output of an LSTM model that encodes patients' vital signals. Our model improves upon the state of the art for mortality prediction, attaining an AUC score of 0.9, compared to the previous 0.87, setting a new standard for mortality prediction on the MIMIC III benchmark.
Home‐range estimates are a common product of animal tracking data, as each range represents the area needed by a given individual. Population‐level inference of home‐range areas—where multiple individual home ranges are considered to be sampled from a population—is also important to evaluate changes over time, space or covariates such as habitat quality or fragmentation, and for comparative analyses of species averages. Population‐level home‐range parameters have traditionally been estimated by first assuming that the input tracking data were sampled independently when calculating home ranges via conventional kernel density estimation (KDE) or minimal convex polygon (MCP) methods, and then assuming that those individual home ranges were measured exactly when calculating the population‐level estimates. This conventional approach does not account for the temporal autocorrelation that is inherent in modern tracking data, nor for the uncertainties of each individual home‐range estimate, which are often large and heterogeneous. Here, we introduce a statistically and computationally efficient framework for the population‐level analysis of home‐range areas, based on autocorrelated kernel density estimation (AKDE), that can account for variable temporal autocorrelation and estimation uncertainty. We apply our method to empirical examples on lowland tapir Tapirus terrestris, kinkajou Potos flavus, white‐nosed coati Nasua narica, white‐faced capuchin monkey Cebus capucinus and spider monkey Ateles geoffroyi, and quantify differences between species, environments and sexes. Our approach allows researchers to more accurately compare different populations with different movement behaviours or sampling schedules while retaining statistical precision and power when individual home‐range uncertainties vary. Finally, we emphasize the estimation of effect sizes when comparing populations, rather than mere significance tests.
Home-range estimates are a common product of animal tracking data, as each range informs on the area needed by a given individual. Population-level inference on home-range areas—where multiple individual home-ranges are considered to be sampled from a population—is also important to evaluate changes over time, space, or covariates, such as habitat quality or fragmentation, and for comparative analyses of species averages. Population-level home-range parameters have traditionally been estimated by first assuming that the input tracking data were sampled independently when calculating home ranges via conventional kernel density estimation (KDE) or minimal convex polygon (MCP) methods, and then assuming that those individual home ranges were measured exactly when calculating the population-level estimates. This conventional approach does not account for the temporal autocorrelation that is inherent in modern tracking data, nor for the uncertainties of each individual home-range estimate, which are often large and heterogeneous. Here, we introduce a statistically and computationally efficient framework for the population-level analysis of home-range areas, based on autocorrelated kernel density estimation (AKDE), that can account for variable temporal autocorrelation and estimation uncertainty. We apply our method to empirical examples on lowland tapir (Tapirus terrestris), kinkajou (Potos flavus), white-nosed coati (Nasua narica), white-faced capuchin monkey (Cebus capucinus), and spider monkey (Ateles geoffroyi), and quantify differences between species, environments, and sexes. Our approach allows researchers to more accurately compare different populations with different movement behaviors or sampling schedules, while retaining statistical precision and power when individual home-range uncertainties vary. Finally, we emphasize the estimation of effect sizes when comparing populations, rather than mere significance tests.
Missing values, irregularly collected samples, and multi-resolution signals commonly occur in multivariate time series data, making predictive tasks difficult. These challenges are especially prevalent in the healthcare domain, where patients' vital signs and electronic records are collected at different frequencies and occasionally have missing information due to imperfections in equipment or patient circumstances. Researchers have handled each of these issues differently, often handling missing data through mean value imputation and then using sequence models over the multivariate signals while ignoring the different resolutions of the signals. We propose a unified model named Multi-resolution Flexible Irregular Time series Network (Multi-FIT). The building block for Multi-FIT is the FIT network. The FIT network creates an informative dense representation at each time step using signal information such as the last observed value, the time difference since the last observed time stamp, and the overall mean for the signal. Vertical FIT (FIT-V) is a variant of FIT which also models the relationship between different temporal signals while creating the informative dense representations for the signal. The Multi-FIT model uses multiple FIT networks for sets of signals with different resolutions, further facilitating the construction of flexible representations. Our model has three main contributions: (a) it does not impute values but rather creates informative representations to provide flexibility to the model for creating task-specific representations; (b) it models the relationship between different signals in the form of support signals; (c) it models different resolutions in parallel before merging them for the final prediction task. The FIT, FIT-V and Multi-FIT networks improve upon the state-of-the-art models for three predictive tasks, including the forecasting of patient survival. * Equal contribution, randomly ordered.
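The three per-step inputs named in the abstract above (last observed value, time since the last observation, and overall signal mean) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name fit_style_inputs is hypothetical, missing samples are assumed to be encoded as NaN, and the handling of steps before the first observation (fall back to the mean, time gap of 0) is an assumption made only for this example.

```python
import numpy as np

def fit_style_inputs(values, times):
    """Build per-step features for one irregularly sampled signal.

    values: 1-D sequence with NaN marking missing samples (assumed encoding).
    times:  1-D sequence of timestamps, same length as values.
    Returns an array of shape (T, 3) with columns
    [last observed value, time since last observation, overall mean].
    """
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    observed = ~np.isnan(values)

    # Mean over observed samples only; 0.0 if nothing was observed (assumption).
    mean = float(np.nanmean(values)) if observed.any() else 0.0

    last_value = np.empty_like(values)
    delta = np.empty_like(times)
    prev_value, prev_time = mean, None  # before any observation: use the mean
    for i, (v, t) in enumerate(zip(values, times)):
        if observed[i]:
            prev_value, prev_time = v, t
        last_value[i] = prev_value
        # Gap is 0 at an observed step and before the first observation (assumption).
        delta[i] = 0.0 if prev_time is None else t - prev_time

    return np.stack([last_value, delta, np.full_like(values, mean)], axis=1)

features = fit_style_inputs([np.nan, 2.0, np.nan, 4.0], [0, 1, 3, 4])
print(features)
```

In a model along the lines described, such feature rows would be fed to a learned layer that produces the dense representation; no value is imputed into the raw signal itself.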