Background: Transcriptional regulation in multicellular organisms is a complex process involving multiple modular regulatory elements for each gene. Building whole-genome models of transcriptional networks requires mapping all relevant enhancers and then linking them to target genes. Previous methods of enhancer identification, based either on sequence information or on epigenetic marks, have different limitations stemming from the incompleteness of each of these datasets taken separately. Results: In this work we present a new approach for the discovery of regulatory elements based on the combination of sequence motifs and epigenetic marks measured with ChIP-Seq. Our method uses supervised learning to train a model describing the dependence of enhancer activity on sequence features and histone marks. Our results indicate that using a combination of features outperforms previous approaches based on either dataset alone. While histone modifications remain the dominant feature for accurate predictions, the models based on sequence motifs have the advantage of being generally applicable to different tissues. Additionally, we assess the relevance of different sequence motifs to prediction accuracy, showing that even tissue-specific enhancer activity depends on multiple motifs. Conclusions: Based on our results, we conclude that it is worthwhile to include sequence motif data in computational approaches to active enhancer prediction, and that classifiers trained on a specific set of enhancers can generalize with significant accuracy beyond the training set.
All-relevant feature selection is a relatively new sub-field in the domain of feature selection. The chapter is devoted to a short review of the field and a presentation of a representative algorithm. The problem of all-relevant feature selection is first defined, then key algorithms are described. Finally, the Boruta algorithm, under development at ICM, University of Warsaw, is explained in greater detail and applied to a collection of both synthetic and real-world data sets. It is shown that the algorithm is both sensitive and selective. The level of falsely discovered relevant variables is low: on average, fewer than one falsely relevant variable is discovered for each set. The sensitivity of the algorithm is nearly 100% for data sets for which classification is easy, but may be lower for data sets for which classification is difficult. Nevertheless, it is possible to increase the sensitivity of the algorithm at the cost of increased computational effort without adversely affecting the false discovery level. This is achieved by increasing the number of trees in the random forest algorithm that delivers the importance estimate in Boruta.
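The core idea behind Boruta, comparing each real feature against randomized "shadow" copies, can be illustrated with a minimal sketch. This is not the Boruta implementation: as a loudly stated assumption, the random forest importance is swapped for absolute Pearson correlation, the statistical test is replaced by a simple hit-rate cut-off, and the names `abs_corr`, `shadow_selection`, `make_toy_data`, `n_iter` and `min_hit_rate` are hypothetical.

```python
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation: a stand-in for random forest importance."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_selection(columns, target, n_iter=30, min_hit_rate=0.9, seed=0):
    """Return indices of columns that beat the best shadow in most iterations."""
    rng = random.Random(seed)
    importances = [abs_corr(col, target) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: each column shuffled, which breaks any link to the target.
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, target))
        # A real feature scores a "hit" when it beats the strongest shadow.
        for i, imp in enumerate(importances):
            if imp > shadow_max:
                hits[i] += 1
    return [i for i, h in enumerate(hits) if h / n_iter >= min_hit_rate]


def make_toy_data(n=200, seed=1):
    """One informative column (index 0) and two pure-noise columns."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n)]
    noise_a = [rng.gauss(0, 1) for _ in range(n)]
    noise_b = [rng.gauss(0, 1) for _ in range(n)]
    target = [s + 0.3 * rng.gauss(0, 1) for s in signal]
    return [signal, noise_a, noise_b], target


columns, target = make_toy_data()
selected = shadow_selection(columns, target)
print("selected feature indices:", selected)
```

In this toy setting the informative column is expected to be kept, while noise columns only rarely beat the best shadow often enough to pass the cut-off. In Boruta itself the importance comes from a random forest, which is why the abstract ties sensitivity to the number of trees.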
Herein, we show differences in the blood serum of asymptomatic and symptomatic pregnant women infected with COVID-19 and correlate them with laboratory indexes, ATR FTIR and multivariate machine learning methods. We collected the sera of pregnant women diagnosed with COVID-19 in the second trimester (n = 12), in the third trimester (n = 7), and in the second trimester with severe symptoms (n = 7), and compared them with those of healthy pregnant women (n = 11), for a total of 37 participants. To assign the accuracy of FTIR spectra regions where peak shifts occurred, the Random Forest algorithm, the traditional C5.0 single decision tree algorithm and a deep neural network approach were used. We verified the correspondence between the FTIR results and laboratory indexes such as the count of peripheral blood cells, biochemical parameters, and coagulation indicators of pregnant women. CH₂ scissoring, amide II, and amide I vibrations could be used to differentiate the groups. The accuracy calculated by machine learning methods was higher than 90%. We also developed a method based on the dynamics of the absorbance spectra that makes it possible to determine the differences between the spectra of healthy and COVID-19 patients. Laboratory indexes of biochemical parameters associated with COVID-19 validate changes in the total amount of proteins, albumin and lipase.
Stock price prediction is a popular yet challenging task, and deep learning provides the means to mine the different patterns that trigger its dynamic movement. In this paper, the task is to predict the close price for 25 companies listed on the Bucharest Stock Exchange, using a novel data set introduced herein. To this end, two traditional deep learning architectures are designed and compared: a long short-term memory network and a temporal convolutional neural model. Based on their predictions, a trading strategy is proposed whose decision to buy or sell depends on two different thresholds. A hill climbing approach selects the optimal values for these parameters. The predictions of the two deep learning representatives, used in the subsequent trading strategy, lead to distinct facets of gain.
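A two-threshold trading rule tuned by hill climbing might look like the sketch below. The abstract does not state the exact rule, so everything here is an assumption made for illustration: the strategy buys when the predicted relative change exceeds `buy_thr` and sells when it falls below `-sell_thr`, the search is a simple stochastic hill climb, and the prices and "predictions" are toy numbers, not data from the paper.

```python
import random


def simulate(prices, predictions, buy_thr, sell_thr, cash=1000.0):
    """Trade one asset; predictions[t] is the forecast for prices[t + 1].

    Assumed rule: buy with all cash when the predicted relative change
    exceeds buy_thr, sell everything when it falls below -sell_thr.
    """
    shares = 0.0
    for t in range(len(prices) - 1):
        change = (predictions[t] - prices[t]) / prices[t]
        if change > buy_thr and cash > 0:
            shares, cash = cash / prices[t], 0.0
        elif change < -sell_thr and shares > 0:
            cash, shares = shares * prices[t], 0.0
    return cash + shares * prices[-1]  # final portfolio value


def hill_climb(prices, predictions, start=(0.05, 0.05), step=0.01,
               n_steps=200, seed=0):
    """Stochastic hill climbing: keep a random neighbour only if it earns more."""
    rng = random.Random(seed)
    best = start
    best_value = simulate(prices, predictions, *best)
    for _ in range(n_steps):
        cand = tuple(max(0.0, b + rng.uniform(-step, step)) for b in best)
        value = simulate(prices, predictions, *cand)
        if value > best_value:
            best, best_value = cand, value
    return best, best_value


# Toy data; the "predictions" here are simply the next day's true prices.
prices = [100.0, 102.0, 101.0, 105.0, 103.0, 108.0, 107.0, 110.0]
predictions = prices[1:]

start_value = simulate(prices, predictions, 0.05, 0.05)
(tuned_buy, tuned_sell), tuned_value = hill_climb(prices, predictions)
print(f"start: {start_value:.2f}, tuned: {tuned_value:.2f}")
```

With the starting thresholds no trade is triggered on this toy series, so the portfolio stays at its initial cash. The hill climb can only keep or improve that value, because a candidate pair is accepted only when it yields a strictly higher final portfolio value.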