Promoter prediction is an important and complex problem. Pattern recognition algorithms typically require features that can capture this complexity. Promoter sequences may show a bias towards certain combinations of base pairs. To detect such biases, n-grams are usually extracted and analyzed. An n-gram is a run of n contiguous characters from a given character stream, here a DNA sequence segment. This paper presents a systematic study of the efficacy of n-grams for n = 2, 3, 4, 5 in promoter prediction, using n-grams as features for a neural network classifier on E. coli and Drosophila promoters. For E. coli, n = 3 seems to give optimal prediction values, and for Drosophila, n = 4. Using the 3-gram features, promoter prediction is carried out on the genome sequence of E. coli. The results are encouraging in the positive identification of promoters in the genome compared to software packages such as BPROM, NNPP, and SAK. Whole-genome promoter prediction was also performed for Drosophila, using 4-gram features. IOS Press and Bioinformation Systems e.V. and the authors. All rights reserved.
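The n-gram definition above can be illustrated with a minimal sketch. It assumes overlapping sliding windows, a four-letter alphabet (A, C, G, T), and normalization to relative frequencies; none of these details are stated in the abstract, and the function name `ngram_features` is hypothetical. It is not the authors' implementation.

```python
from itertools import product


def ngram_features(sequence, n):
    """Return relative frequencies of all 4**n DNA n-grams.

    Assumptions (not from the source): windows overlap, the alphabet is
    ACGT, and counts are normalized so the vector sums to 1.
    Features are ordered lexicographically (AA, AC, AG, AT, CA, ...).
    """
    alphabet = "ACGT"
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in counts:  # skip windows containing ambiguous bases such as N
            counts[gram] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [counts[k] / total for k in keys]
```

Under these assumptions a sequence yields 16 features for n = 2, 64 for n = 3, 256 for n = 4, and 1024 for n = 5, and the resulting vector could serve as the input to a classifier.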
T.S. Rani and R.S. Bapi / Analysis of n-Gram based Promoter Recognition Methods

... as criteria for identifying the promoter [2]. This algorithm uses the set of IUPAC words extracted from the training data set. The authors applied their method to eukaryotic RNA polymerase II promoters and obtained a true positive rate of 43% and a true positive to false positive ratio of 2.3, which is very high compared to the best such ratio reported by other tools. Ramana et al. also tried to identify the promoters as well as the first exons in human sequences using an algorithm called FirstEF, which is based on structural and compositional features [3]. They were able to predict 86% of the first exons. Compared with PromoterInspector, their method obtained a sensitivity of 70% against PromoterInspector's 48%. Bajic et al. counted a prediction as positive if the predicted transcription start site (TSS) falls within a maximum allowed distance from the reference TSS [5]. They assessed several prediction algorithms using performance measures such as sensitivity and positive predictive value. In a later paper they concluded that promoter prediction combined with gene prediction yields a better recognition rate [6]. That paper reports predictions performed in the ENCODE Genome Annotation Assessment Project (EGASP) experiments, in which sensitivity varies from 32% to 58% and positive predictive value ranges from 79% to 93% for the ENCODE regions of the human genome. These results show that promoter prediction is not a trivial task and that prediction rates are not very high. They also found that a reduced promoter search space results in fewer false positive predictions and improves the results. Hence, it can be said that promoter ...
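The distance-based evaluation criterion mentioned above can be sketched as follows. This is an illustrative simplification: the function name `evaluate_tss`, the matching rule (any reference TSS within `max_dist`), and the way hits are counted are assumptions, and the exact counting protocol used in [5] may differ.

```python
def evaluate_tss(predicted, reference, max_dist):
    """Compute (sensitivity, positive predictive value) for TSS predictions.

    Assumed counting rule (not taken from the source):
    - a prediction is a true positive if it lies within max_dist of
      at least one reference TSS;
    - a reference TSS is recovered if at least one prediction lies
      within max_dist of it.
    """
    true_predictions = [
        p for p in predicted
        if any(abs(p - r) <= max_dist for r in reference)
    ]
    recovered = [
        r for r in reference
        if any(abs(p - r) <= max_dist for p in predicted)
    ]
    # Sensitivity: fraction of reference sites that were recovered.
    sensitivity = len(recovered) / len(reference) if reference else 0.0
    # PPV: fraction of predictions that match a reference site.
    ppv = len(true_predictions) / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

For example, with hypothetical positions `predicted = [100, 520, 900]`, `reference = [110, 500]`, and `max_dist = 50`, both reference sites are recovered while one of the three predictions is a false positive.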
A disease caused by a virus can spread through human-to-human contact or from the environment. A mathematical model can capture the dynamics of the spread and estimate the infections, recoveries, and fatalities that may result. Such estimates are crucial for policy decisions and for alerts about medical emergencies that may arise. Many epidemiological models are used for this purpose. One major factor in model-based forecasts is the dynamic nature of the disease spread: unless the parameters that govern this dynamic spread can be estimated, the models may not give accurate forecasts. The main principle is to keep the model generic while making minimal assumptions. In this work, we derive a data-driven model from SEIRD and attempt to forecast the Infected, Recovered, and Deceased rates of COVID-19 up to a week ahead. A method for estimating the parameters of the model is also discussed thoroughly. The model is tested for India at the district level, along with the most affected foreign regions and cities, such as Lombardia in Italy and Moscow in Russia.
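For orientation, a generic textbook SEIRD compartment model can be sketched as below. This is not the data-driven model derived in the work summarized above, whose equations and parameter estimation method are not given here. The parameter names (`beta`, `sigma`, `gamma`, `mu`), the forward Euler step, and the use of the living population as the contact denominator are all assumptions.

```python
def seird_step(state, beta, sigma, gamma, mu, dt=1.0):
    """Advance a generic SEIRD model by one forward Euler step.

    state = (S, E, I, R, D). Assumed parameters (hypothetical names):
    beta  - transmission rate, sigma - rate of leaving the exposed class,
    gamma - recovery rate,     mu    - fatality rate.
    """
    s, e, i, r, d = state
    living = s + e + i + r  # contacts happen among the living population
    new_exposed = beta * s * i / living * dt if living > 0 else 0.0
    new_infected = sigma * e * dt
    new_recovered = gamma * i * dt
    new_deceased = mu * i * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infected,
        i + new_infected - new_recovered - new_deceased,
        r + new_recovered,
        d + new_deceased,
    )


def simulate(state, days, beta, sigma, gamma, mu):
    """Return the list of daily states, starting with the initial state."""
    trajectory = [state]
    for _ in range(days):
        state = seird_step(state, beta, sigma, gamma, mu)
        trajectory.append(state)
    return trajectory
```

A one-week forecast would then correspond to `simulate(initial_state, 7, ...)`, with parameter values that must be estimated from data. Every individual who leaves one compartment enters another, so the sum of the five compartments stays constant at each step.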