Rules extraction from neural networks applied to the prediction and recognition of prokaryotic promoters

Silva, Scheila de Ávila e; Gerhardt, Günther J.L.; Echeverrigaray, Sérgio

doi:10.1590/s1415-47572011000200031

Cited by 13 publications

(6 citation statements)

References 18 publications

“…Among these methods, several machine learning algorithms have been used in developing prokaryotic promoter region prediction methods. For example, support vector machine (SVM) [7, 8, 12, 25, 26], artificial neural networks (ANNs) [16, 17, 20, 27–29], partial least square (PLS) [18], and quadratic discriminant analysis (QDS) [14]. Some methods are based on probabilistic approaches (e.g., hidden Markov models (HMMs) [30] and a combination of HMMs and ANNs [31]).…”

Section: Introduction (mentioning)

confidence: 99%

“…Therefore, developers have to generate their non-promoter sequences. Several strategies for generating non-promoter sequences have been used including: randomly generated sequences [16, 17, 28]; sequences extracted from intergenic or coding regions [7, 11, 12, 14, 15, 18, 25, 28]; ii) Feature extraction: Several sequence and structure-based feature representations have been used for developing prokaryotic promoter region prediction methods. Examples of sequence based features include: k-mer representation [7, 12, 28, 32], variable-window Z-curve [18], and nucleotide identity (NID) [17].…”

Section: Introduction (mentioning)

confidence: 99%

Assessing the Effects of Data Selection and Representation on the Development of Reliable E. coli Sigma 70 Promoter Region Predictors

2015

As the number of sequenced bacterial genomes increases, the need for rapid and reliable tools for the annotation of functional elements (e.g., transcriptional regulatory elements) becomes more pressing. Promoters are the key regulatory elements, which recruit the transcriptional machinery through binding to a variety of regulatory proteins (known as sigma factors). The identification of the promoter regions is very challenging because these regions do not adhere to specific sequence patterns or motifs and are difficult to determine experimentally. Machine learning represents a promising and cost-effective approach for computational identification of prokaryotic promoter regions. However, the quality of the predictors depends on several factors including: i) training data; ii) data representation; iii) classification algorithms; iv) evaluation procedures. In this work, we create several variants of E. coli promoter data sets and utilize them to experimentally examine the effect of these factors on the predictive performance of E. coli σ70 promoter models. Our results suggest that under some combinations of the first three criteria, a prediction model might perform very well on cross-validation experiments while its performance on independent test data is very poor. This emphasizes the importance of evaluating promoter region predictors using independent test data, which corrects for the over-optimistic performance that might be estimated using the cross-validation procedure. Our analysis of the tested models shows that good prediction models often perform well regardless of how the non-promoter data was obtained. On the other hand, poor prediction models seem to be more sensitive to the choice of non-promoter sequences. Interestingly, the best performing sequence-based classifiers outperform the best performing structure-based classifiers on both cross-validation and independent test performance evaluation experiments. Finally, we propose a meta-predictor method combining two top performing sequence-based and structure-based classifiers and compare its performance with some of the state-of-the-art E. coli σ70 promoter prediction methods.

“…Previous studies have confirmed that certain promoters can be identified or predicted based on ANN method [21], [22], [23], [24], [25], [26], [27], [28], but no further effort was reported for quantitative description of their strength. Here, we constructed a finely characterized Trc promoter & RBS library for sufficient model training and greatly improved the prediction accuracy compared with previous reported methods (PLS-, PWM- and thermodynamics-based) [10], [11], [12].…”

Section: Discussion (mentioning)

confidence: 99%

“…It can be adapted to continuously change the network structure based on input/output information during learning phase, which could reflect the non-linear relationships between quantitative characteristics and related qualitative performance in complex phenomena. Thus, ANNs have been widely used to various biological research fields such as protein structure and stability prediction [17], [18], [19], RNA secondary structure prediction [20], as well as promoter recognition and structure analysis [21], [22], [23], [24], [25], [26], [27], [28]. In this work, we constructed a high-performance ANN model to directly predict the strength of regulatory element from its sequence.…”

Section: Introduction (mentioning)

confidence: 99%

Quantitative Design of Regulatory Elements Based on High-Precision Strength Prediction Using Artificial Neural Network

et al. 2013

Accurate and controllable regulatory elements such as promoters and ribosome binding sites (RBSs) are indispensable tools to quantitatively regulate gene expression for rational pathway engineering. Therefore, de novo designing regulatory elements is brought back to the forefront of synthetic biology research. Here we developed a quantitative design method for regulatory elements based on strength prediction using artificial neural network (ANN). One hundred mutated Trc promoter & RBS sequences, which were finely characterized with a strength distribution from 0 to 3.559 (relative to the strength of the original sequence which was defined as 1), were used for model training and test. A precise strength prediction model, NET90_19_576, was finally constructed with high regression correlation coefficients of 0.98 for both model training and test. Sixteen artificial elements were in silico designed using this model. All of them were proved to have good consistency between the measured strength and our desired strength. The functional reliability of the designed elements was validated in two different genetic contexts. The designed parts were successfully utilized to improve the expression of BmK1 peptide toxin and fine-tune deoxy-xylulose phosphate pathway in Escherichia coli. Our results demonstrate that the methodology based on ANN model can de novo and quantitatively design regulatory elements with desired strengths, which are of great importance for synthetic biology applications.

“…To overcome this obstacle, several researchers have directly translated the nucleotides in promoter sequences into digits, resulting in digital vectors that resemble the DNA sequences. Different approaches have been adopted to accommodate the variable distances between motifs, including initial sequence alignment [20] and coupling SVM with a sequence alignment kernel to affine gaps in the input sequences [7]. In some studies, the DNA sequences were broken down into collections of oligomers tagged with information on their locations relative to TSSs [6, 21].…”

Section: Introduction (mentioning)

confidence: 99%

Image-based promoter prediction: a promoter prediction method based on evolutionarily generated patterns

Wang, Cheng, et al. 2018, Sci Rep

Prediction of promoter regions is crucial for studying gene function and regulation. The well-accepted position weight matrix method for this purpose relies on predefined motifs, which would hinder application across different species. Here, we introduce image-based promoter prediction (IBPP) as a method that creates an “image” from training promoter sequences using an evolutionary approach and predicts promoters by matching with the “image”. We used Escherichia coli σ70 promoter sequences to test the performance of IBPP and the combination of IBPP and a support vector machine algorithm (IBPP-SVM). The “images” generated with IBPP could effectively distinguish promoter from non-promoter sequences. Compared with IBPP, IBPP-SVM showed a substantial improvement in sensitivity. Furthermore, both methods showed good performance for sequences of up to 2,000 nt in length. The performances of IBPP and IBPP-SVM were largely affected by the threshold and dimension of vectors, respectively. The source code and documentation are freely available at https://github.com/hahatcdg/IBPP.
