LegNet: a best-in-class deep learning model for short DNA regulatory regions

Penzar, Dmitry; Nogina, Daria; Noskova, Elizaveta; Zinkevich, Arsenii; Meshcheryakov, G. A.; Lando, Andrey; Rafi, Abdul Muntakim; Boer, Carl G. de; Kulakovskiy, Ivan V.

doi:10.1093/bioinformatics/btad457

Cited by 9 publications

(6 citation statements)

References 19 publications

“…The following architectural choices were used in the final model: (i) grouped convolution (59) instead of the depthwise convolution of the original EfficientNetV2, (ii) the standard residual blocks were substituted with residual channel-wise concatenations, (iii) a bilinear layer was inserted in the middle of the EfficientNetV2 SE-block. A detailed study on this architecture is presented in (60).…”

Section: Methods (mentioning)

confidence: 99%

“…It contains modifications like replacing depthwise convolution with grouped convolution, using Squeeze and Excitation (SE) blocks (67), and adopting channel-wise concatenation for residual connections. The channel configuration starts with 256 channels for the initial block, followed by 128, 128, 64, 64, 64, and 64 channels (60).…”

Section: Prix Fixe Net: (I) Dream-cnn (mentioning)

confidence: 99%
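The two statements above contrast grouped convolution with depthwise convolution and mention channel-wise concatenation for residual connections. As a rough illustration of what those choices mean for layer size, here is a minimal sketch that counts the parameters of a 1D convolution at each quoted channel width. The kernel size and group count are assumptions chosen for illustration (the quoted text does not give them), and applying a same-width convolution at every stage is a simplification, not the authors' layer layout.

```python
def conv1d_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Count learnable parameters of a 1D convolution layer."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    weights = (c_in // groups) * c_out * kernel_size
    return weights + (c_out if bias else 0)


def concat_residual_channels(c_in, c_block_out):
    """Channel count after concatenating a block's input with its output.

    An additive residual keeps the width at c_block_out; a channel-wise
    concatenation widens the tensor to c_in + c_block_out instead.
    """
    return c_in + c_block_out


# Channel widths quoted above: 256 for the initial block, then the rest.
CHANNELS = [256, 128, 128, 64, 64, 64, 64]
KERNEL_SIZE = 7  # assumption: not stated in the quoted text
GROUPS = 16      # assumption: not stated in the quoted text

if __name__ == "__main__":
    for c in CHANNELS:
        standard = conv1d_params(c, c, KERNEL_SIZE, groups=1)
        grouped = conv1d_params(c, c, KERNEL_SIZE, groups=GROUPS)
        depthwise = conv1d_params(c, c, KERNEL_SIZE, groups=c)  # one group per channel
        print(f"{c:>4} ch  standard={standard:>7}  grouped={grouped:>6}  depthwise={depthwise:>5}")
```

Under these assumed settings, the grouped layer sits between the two extremes: far smaller than a standard convolution, but still mixing information across the channels within each group, which a depthwise convolution does not do.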

Evaluation and optimization of sequence-based gene regulatory deep learning models

Rafi, Penzar, Nogina et al. 2023

Preprint

Self Cite

Neural networks have proven to be an immensely powerful tool in predicting functional genomic regions, in particular with many recent successes in deciphering gene regulatory logic. However, how model architecture and training strategy choices affect model performance has not been systematically evaluated for genomics models. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding experimentally determined expression levels to best capture the relationship between regulatory DNA and gene expression in yeast. To robustly evaluate the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. While some benchmarks produced similar results across all models, others differed substantially. For some sequence types, model performances exhibited correlation scores as high as 0.98, while for others, substantial improvement is still required. The top-performing models were all neural networks, which demonstrated substantial performance gains by customizing model architectures to the nature of the experiment and utilizing novel training strategies tailored to genomics sequence data. Overall, our DREAM Challenge highlights the need to benchmark genomics models across different scenarios to uncover their limitations.

“…Despite a recent push to favor large‐scale attention transformer models in this field, some researchers have argued that despite excellent performance in protein structure prediction, text mining, and genomic data analysis, the quality of transformer models can be overestimated under certain test scenarios. [194,195] Concerns also persist regarding their ability to effectively capture long‐range interactions. [194]…”

Section: Applying Machine Learning Techniques To Decipher the Cis‐reg... (mentioning)

confidence: 99%

“…A relevant development has been LegNet, a CNN for modeling short gene regulatory regions that achieved first rank in predicting promoter expression from a gigantic parallel reporter assay at the DREAM 2022 challenge. [195] The authors highlight that fully convolutional networks should be recognized as a dependable method for computationally modeling short gene regulatory regions and predicting the consequences of regulatory sequence modifications. However, ultimately, it is critical to remember that the effectiveness of machine learning and AI models hinges on the quality of experimental data, with current limitations in wet lab techniques contributing to challenges in precisely defining enhancers across the genome and occasionally leading to poor reproducibility even in replicates of the same experiment.…”

Section: Applying Machine Learning Techniques To Decipher the Cis‐reg... (mentioning)

confidence: 99%

Advances in computational and experimental approaches for deciphering transcriptional regulatory networks

Moeckel, Mouratidis, Chantzi et al. 2024

BioEssays

Understanding the influence of cis‐regulatory elements on gene regulation poses numerous challenges given complexities stemming from variations in transcription factor (TF) binding, chromatin accessibility, structural constraints, and cell‐type differences. This review discusses the role of gene regulatory networks in enhancing understanding of transcriptional regulation and covers construction methods ranging from expression‐based approaches to supervised machine learning. Additionally, key experimental methods, including MPRAs and CRISPR‐Cas9‐based screening, which have significantly contributed to understanding TF binding preferences and cis‐regulatory element functions, are explored. Lastly, the potential of machine learning and artificial intelligence to unravel cis‐regulatory logic is analyzed. These computational advances have far‐reaching implications for precision medicine, therapeutic target discovery, and the study of genetic variations in health and disease.

“…In addition, while many of these models accurately predict TF binding and accessible chromatin in the genome, they are trained on indirect proxies of cis-regulation, and therefore less accurately predict cis-regulatory activity. Models trained on massively parallel reporter gene assays (MPRAs) 19,32–37 do predict cis-regulatory activity directly. Still, MPRA studies must also contend with the same fundamental limitation: the number of genomic training examples in any particular cell type is small relative to the scale of the training data typically needed to model the interactions defining cis-regulatory grammars 38.…”

Section: Introduction (mentioning)

confidence: 99%

Active learning of enhancer and silencer regulatory grammar in photoreceptors

Friedman, Ramu, Lichtarge et al. 2023

Preprint

Cis-regulatory elements (CREs) direct gene expression in health and disease, and models that can accurately predict their activities from DNA sequences are crucial for biomedicine. Deep learning represents one emerging strategy to model the regulatory grammar that relates CRE sequence to function. However, these models require training data on a scale that exceeds the number of CREs in the genome. We address this problem using active machine learning to iteratively train models on multiple rounds of synthetic DNA sequences assayed in live mammalian retinas. During each round of training the model actively selects sequence perturbations to assay, thereby efficiently generating informative training data. We iteratively trained a model that predicts the activities of sequences containing binding motifs for the photoreceptor transcription factor Cone-rod homeobox (CRX) using an order of magnitude less training data than current approaches. The model’s internal confidence estimates of its predictions are reliable guides for designing sequences with high activity. The model correctly identified critical sequence differences between active and inactive sequences with nearly identical transcription factor binding sites, and revealed order and spacing preferences for combinations of motifs. Our results establish active learning as an effective method to train accurate deep learning models of cis-regulatory function after exhausting naturally occurring training examples in the genome.
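The abstract above says the model selects which sequence perturbations to assay in each round, but it does not state the selection rule. One common way to do this is uncertainty sampling. The following is a minimal sketch of that idea, assuming uncertainty is measured as the variance of predictions from a hypothetical ensemble of models. The function name, the toy sequences, and the prediction values are all illustrative and are not taken from the paper.

```python
from statistics import pvariance


def select_most_uncertain(candidates, ensemble_predictions, k):
    """Return the k candidate sequences whose ensemble predictions disagree most.

    candidates: list of sequence strings.
    ensemble_predictions: dict mapping each sequence to a list of predicted
        activities, one per ensemble member (hypothetical values here).
    """
    scored = [(pvariance(ensemble_predictions[seq]), seq) for seq in candidates]
    # Highest variance first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [seq for _, seq in scored[:k]]


# Toy example: three candidate sequences, three ensemble members each.
toy_predictions = {
    "AAAA": [0.5, 0.5, 0.5],  # members agree: nothing to learn from assaying it
    "ACGT": [0.1, 0.9, 0.5],  # members disagree strongly: most informative
    "TTTT": [0.4, 0.6, 0.5],  # mild disagreement
}
next_round = select_most_uncertain(list(toy_predictions), toy_predictions, k=2)
```

In a full loop, the selected sequences would be assayed, added to the training set, and the models retrained before the next round of selection.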
