Motivation: The prediction of off-target mutations in CRISPR-Cas9 is a hot topic because of its relevance to gene editing research. Several prediction methods exist; however, most of them only calculate scores based on mismatches to the guide sequence. As a result, they cannot scale or improve their performance as experimental CRISPR-Cas9 data rapidly expand. Moreover, the existing methods do not yet achieve sufficient precision in off-target prediction for gene editing at the clinical level.

Results: To address these limitations, we design and implement two deep neural network algorithms to predict off-target mutations in CRISPR-Cas9 gene editing: a deep convolutional neural network and a deep feedforward neural network. The models were trained and tested on a recently released off-target dataset, the CRISPOR dataset, for performance benchmarking. Another off-target dataset identified by GUIDE-seq was adopted for additional evaluation. We demonstrate that the convolutional neural network achieves the best performance on the CRISPOR dataset, yielding an average classification area under the ROC curve (AUC) of 97.2% under stratified 5-fold cross-validation. Interestingly, the deep feedforward neural network is also competitive, with an average AUC of 97.0% under the same setting. We compare the two deep neural network models with state-of-the-art off-target prediction methods (i.e. CFD, MIT, CROP-IT, and CCTop) and three traditional machine learning models (i.e. random forest, gradient boosting trees, and logistic regression) on both datasets in terms of AUC, demonstrating the competitive edge of the proposed algorithms. Additional analyses are conducted to investigate the underlying reasons from different perspectives.

Availability and implementation: Example code is available at https://github.com/MichaelLinn/off_target_prediction. The related datasets are available at https://github.com/MichaelLinn/off_target_prediction/tree/master/data.
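The evaluation protocol named above (AUC averaged over stratified 5-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the authors' code: the function names `roc_auc` and `stratified_folds` are hypothetical, the AUC is computed by the pairwise-ranking definition, and the fold assignment is a simple round-robin within each class.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k):
    """Split example indices into k folds, round-robin within each class.

    This keeps the class ratio of every fold close to that of the full set,
    which matters when off-target sites are a small minority.
    """
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        class_indices = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(class_indices):
            folds[j % k].append(i)
    return folds


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
    print(stratified_folds([1] * 5 + [0] * 10, 5))
```

In a full evaluation, each fold would serve once as the test set while a model is trained on the remaining folds, and the per-fold AUC values would then be averaged.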
The off‐target effects induced by guide RNAs in the CRISPR/Cas9 gene‐editing system have raised substantial concerns in recent years. Many in silico predictive models have been developed for predicting the off‐target activities; however, few are capable of predicting the off‐target activities with insertions or deletions between guide RNA and target DNA sequence pair. In order to fill this gap, a recurrent convolutional network named CRISPR‐Net is developed for scoring the gRNA‐target pairs with mismatches and indels; and a machine‐learning based model named CRISPR‐Net‐Aggregate is also developed for aggregating the scores as the consensus off‐target score for each potential guide RNA. It is demonstrated that CRISPR‐Net achieves competitive performance on CIRCLE‐Seq and GUIDE‐seq datasets with indels and mismatches, outperforming the state‐of‐the‐art off‐target prediction methods on two independent mismatch‐only datasets. The CRISPR‐Net‐Aggregate also surpasses a competing method on the aggregation task. Moreover, a two‐stage sensitivity analysis is introduced to visualize the CRISPR‐Net prediction on the gRNA‐target pair of interest, demonstrating how implicit knowledge encoded in CRISPR‐Net contributes to the accurate off‐target activity quantification. Finally, the source code is made available at the Code Ocean repository (https://codeocean.com/capsule/9553651/tree/v1).
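To make the notion of a gRNA-target pair with mismatches and indels concrete, the sketch below encodes an already aligned pair position by position. It is an illustrative assumption, not the encoding documented for CRISPR-Net: the alphabet `ACGT-` (with `-` marking a gap), the OR-combined one-hot scheme, and the names `encode_pair` and `summarize_pair` are all hypothetical.

```python
ALPHABET = "ACGT-"  # "-" marks a gap (indel) in the aligned pair; assumed convention


def one_hot(base):
    """One-hot vector of a single symbol over ALPHABET."""
    return [1 if base == a else 0 for a in ALPHABET]


def encode_pair(guide, target):
    """Encode an aligned guide/target pair as one 5-bit vector per position.

    Each position is the element-wise OR of the two one-hot vectors, so a
    match sets one bit and a mismatch or indel sets two.
    """
    if len(guide) != len(target):
        raise ValueError("aligned sequences must have equal length")
    encoded = []
    for g, t in zip(guide.upper(), target.upper()):
        if g not in ALPHABET or t not in ALPHABET:
            raise ValueError(f"unexpected symbol in pair: {g!r}, {t!r}")
        encoded.append([a | b for a, b in zip(one_hot(g), one_hot(t))])
    return encoded


def summarize_pair(guide, target):
    """Count matches, mismatches, and gaps on either side of the alignment."""
    counts = {"match": 0, "mismatch": 0, "gap_in_guide": 0, "gap_in_target": 0}
    for g, t in zip(guide.upper(), target.upper()):
        if g == "-":
            counts["gap_in_guide"] += 1
        elif t == "-":
            counts["gap_in_target"] += 1
        elif g == t:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


if __name__ == "__main__":
    # Toy 4-position alignment, invented for illustration only.
    print(encode_pair("AC-T", "AGGT"))
    print(summarize_pair("AC-T", "AGGT"))
```

Note that an OR-combined vector alone does not record which strand carries which base; a real model would need extra channels or a separate representation to keep that direction information.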
Summary: Early detection of cancers has the potential to save many lives. A recent attempt has been shown to be successful; however, we note several critical limitations. Given the central importance and broad impact of early cancer detection, we aspire to address those limitations. We explore different supervised learning approaches for multiple cancer type detection and observe significant improvements; for instance, one of our approaches (i.e., CancerA1DE) can double the existing sensitivity from 38% to 77% for the earliest cancer detection (i.e., Stage I) at the 99% specificity level. For Stage II, it can reach about 90% across multiple cancer types. In addition, CancerA1DE can more than double the existing sensitivity from 30% to 70% for detecting breast cancers at the 99% specificity level. Data and model analyses are conducted to reveal the underlying reasons. A website is available at http://cancer.cs.cityu.edu.hk/.
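The phrase "sensitivity at the 99% specificity level" can be illustrated with a short sketch: pick the score threshold from the non-cancer samples so that at least 99% of them fall at or below it, then measure the fraction of cancer samples scoring above it. This is a generic illustration under those assumptions, not the CancerA1DE evaluation code; the function name `sensitivity_at_specificity` and the toy scores are hypothetical.

```python
import math


def sensitivity_at_specificity(labels, scores, specificity=0.99):
    """Return (sensitivity, threshold) at a required specificity.

    A sample is called positive when its score is strictly above the
    threshold. The threshold is the smallest negative-class score that
    leaves at least `specificity` of the negatives at or below it.
    """
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative example")
    # Number of negatives that must be classified as negative; the small
    # epsilon guards against floating-point error in the product.
    needed = math.ceil(specificity * len(neg) - 1e-9)
    threshold = neg[needed - 1] if needed > 0 else float("-inf")
    true_positives = sum(1 for s in pos if s > threshold)
    return true_positives / len(pos), threshold


if __name__ == "__main__":
    # Toy data, invented for illustration: 100 negatives and 4 positives.
    labels = [0] * 100 + [1] * 4
    scores = list(range(100)) + [50, 98.5, 99.5, 100]
    print(sensitivity_at_specificity(labels, scores))
```

Fixing specificity first reflects the screening setting: false positives among healthy people are kept to a set budget, and methods are then compared on how many true cancers they still catch.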