Precise splice junction calls are currently unavailable in scRNA-seq pipelines such as the 10x Chromium platform but are critical for understanding single-cell biology. Here, we introduce SICILIAN, a new method that assigns statistical confidence to splice junctions from a spliced aligner to improve precision. SICILIAN's precise splice detection achieves high accuracy on simulated data, improves concordance between matched single-cell and bulk datasets, increases agreement between biological replicates, and reliably detects unannotated splicing in single cells, enabling the discovery of novel splicing regulation.
Main text: Alternative splicing is essential for the specialized functions of eukaryotic cells, necessary for development 1, and a greater contributor to genetic disease burden than mutations 2. Despite the importance of splicing and the massive amount of RNA-seq data generated on single cells, the extent to which the diversity of RNA splicing in single cells is regulated and functional, rather than transcriptional noise, remains contentious 3. Given the resolution and massive number of available single-cell RNA-seq (scRNA-seq) datasets, precise quantification of splicing in single cells has great promise for discovering regulatory and functional splicing biology. While a number of methods have been developed for isoform quantification at the single-cell level 4,5, the precise discovery of splice junctions has not been addressed. There is a great need for precise junction detection: spliced aligners are designed for bulk RNA-seq and, in addition, generate many artifacts 6-10, referred to in this paper as "false positives". The problem is exacerbated in scRNA-seq analysis by the high level of single-cell-specific biochemical noise and by multiple testing errors arising from the analysis of thousands of cells. This problem is typically addressed by applying simple filters to junction calls, which remove many true positives, especially when the data are sparse, as in scRNA-seq. Because of these challenges, there is debate