Cell-free RNA (cfRNA) is a promising analyte for cancer detection. However, a comprehensive assessment of cfRNA in individuals with and without cancer has not been conducted. We perform the first transcriptome-wide characterization of cfRNA in cancer (stage III breast [n = 46], lung [n = 30]) and non-cancer (n = 89) participants from the Circulating Cell-free Genome Atlas (NCT02889978). Of 57,820 annotated genes, 39,564 (68%) are not detected in cfRNA from non-cancer individuals. Within these low-noise regions, we identify tissue- and cancer-specific genes, defined as “dark channel biomarker” (DCB) genes, that are recurrently detected in individuals with cancer. DCB levels in plasma correlate with tumor shedding rate and RNA expression in matched tissue, suggesting that DCBs with high expression in tumor tissue could enhance cancer detection in patients with low levels of circulating tumor DNA. Overall, cfRNA provides a unique opportunity to detect cancer, predict the tumor tissue of origin, and determine the cancer subtype.
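The dark-channel idea described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: "not detected" is taken to mean zero reads in every non-cancer sample, "recurrently detected" is taken to mean at least `min_samples` cancer samples with one or more reads, and the gene names, counts, and function names are hypothetical toy values, not the study's data or pipeline.

```python
def dark_channel_genes(non_cancer_counts):
    """Return genes with zero reads in every non-cancer sample.

    Assumption: zero reads everywhere stands in for "not detected";
    the source does not give the actual detection threshold.
    """
    return {
        gene
        for gene, counts in non_cancer_counts.items()
        if all(c == 0 for c in counts)
    }


def recurrent_dcb_candidates(cancer_counts, dark_genes, min_samples=2):
    """Return dark-channel genes seen in at least `min_samples` cancer samples."""
    return sorted(
        gene
        for gene in dark_genes
        if sum(c > 0 for c in cancer_counts.get(gene, ())) >= min_samples
    )


# Hypothetical per-gene read counts, one entry per sample.
non_cancer = {
    "GENE_X": [0, 0, 0],   # silent in non-cancer plasma: a dark channel
    "GENE_Y": [5, 0, 2],   # detected in non-cancer plasma: not dark
    "GENE_Z": [0, 0, 0],   # dark, but see the cancer counts below
}
cancer = {
    "GENE_X": [3, 0, 7],   # detected in two cancer samples
    "GENE_Y": [9, 4, 1],
    "GENE_Z": [0, 1, 0],   # detected in only one cancer sample
}

dark = dark_channel_genes(non_cancer)
candidates = recurrent_dcb_candidates(cancer, dark, min_samples=2)
```

Here `GENE_X` is the only candidate: it is silent in all non-cancer samples and recurs in cancer samples, whereas `GENE_Z` is dark but not recurrent under the assumed threshold.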
Motivation
Fusion detection in cell-free nucleic acid (cfNA) sequencing data requires improving existing methods along multiple axes: handling high sequencing depth, low allele fractions, short fragment lengths, and specialized barcodes such as unique molecular identifiers.
Results
AF4 was developed to address these challenges. It uses a novel alignment-free, k-mer-based method to detect candidate fusion fragments with high sensitivity, orders of magnitude faster than existing tools. Candidate fragments are then filtered using a max-cover criterion that significantly reduces spurious matches while retaining authentic fusion fragments. This efficient first stage reduces the data enough that the remaining fragments can be processed with commonly used criteria or with sophisticated filtering policies that may not scale to the raw reads. AF4 provides both targeted and de novo fusion detection modes. We demonstrate both modes on simulated and real benchmark RNA-seq data as well as on clinical and cell-line cfNA data.
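To make the two-stage idea concrete, the following is a minimal sketch of alignment-free, k-mer-based candidate detection followed by a coverage-based filter. It is not AF4's implementation: the function names, the toy sequences, the `min_cover` threshold, and in particular the interpretation of "max-cover" (choose the gene pair whose k-mers jointly cover the largest fraction of the fragment) are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Yield (offset, k-mer) for every length-k window of seq."""
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of genes whose transcript contains it."""
    index = defaultdict(set)
    for gene, seq in transcripts.items():
        for _, km in kmers(seq, k):
            index[km].add(gene)
    return index


def covered_positions(fragment, index, k):
    """For each gene, collect fragment positions covered by its k-mers."""
    cover = defaultdict(set)
    for i, km in kmers(fragment, k):
        for gene in index.get(km, ()):
            cover[gene].update(range(i, i + k))
    return cover


def best_gene_pair(fragment, index, k, min_cover=0.9):
    """Return (gene_a, gene_b, covered_fraction) for the best pair, or None.

    Assumed filter: keep the pair of genes that jointly cover the most
    bases of the fragment, and only if that fraction reaches min_cover.
    """
    cover = covered_positions(fragment, index, k)
    best = None
    for a, b in combinations(sorted(cover), 2):
        # Each gene must explain some bases that the other does not.
        if not (cover[a] - cover[b]) or not (cover[b] - cover[a]):
            continue
        frac = len(cover[a] | cover[b]) / len(fragment)
        if frac >= min_cover and (best is None or frac > best[2]):
            best = (a, b, frac)
    return best


# Hypothetical toy transcripts; real inputs would be annotated transcriptomes.
transcripts = {
    "GENE_A": "ACGTACGGTCAAGCTTAGGC",
    "GENE_B": "TTGACCATGGCATCCGATAA",
}
K = 6
index = build_index(transcripts, K)

# A fragment joining the tail of GENE_A to the head of GENE_B.
fusion_fragment = transcripts["GENE_A"][8:] + transcripts["GENE_B"][:12]
# A fragment drawn entirely from GENE_A.
normal_fragment = transcripts["GENE_A"][2:]

fusion_call = best_gene_pair(fusion_fragment, index, K)
normal_call = best_gene_pair(normal_fragment, index, K)
```

In this toy case the chimeric fragment is reported as a `GENE_A`/`GENE_B` candidate, while the fragment drawn from a single gene yields `None`. Because every step is a hash lookup rather than an alignment, a filter of this kind can in principle be applied to raw reads before any more expensive analysis.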
Availability and implementation
AF4 is open sourced, licensed under Apache License 2.0, and is available at: https://github.com/grailbio/bio/tree/master/fusion.