“…We analyzed data from patients suffering from rare diseases that manifest in specific tissues. Patients had been genetically diagnosed via exome sequencing and subsequent analysis, as previously described 70–73. The data for each patient were de-identified, and variants were filtered as follows:
- kept variants with call quality of at least 20.0 in cases or at least 20.0 in controls AND outside the top 5.0% most exonically variable 100-base windows in healthy public genomes (1000 Genomes).
- excluded variants observed with an allele frequency greater than or equal to 0.5% in the genomes of the 1000 Genomes Project, OR greater than or equal to 0.5% in the NHLBI ESP exomes (All), or greater than or equal to 0.5% in ExAC, or greater than or equal to 0.5% in gnomAD, unless the variant was an established pathogenic common variant.
- kept variants (up to 20 bases into an intron) that were experimentally observed to be associated with a phenotype: pathogenic, possibly pathogenic, or disease-associated according to HGMD; or clinically relevant variants from CentoMD; or frameshift, in-frame indel, stop codon change, or missense; or predicted deleterious by a CADD score > 15.0; or predicted to disrupt splicing by MaxEntScan; or within 2 bases into an intron.
- In the case of dominant genes, kept variants that are associated with gain of function, or are hemizygous, heterozygous, heterozygous-amb, compound heterozygous, homozygous, heterozygous-alt, or haploinsufficient, and that occur in at least one of the case samples at the variant level; and excluded variants that meet the same criteria and occur in at least one of the control samples at the variant level.
…”
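The four filtering criteria quoted above can be read as a chain of predicates applied to each variant. The following is a minimal Python sketch of that reading, not the authors' pipeline: the `Variant` record, its field names, and the helper functions are all hypothetical, and several points are simplifying assumptions (a single best call quality per variant, a single maximum population allele frequency across the four databases, the 20-base intronic limit applied as a precondition, and the zygosity list collapsed to presence in case versus control samples).

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the quoted criteria.
QUALITY_MIN = 20.0   # minimum call quality
AF_MAX = 0.005       # 0.5% allele frequency, expressed as a fraction
CADD_MIN = 15.0      # CADD score must exceed this value
INTRON_LIMIT = 20    # maximum number of bases into an intron
SPLICE_WINDOW = 2    # bases into an intron kept regardless of annotation

# Consequence labels are hypothetical strings chosen for this sketch.
DAMAGING = {"frameshift", "in-frame indel", "stop codon change", "missense"}


@dataclass
class Variant:
    """Hypothetical per-variant record; all field names are assumptions."""
    call_quality: float = 0.0              # best call quality in cases or controls
    in_top_variable_window: bool = False   # inside top 5% most variable 100-base windows
    max_population_af: float = 0.0         # highest AF over 1000 Genomes, ESP, ExAC, gnomAD
    established_pathogenic_common: bool = False
    database_flagged: bool = False         # HGMD or CentoMD phenotype association
    consequence: str = ""
    cadd: Optional[float] = None
    disrupts_splicing: bool = False        # e.g. as predicted by MaxEntScan
    intron_depth: int = 0                  # 0 means exonic
    seen_in_case: bool = False
    seen_in_control: bool = False


def passes_quality(v: Variant) -> bool:
    return v.call_quality >= QUALITY_MIN and not v.in_top_variable_window


def passes_frequency(v: Variant) -> bool:
    # Common variants are dropped unless they are established pathogenic.
    return v.max_population_af < AF_MAX or v.established_pathogenic_common


def passes_deleteriousness(v: Variant) -> bool:
    if v.intron_depth > INTRON_LIMIT:
        return False
    return (
        v.database_flagged
        or v.consequence in DAMAGING
        or (v.cadd is not None and v.cadd > CADD_MIN)
        or v.disrupts_splicing
        or 1 <= v.intron_depth <= SPLICE_WINDOW
    )


def passes_genetics(v: Variant) -> bool:
    # Simplification: present in at least one case and in no control.
    return v.seen_in_case and not v.seen_in_control


def keep(v: Variant) -> bool:
    """Apply the four criteria in the order they are listed."""
    return (
        passes_quality(v)
        and passes_frequency(v)
        and passes_deleteriousness(v)
        and passes_genetics(v)
    )


# Usage: filter a list of hypothetical variants.
variants = [
    Variant(call_quality=35.0, max_population_af=0.0001,
            consequence="missense", seen_in_case=True),
    Variant(call_quality=35.0, max_population_af=0.02,
            consequence="missense", seen_in_case=True),
]
kept = [v for v in variants if keep(v)]
```

Keeping each criterion as a separate predicate makes it possible to report how many variants each step removes, which is often useful when comparing filter settings.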