Several attempts have been made at systematically mapping protein-protein interaction, or “interactome”, networks. However, it remains difficult to assess the quality and coverage of existing datasets. We describe a framework that uses an empirically based approach to rigorously dissect quality parameters of currently available human interactome maps. Our results indicate that high-throughput yeast two-hybrid (HT-Y2H) interactions for human are superior in precision to literature-curated interactions supported by only a single publication, suggesting that HT-Y2H is suitable to map a significant portion of the human interactome. We estimate that the human interactome contains ~130,000 binary interactions, most of which remain to be mapped. Similar to estimates of DNA sequence data quality and genome size early in the human genome project, estimates of protein interaction data quality and interactome size are critical to establish the magnitude of the task of comprehensive human interactome mapping and to illuminate a path towards this goal.
Cellular functions are mediated through complex systems of macromolecules and metabolites linked through biochemical and physical interactions, represented in interactome models as 'nodes' and 'edges', respectively. Better understanding of genotype-to-phenotype relationships in human disease will require modeling of how disease-causing mutations affect systems or interactome properties. Here we investigate how perturbations of interactome networks may differ between complete loss of gene products ('node removal') and interaction-specific or edge-specific ('edgetic') alterations. Global computational analyses of ~50,000 known causative mutations in human Mendelian disorders revealed clear separations of mutations probably corresponding to those of node removal versus edgetic perturbations. Experimental characterization of mutant alleles in various disorders identified diverse edgetic interaction profiles of mutant proteins, which correlated with distinct structural properties of disease proteins and disease mechanisms. Edgetic perturbations seem to confer distinct functional consequences from node removal because a large fraction of cases in which a single gene is linked to multiple disorders can be modeled by distinguishing edgetic network perturbations. Edgetic network perturbation models might improve both the understanding of dissemination of disease alleles in human populations and the development of molecular therapeutic strategies.
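The distinction between node removal and an edgetic perturbation can be illustrated on a toy network. The following is a minimal sketch, not the authors' analysis: the protein names, the adjacency-set representation, and the helper functions `remove_node`, `remove_edge`, and `edge_count` are all hypothetical and chosen only for illustration.

```python
# Toy interactome as an undirected graph: each protein maps to the set of
# its interaction partners. All names here are hypothetical.
toy = {
    "P": {"A", "B", "C"},
    "A": {"P", "B"},
    "B": {"P", "A"},
    "C": {"P"},
}

def remove_node(graph, node):
    """Model complete loss of a gene product: drop the node and every edge touching it."""
    return {n: nbrs - {node} for n, nbrs in graph.items() if n != node}

def remove_edge(graph, a, b):
    """Model an edgetic perturbation: drop only the a-b interaction, keep both nodes."""
    new = {n: set(nbrs) for n, nbrs in graph.items()}
    new[a].discard(b)
    new[b].discard(a)
    return new

def edge_count(graph):
    """Count undirected edges (each edge is stored once per endpoint)."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2

node_removed = remove_node(toy, "P")   # P and all of its interactions are lost
edgetic = remove_edge(toy, "P", "C")   # only the P-C interaction is lost

print(edge_count(toy), edge_count(node_removed), edge_count(edgetic))
```

In this toy case, removing the node eliminates all of its interactions at once, whereas the edgetic change leaves the protein and its other interactions in place, which is the kind of difference the abstract argues can separate distinct disorders linked to a single gene.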
Targeted next-generation sequencing panels to identify genetic alterations in cancers are increasingly becoming an integral part of clinical practice. We report here the design, validation, and implementation of a comprehensive 95-gene next-generation sequencing panel targeted for hematologic malignancies that we named rapid heme panel. Rapid heme panel is amplicon based and covers hotspot regions of oncogenes and most of the coding regions of tumor suppressor genes. It is composed of 1330 amplicons and covers 175 kb of genomic sequence in total. Rapid heme panel's average coverage is 1500× with <5% of the amplicons with <50× coverage, and it reproducibly detects single nucleotide variants and small insertions/deletions at allele frequencies of ≥5%. Comparison with a capture-based next-generation sequencing assay showed that there is >95% concordance among a wide array of variants across a range of allele frequencies. Read count analyses that used rapid heme panel showed high concordance with karyotypic results when tumor content was >30%. The average turnaround time was 7 days over a 6-month span with an average volume of ≥40 specimens per week and a low sample fail rate (<1%), demonstrating its suitability for clinical application.
RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. Here, we describe a strategy that uses array hybridization to improve sampling efficiency of human transcripts. The products of the RACE reaction are hybridized onto tiling arrays, and the exons detected are used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to direct cloning and sequencing of RACE products: it specifically targets novel transcripts, and often results in overall normalization of transcript abundances. We show theoretically and experimentally that this strategy leads indeed to efficient sampling of novel transcripts, and we investigate multiplexing it by pooling RACE reactions from multiple interrogated loci prior to hybridization.
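Why random clone selection samples poorly when abundances span a large dynamic range can be seen from a simple calculation. The sketch below assumes clones are drawn independently from the mixture, so a transcript with relative abundance p is seen at least once among n clones with probability 1 - (1 - p)^n. The abundance values and the function names `detection_probability` and `clones_needed` are illustrative assumptions, not figures or methods from the paper.

```python
import math

def detection_probability(abundance, n_clones):
    """Probability of picking at least one clone of a transcript whose
    relative abundance in the mixture is `abundance`, given `n_clones`
    independent random picks."""
    return 1.0 - (1.0 - abundance) ** n_clones

def clones_needed(abundance, target=0.95):
    """Smallest number of random clones giving at least `target`
    probability of seeing the transcript once."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - abundance))

# Hypothetical abundances: a rare novel transcript at 0.1% of the full RACE
# mixture versus 10% of a simpler, segregated RT-PCR reaction.
rare_in_mixture = 0.001
after_segregation = 0.10

print(detection_probability(rare_in_mixture, 96))
print(detection_probability(after_segregation, 96))
print(clones_needed(rare_in_mixture), clones_needed(after_segregation))
```

Under these assumed numbers, a plate of 96 clones from the full mixture would most likely miss the rare transcript, while the same effort on a reaction in which it is enriched would almost certainly recover it.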
Context: Pulmonary large cell carcinoma (LCC) includes tumors not readily diagnosed as adenocarcinoma (ADC) or squamous cell carcinoma on morphologic grounds, without regard to immunophenotype, according to the World Health Organization (WHO). This ambiguous designation may cause confusion over selection of mutation testing and directed therapies. Several groups have proposed the use of immunohistochemistry (IHC) to recategorize LCC as ADC or squamous cell carcinoma; however, it remains unclear if strictly defined LCCs are a clinicopathologically distinct lung tumor subset. Objective: To compare the pathologic, molecular, and clinical features of 2 morphologically similar tumors: solid-subtype ADC and LCC. Design: Tumors were included on the basis of solid growth pattern; tumors with squamous or neuroendocrine differentiation were excluded. Solid ADC (n = 42) and LCC (n = 57) were diagnosed by using WHO criteria (≥5 intracellular mucin droplets in ≥2 high-power fields for solid ADC) and tested for KRAS, EGFR, and ALK alterations. Results: Both solid ADC and LCC groups were dominated by tumors with "undifferentiated"-type morphology and both had a high frequency of thyroid transcription factor 1 expression. KRAS was mutated in 38% of solid ADCs versus 43% of LCCs (P = .62). One ALK-rearranged and 1 EGFR-mutated tumor were detected in the solid ADC and LCC groups, respectively. There were no significant differences in clinical features or outcomes; the prevalence of smoking in both groups was greater than 95%. Conclusions: Other than a paucity of intracellular mucin, LCC lacking squamous or neuroendocrine differentiation is indistinguishable from solid-subtype ADC. We propose the reclassification of these tumors as mucin-poor solid adenocarcinomas.
Describing the "ORFeome" of an organism, including all major isoforms, is essential for a systems understanding of any species; however, conventional cloning and sequencing approaches are prohibitively costly and labor-intensive. We describe a potentially genome-wide methodology for efficiently capturing novel coding isoforms using RT-PCR recombinational cloning, "deep well" pooling, and a "next generation" sequencing platform. This ORFeome discovery pipeline will be applicable to any eukaryotic species with a sequenced genome.
Experimental definition of the complete set of protein-coding transcript sequences ("ORFeome") is fundamental for complete understanding of any organism, but this has not been achieved to date for any metazoan. Adding to the uncertainty, many eukaryotic genes exhibit alternative splicing, leading to a diversity of ORFs encoded by a single gene. Currently, 74% of human multi-exon genes and ~13% of Caenorhabditis elegans genes are predicted to undergo alternative splicing [1,2]. Expansion of the "isoform-space" in more complex organisms may partly explain the paradoxical lack of correlation between organismal complexity and gene number, and underscores the need to efficiently and comprehensively capture the full ORFeome. Historically, determination of intron-exon boundaries in eukaryotes has been addressed mainly by large-scale sequencing of random cDNAs (expressed sequence tags or ESTs) followed by alignment to a reference genomic DNA sequence. Although EST collections are extremely helpful, the human isoform-space remains under-explored. A targeted cloning and full-length sequencing strategy could provide the desired information, but is impractically resource-intensive.
Correspondence should be addressed to M.V. (marc_vidal@dfci.harvard.edu), K.S.A. (kourosh_salehi-ashtiani@dfci.harvard.edu), or F.P.R. (fritz_roth@hms.harvard.edu).
5 These authors contributed equally to this work.
Competing interest statement: The authors declare that they have no competing financial interests.
Next-generation parallel sequencing technologies, such as the Roche 454 FLX, offer the prospect of sequencing at a much faster pace and lower cost than conventional Sanger-capillary platforms [3]. Most applications described so far have entailed resequencing of megabase-scale genomic DNA fragments [4-7] or of small sequence tags [8-11]. A disadvantage of the latter approach is that cis-connectivity is lost between the reads; therefore, although the reads can be assembled into contigs, mRNAs cannot be assembled unambiguously when splice variants are involved. Sequencing of kilobase-scale DNA fragments from complex pools in which fragments have heterogeneous abundance has not yet been tested, nor has correct assembly of hundreds to thousands of full-length cDNAs in parallel from a complex mixture been proven feasible. Previous and ongoing full-length cDNA isolation projects aim to discover one isoform per gene, without attempting to investigate the depth of "isoform space". Here we describe ...