Sangtae Kim scite author profile

Proteogenomic characterization of human colon and rectal cancer

Summary: We analyzed proteomes of colon and rectal tumors previously characterized by the Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. mRNA transcript abundance did not reliably predict protein abundance differences between tumors. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA “MSI/CIMP” transcriptomic subtype, but had distinct mutation, methylation, and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extend to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates including HNF4A, TOMM34 and SRC. Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.

Voltage, stability and diffusion barrier differences between sodium-ion and lithium-ion intercalation materials

Ong

Chevrier

Hautier

et al. 2011

Energy Environ. Sci.

1,267

1,035

To evaluate the potential of Na-ion batteries, we contrast in this work the difference between Na-ion and Li-ion based intercalation chemistries in terms of three key battery properties: voltage, phase stability and diffusion barriers. The compounds investigated comprise the layered AMO₂ and AMS₂ structures, the olivine and maricite AMPO₄ structures, and the NASICON A₃V₂(PO₄)₃ structures. The calculated Na voltages for the compounds investigated are 0.18-0.57 V lower than the corresponding Li voltages, in agreement with previous experimental data. We believe the observed lower voltages for Na compounds are predominantly a cathodic effect related to the much smaller energy gain from inserting Na into the host structure compared to inserting Li. We also found a relatively strong dependence of battery properties on structural features. In general, the difference between the Na and Li voltage of the same compound, ΔV_Na-Li, is less negative for the maricite structures preferred by Na, and more negative for the olivine structures preferred by Li. The layered compounds have the most negative ΔV_Na-Li. In terms of phase stability, we found that open structures, such as the layered and NASICON structures, that are better able to accommodate the larger Na⁺ ion generally have both Na and Li versions of the same compound. For the close-packed AMPO₄ structures, our results show that Na generally prefers the maricite structure, while Li prefers the olivine structure, in agreement with previous experimental work. We also found surprising evidence that the barriers for Na⁺ migration can potentially be lower than those for Li⁺ migration in the layered structures. Overall, our findings indicate that Na-ion systems can be competitive with Li-ion systems.

MS-GF+ makes progress towards a universal database search tool for proteomics

2014

Mass spectrometry (MS) instruments and experimental protocols are rapidly advancing, but the software tools to analyze tandem mass spectra are lagging behind. We present a database search tool MS-GF+ that is sensitive (it identifies more peptides than most other database search tools) and universal (it works well for diverse types of spectra, different configurations of MS instruments and different experimental protocols). We benchmark MS-GF+ using diverse spectral datasets: (i) spectra of varying fragmentation methods; (ii) spectra of multiple enzyme digests; (iii) spectra of phosphorylated peptides; (iv) spectra of peptides with unusual fragmentation propensities produced by a novel alpha-lytic protease. For all these datasets, MS-GF+ significantly increases the number of identified peptides compared to commonly used methods for peptide identifications. We emphasize that while MS-GF+ is not specifically designed for any particular experimental set-up, it improves upon the performance of tools specifically designed for these applications (e.g., specialized tools for phosphoproteomics).

Strelka2: fast and accurate calling of germline and somatic variants

et al. 2018

We describe Strelka2 ( https://github.com/Illumina/strelka ), an open-source small-variant-calling method for research and clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model-based estimation of insertion/deletion error parameters from each sample, an efficient tiered haplotype-modeling strategy, and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperformed the current leading tools in terms of both variant-calling accuracy and computing cost.

Spectral Probabilities and Generating Functions of Tandem Mass Spectra: A Strike against Decoy Databases

2008

A key problem in computational proteomics is distinguishing between correct and false peptide identifications. We argue that evaluating the error rates of peptide identifications is not unlike computing generating functions in combinatorics. We show that the generating functions and their derivatives (spectral energy and spectral probability) represent new features of tandem mass spectra that, similarly to Δ-scores, significantly improve peptide identifications. Furthermore, the spectral probability provides a rigorous solution to the problem of computing statistical significance of spectral identifications. The spectral energy/probability approach improves the sensitivity-specificity trade-off of existing MS/MS search tools, addresses the notoriously difficult problem of "one-hit wonders" in mass spectrometry, and often eliminates the need for decoy database searches. We therefore argue that the generating function approach has the potential to increase the number of peptide identifications in MS/MS searches.

The Amborella Genome and the Evolution of Flowering Plants

Palmer

Ammiraju

Ralph

et al. 2013

Science

708

368

Amborella trichopoda is strongly supported as the single living species of the sister lineage to all other extant flowering plants, providing a unique reference for inferring the genome content and structure of the most recent common ancestor (MRCA) of living angiosperms. Sequencing the Amborella genome, we identified an ancient genome duplication predating angiosperm diversification, without evidence of subsequent, lineage-specific genome duplications. Comparisons between Amborella and other angiosperms facilitated reconstruction of the ancestral angiosperm gene content and gene order in the MRCA of core eudicots. We identify new gene families, gene duplications, and floral protein-protein interactions that first appeared in the ancestral angiosperm. Transposable elements in Amborella are ancient and highly divergent, with no recent transposon radiations. Population genomic analysis across Amborella's native range in New Caledonia reveals a recent genetic bottleneck and geographic structure with conservation implications.

On the Conductivity Mechanism of Nanocrystalline Ceria

Kim

Maier

2002

J. Electrochem. Soc.

331

287

Electrical conductivities of Gd-doped (0.15 mol %) and nominally pure nanocrystalline CeO₂₋ₓ ceramics (∼30 nm grain size) were measured by impedance spectroscopy in the temperature range of 673-773 K under various oxygen partial pressures (1-10⁵ Pa). The ionic and electronic contributions were separated using electrochemical polarization with an electronically blocking electrode, yttria-stabilized zirconia. The results allow for a clear distinction between potential explanations. It is shown that the space charge model (space charge zones with potential of ∼0.3 V resulting in depletion of oxygen vacancies and accumulation of conduction electrons) explains all the experimental features. © 2002 The Electrochemical Society. All rights reserved.

The Generating Function of CID, ETD, and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search

Kim

Mischerikow

Bandeira

et al. 2010

Molecular & Cellular Proteomics

216

281

Recent emergence of new mass spectrometry techniques (e.g. electron transfer dissociation, ETD) and improved availability of additional proteases (e.g. Lys-N) for protein digestion in high-throughput experiments raised the challenge of designing new algorithms for interpreting the resulting new types of tandem mass (MS/MS) spectra. Traditional MS/MS database search algorithms such as SEQUEST and Mascot were originally designed for collision induced dissociation (CID) of tryptic peptides and are largely based on expert knowledge about fragmentation of tryptic peptides (rather than machine learning techniques) to design CID-specific scoring functions. As a result, the performance of these algorithms is suboptimal for new mass spectrometry technologies or nontryptic peptides. We recently proposed the generating function approach (MS-GF) for CID spectra of tryptic peptides. In this study, we extend MS-GF to automatically derive scoring parameters from a set of annotated MS/MS spectra of any type (e.g. CID, ETD, etc.), and present a new database search tool MS-GFDB based on MS-GF. We show that MS-GFDB outperforms Mascot for ETD spectra or peptides digested with Lys-N. For example, in the case of ETD spectra, the number of tryptic and Lys-N peptides identified by MS-GFDB increased by factors of 2.7 and 2.6, respectively, as compared with Mascot. Moreover, even following a decade of Mascot developments for analyzing CID spectra of tryptic peptides, MS-GFDB (which is not particularly tailored for CID spectra or tryptic peptides) resulted in a 28% increase over Mascot in the number of peptide identifications. Finally, we propose a statistical framework for analyzing multiple spectra from the same precursor (e.g. CID/ETD spectral pairs) and assigning p values to peptide-spectrum-spectrum matches.
Molecular & Cellular Proteomics 9:2840-2852, 2010.

Since the introduction of electron capture dissociation (ECD) in 1998 (1), electron-based peptide dissociation technologies have played an important role in analyzing intact proteins and post-translational modifications (2). However, until recently, this research-grade technology was available only to a small number of laboratories because it was commercially unavailable, required experience for operation, and could be implemented only with expensive FT-ICR instruments. The discovery of electron-transfer dissociation (ETD) (3) enabled an ECD-like technology to be implemented in (relatively cheap) ion-trap instruments. Nowadays, many researchers are employing the ETD technology for tandem mass spectra generation (4-9).

Although the hardware technologies to generate ETD spectra are maturing rapidly, software technologies to analyze ETD spectra are still in their infancy. There are two major approaches to analyzing tandem mass spectra: de novo sequencing and database search. Both approaches find the best-scoring peptide either among all possible peptides (de novo sequencing) or among all peptides in a protein database (database search). Although de novo sequencing is emerging as an alternative to databa...

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Proteogenomic characterization of human colon and rectal cancer