Highlights
• First deep proteogenomic landscape of non-smoking lung adenocarcinoma in East Asia
• Identified age, sex-related endogenous, and environmental carcinogen mutagenic processes
• Proteome-informed classification distinguished clinical features within early stages
• Protein networks identified tumorigenesis hallmarks, biomarkers, and druggable targets
Background: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can yield low sensitivity in exchange for high specificity.
The iTRAQ labeling method combined with shotgun proteomic techniques represents a new dimension in multiplexed quantitation for relative protein expression measurement in different cell states. To expedite the analysis of vast amounts of spectral data, we present a fully automated software package, called Multi-Q, for multiplexed iTRAQ-based quantitation in protein profiling. Multi-Q is designed as a generic platform that can accommodate various input data formats from search engines and mass spectrometer manufacturers. To calculate peptide ratios, the software automatically processes iTRAQ's signature peaks, including peak detection, background subtraction, isotope correction, and normalization to remove systematic errors. Furthermore, Multi-Q allows users to define their own data-filtering thresholds based on semi-empirical values or statistical models so that the computed results of fold changes in peptide ratios are statistically significant. This feature facilitates the use of Multi-Q with various instrument types with different dynamic ranges, which is an important aspect of iTRAQ analysis. The performance of Multi-Q is evaluated with a mixture of 10 standard proteins and human Jurkat T cells. The results are consistent with expected protein ratios and thus demonstrate the high accuracy, full automation, and high-throughput capability of Multi-Q as a large-scale quantitation proteomics tool. These features allow rapid interpretation of output from large proteomic datasets without the need for manual validation. Executable Multi-Q files are available on Windows platform at
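The processing steps named above (background subtraction, normalization, ratio calculation) can be illustrated with a minimal sketch. This is not Multi-Q's implementation: the function name `peptide_ratios`, the constant background level, and the choice to normalize each channel to the reference channel's total intensity are all assumptions made for illustration, and peak detection and isotope correction are omitted.

```python
def peptide_ratios(intensities, background=0.0, reference=0):
    """Compute per-peptide reporter-ion ratios relative to a reference channel.

    intensities: list of rows, one per peptide; each row holds one
                 reporter-ion intensity per iTRAQ channel.
    background:  constant noise level subtracted from every intensity
                 (an assumption; real tools estimate it from the spectrum).
    reference:   index of the channel used as the ratio denominator.
    """
    # Background subtraction, clamped at zero so intensities stay non-negative.
    cleaned = [[max(x - background, 0.0) for x in row] for row in intensities]

    # Normalization: scale every channel so that its summed intensity matches
    # the reference channel, removing systematic loading differences.
    n_channels = len(cleaned[0])
    totals = [sum(row[c] for row in cleaned) for c in range(n_channels)]
    factors = [totals[reference] / t if t > 0 else 1.0 for t in totals]
    normalized = [[x * f for x, f in zip(row, factors)] for row in cleaned]

    # Ratio of each channel to the reference; None when the reference is zero.
    return [
        [x / row[reference] if row[reference] > 0 else None for x in row]
        for row in normalized
    ]
```

In this sketch, a channel that was loaded with twice as much sample as the reference yields ratios of 1.0 after normalization, which is the systematic error the normalization step is meant to remove.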
Background: Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches usually employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing.
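Two of the ideas above, numerical normalization and the fixed context window, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names, the choice of "1" as the representative numeral, the decision to collapse each run of digits into a single numeral, and the `<PAD>` placeholder are assumptions.

```python
import re


def normalize_numerals(token, representative="1"):
    """Replace every run of digits in a token with one representative numeral.

    Terms such as IL2 and IL12 then map to the same string, which reduces
    data sparseness and the number of redundant features.
    """
    return re.sub(r"\d+", representative, token)


def context_window(tokens, i, size=5):
    """Return the window of `size` tokens centered on position i.

    A window of five is the current word plus two preceding and two
    subsequent words; positions outside the sentence are padded.
    """
    half = size // 2
    return [
        tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        for j in range(i - half, i + half + 1)
    ]
```

Anything a tagger needs from outside this window is invisible to window-based features, which is the gap the pattern-based post-processing step is described as addressing.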