Ting Yi Sung scite author profile

Background: RNA-protein interaction plays an essential role in several biological processes, such as protein synthesis, gene expression, posttranscriptional regulation and viral infectivity. Identification of RNA-binding sites in proteins provides valuable insights for biologists. However, experimental determination of RNA-protein interaction remains time-consuming and labor-intensive. Thus, computational approaches for prediction of RNA-binding sites in proteins have become highly desirable. Extensive studies of RNA-binding site prediction have led to the development of several methods. However, these methods can yield low sensitivity in exchange for high specificity.
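The sensitivity/specificity trade-off mentioned above can be made concrete with a small calculation. The following is a minimal sketch, assuming per-residue binary labels where 1 marks an RNA-binding residue; the function name and the example labels are hypothetical and do not come from the paper.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = binding residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, predicted binding
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical labels: a conservative predictor that rarely calls a residue "binding".
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this made-up case the predictor finds only one of four binding residues while correctly rejecting five of six non-binding residues, which is the pattern the abstract describes.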

Multi-Q: A Fully Automated Tool for Multiplexed Protein Quantitation

Lin et al. 2006

The iTRAQ labeling method combined with shotgun proteomic techniques represents a new dimension in multiplexed quantitation for relative protein expression measurement in different cell states. To expedite the analysis of vast amounts of spectral data, we present a fully automated software package, called Multi-Q, for multiplexed iTRAQ-based quantitation in protein profiling. Multi-Q is designed as a generic platform that can accommodate various input data formats from search engines and mass spectrometer manufacturers. To calculate peptide ratios, the software automatically processes iTRAQ's signature peaks, including peak detection, background subtraction, isotope correction, and normalization to remove systematic errors. Furthermore, Multi-Q allows users to define their own data-filtering thresholds based on semi-empirical values or statistical models so that the computed results of fold changes in peptide ratios are statistically significant. This feature facilitates the use of Multi-Q with various instrument types with different dynamic ranges, which is an important aspect of iTRAQ analysis. The performance of Multi-Q is evaluated with a mixture of 10 standard proteins and human Jurkat T cells. The results are consistent with expected protein ratios and thus demonstrate the high accuracy, full automation, and high-throughput capability of Multi-Q as a large-scale quantitation proteomics tool. These features allow rapid interpretation of output from large proteomic datasets without the need for manual validation. Executable Multi-Q files are available on Windows platform at
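The ratio calculation described in this abstract can be illustrated with a rough sketch. It is not Multi-Q's implementation: it assumes four reporter channels (114 to 117), a single constant background value, a user-chosen minimum-intensity filter, and normalization by equalizing channel totals, and it leaves out isotope correction entirely. All function names, parameters, and intensities are hypothetical.

```python
CHANNELS = ("114", "115", "116", "117")  # assumed 4-plex reporter ions


def subtract_background(intensities, background):
    """Subtract a constant background from each channel, clipping at zero."""
    return {ch: max(intensities[ch] - background, 0.0) for ch in CHANNELS}


def normalization_factors(spectra, reference):
    """Scale each channel so its summed intensity matches the reference channel."""
    totals = {ch: sum(s[ch] for s in spectra) for ch in CHANNELS}
    return {
        ch: totals[reference] / totals[ch] if totals[ch] > 0 else 1.0
        for ch in CHANNELS
    }


def peptide_ratios(spectra, background=0.0, min_intensity=0.0, reference="114"):
    """Return per-spectrum channel ratios relative to the reference channel."""
    cleaned = [subtract_background(s, background) for s in spectra]
    # Data-filtering threshold: drop spectra with any channel below the cutoff.
    kept = [
        s for s in cleaned
        if s[reference] > 0 and all(s[ch] >= min_intensity for ch in CHANNELS)
    ]
    if not kept:
        return []
    factors = normalization_factors(kept, reference)
    return [
        {ch: (s[ch] * factors[ch]) / (s[reference] * factors[reference])
         for ch in CHANNELS}
        for s in kept
    ]


# Hypothetical reporter intensities; the third spectrum is too weak to keep.
spectra = [
    {"114": 110.0, "115": 210.0, "116": 110.0, "117": 60.0},
    {"114": 210.0, "115": 410.0, "116": 210.0, "117": 110.0},
    {"114": 15.0, "115": 12.0, "116": 11.0, "117": 10.0},
]
ratios = peptide_ratios(spectra, background=10.0, min_intensity=20.0)
for r in ratios:
    print({ch: round(v, 2) for ch, v in r.items()})
```

In this toy input the channels differ only by a constant loading factor, so after normalization every ratio comes out at 1.0. A real pipeline would add isotope correction before normalization and would likely use a more robust normalization statistic than channel sums.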

NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition

et al. 2006

Background: Biomedical named entity recognition (Bio-NER) is a challenging problem because, in general, biomedical named entities of the same category (e.g., proteins and genes) do not follow one standard nomenclature. They have many irregularities and sometimes appear in ambiguous contexts. In recent years, machine-learning (ML) approaches have become increasingly common and now represent the cutting edge of Bio-NER technology. This paper addresses three problems faced by ML-based Bio-NER systems. First, most ML approaches employ singleton features that comprise one linguistic property (e.g., the current word is capitalized) and at least one class tag (e.g., B-protein, the beginning of a protein name). However, such features may be insufficient in cases where multiple properties must be considered. Adding conjunction features that contain multiple properties can be beneficial, but it would be infeasible to include all conjunction features in an NER model since memory resources are limited and some features are ineffective. To resolve the problem, we use a sequential forward search algorithm to select an effective set of features. Second, variations in the numerical parts of biomedical terms (e.g., "2" in the biomedical term IL2) cause data sparseness and generate many redundant features. In this case, we apply numerical normalization, which solves the problem by replacing all numerals in a term with one representative numeral to help classify named entities. Third, the assignment of NE tags does not depend solely on the target word's closest neighbors, but may depend on words outside the context window (e.g., a context window of five consists of the current word plus two preceding and two subsequent words). We use global patterns generated by the Smith-Waterman local alignment algorithm to identify such structures and modify the results of our ML-based tagger. This is called pattern-based post-processing.
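Two of the ideas in this abstract, numerical normalization and the context window, are easy to show in code. The sketch below is only an illustration under stated assumptions: it replaces each run of digits with the single numeral "1" (the abstract does not say which representative numeral is used or whether replacement is per digit or per run), and it pads the window with a hypothetical `<PAD>` token at sentence boundaries. The function names are invented for this example.

```python
import re


def normalize_numerals(term, representative="1"):
    """Replace each run of digits in a term with one representative numeral.

    Assumption: per-run replacement, so "IL2" and "IL12" both become "IL1".
    """
    return re.sub(r"\d+", representative, term)


def context_window(tokens, index, size=5, pad="<PAD>"):
    """Return `size` tokens centred on tokens[index]; `size` is assumed odd."""
    half = size // 2
    padded = [pad] * half + list(tokens) + [pad] * half
    # tokens[index] sits at padded[index + half], so the window starts at index.
    return padded[index:index + size]


tokens = ["the", "IL2", "gene", "is", "expressed"]
normalized = [normalize_numerals(t) for t in tokens]
window = context_window(normalized, index=1)
print(normalized)
print(window)
```

With a window of five centred on the second token, the tagger sees one padding token, the two neighbours on either side that exist, and the normalized term itself. Anything further away falls outside the window, which is the gap the paper's pattern-based post-processing is meant to address.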

Ting Yi Sung

Proteogenomics of Non-smoking Lung Cancer in East Asia Delineates Molecular Signatures of Pathogenesis and Progression

Predicting RNA-binding sites of proteins using support vector machines and evolutionary information
