Shunfang Wang scite author profile

An effective representation of a protein sequence plays a crucial role in protein sub-nuclear localization. Existing representations, such as dipeptide composition (DipC), pseudo-amino acid composition (PseAAC) and position-specific scoring matrix (PSSM), are insufficient to represent a protein sequence because each captures only a single perspective. Thus, this paper proposes two fusion feature representations, DipPSSM and PseAAPSSM, which integrate PSSM with DipC and PseAAC, respectively. When constructing each fusion representation, we introduce balance factors to weight the importance of its components. The optimal values of the balance factors are sought by a genetic algorithm. Because the proposed representations are high-dimensional, linear discriminant analysis (LDA) is used to find their important low-dimensional structure, which is essential for classification and location prediction. Numerical experiments on two public datasets with a KNN classifier and cross-validation tests showed that, in terms of the common indexes of sensitivity, specificity, accuracy and MCC, the proposed fusion representations outperform the traditional representations in protein sub-nuclear localization, and the representations treated by LDA outperform the untreated ones.
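The idea of fusing two feature blocks with a balance factor can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so the weighted concatenation below is an assumption, not the authors' method; the function names `dipeptide_composition` and `fuse`, the sample sequence, and the three-element `pssm_features` vector are all hypothetical. Only the DipC definition (relative frequency of each adjacent residue pair over the 20 standard amino acids) is standard.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400-dimensional DipC vector: the relative frequency
    of each adjacent residue pair in the sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def fuse(features_a, features_b, balance):
    """Hypothetical fusion: concatenate two feature vectors, weighting the
    first by `balance` and the second by `1 - balance` (an assumption)."""
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")
    weighted_a = [balance * x for x in features_a]
    weighted_b = [(1.0 - balance) * x for x in features_b]
    return weighted_a + weighted_b


dipc = dipeptide_composition("MKVLAAGMKV")  # made-up example sequence
pssm_features = [0.2, 0.5, 0.3]             # placeholder for PSSM-derived features
fused = fuse(dipc, pssm_features, balance=0.4)
```

In the paper the balance factors are tuned by a genetic algorithm; here `balance` is simply fixed by hand, and any search procedure could be substituted to choose it.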

ACEP: improving antimicrobial peptides recognition through automatic feature fusion and amino acid embedding

Cao et al., 2020, BMC Genomics

Background: Antimicrobial resistance is one of our most serious health threats. Antimicrobial peptides (AMPs), effector molecules of the innate immune system, can defend host organisms against microbes, and most have shown a lower likelihood of bacteria developing resistance compared to many conventional drugs. Thus, AMPs are gaining popularity as a better substitute for antibiotics. To aid researchers in novel AMP discovery, we design computational approaches to screen promising candidates. Results: In this work, we design a deep learning model that can learn amino acid embedding patterns, automatically extract sequence features, and fuse heterogeneous information. Results show that the proposed model outperforms state-of-the-art methods on recognition of AMPs. By visualizing data in some layers of the model, we overcome the black-box nature of deep learning, explain the working mechanism of the model, and find some important motifs in sequences. Conclusions: The ACEP model can capture similarity between amino acids, calculate attention scores for different parts of a peptide sequence in order to spot important parts that significantly contribute to final predictions, and automatically fuse a variety of heterogeneous information or features. For high-throughput AMP recognition, open source software and datasets are made freely available at https://github.com/Fuhaoyi/ACEP.

Sample size determination for matched-pair equivalence trials using rate ratio

Tang, Wang, 2006, Biostatistics

In this article, we compare Wald-type, logarithmic transformation, and Fieller-type statistics for the classical 2-sided equivalence testing of the rate ratio under matched-pair designs with a binary end point. These statistics can be implemented through sample-based, constrained least squares estimation and constrained maximum likelihood (CML) estimation methods. Sample size formulae based on the CML estimation method are developed. We consider formulae that control a prespecified power or confidence width. Our simulation studies show that statistics based on the CML estimation method generally outperform other statistics and methods with respect to actual type I error rate and average width of confidence intervals. Also, the corresponding sample size formulae are valid asymptotically in the sense that the exact power and actual coverage probability for the estimated sample size are generally close to their prespecified values. The methods are illustrated with a real example from a clinical laboratory study.

iEnhancer-RD: Identification of enhancers and their strength using RKPK features and deep neural networks

Yang, Wang, Xia, 2021, Analytical Biochemistry

Accurate classification of membrane protein types based on sequence and evolutionary information using deep learning

Wang, Cao, 2019, BMC Bioinformatics

Background: Membrane proteins play an important role in the life activities of organisms. Knowing membrane protein types provides clues for understanding the structure and function of proteins. Though various computational methods for predicting membrane protein types have been developed, the results still do not meet the expectations of researchers. Results: We propose two deep learning models to process sequence information and evolutionary information, respectively. Both models obtained better results than traditional machine learning models. Furthermore, to improve the performance of the sequence information model, we also provide a new vector representation method to replace the one-hot encoding, whose overall success rate improved by 3.81% and 6.55% on two datasets. Finally, a more effective model is obtained by fusing the above two models, whose overall success rate reached 95.68% and 92.98% on two datasets. Conclusion: The final experimental results show that our method is more effective than existing methods for predicting membrane protein types, which can help laboratory researchers to identify the type of novel membrane proteins.
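For readers unfamiliar with the baseline mentioned in this abstract, one-hot encoding of a protein sequence can be sketched as follows. This shows only the standard baseline, not the paper's replacement vector representation, which the abstract does not describe. The function name `one_hot_encode`, the alphabet ordering, and the choice to map unknown residues to an all-zero row are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; ordering is arbitrary
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq):
    """Encode each residue as a 20-element 0/1 row with a single 1 at the
    residue's index. Unknown residues become an all-zero row (an assumption)."""
    rows = []
    for residue in seq:
        row = [0] * len(AMINO_ACIDS)
        idx = INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


encoded = one_hot_encode("ACX")  # "X" is not a standard residue
```

Every pair of distinct residues is equally far apart under this encoding, which is one common motivation for learned or property-based vector representations.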


Protein Sub-Nuclear Localization Based on Effective Fusion Representations and Dimension Reduction Algorithm LDA
