2017
DOI: 10.3390/ijms18112400

UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences

Abstract: With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this…
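
To make the fixed-length idea in the abstract concrete, here is a minimal sketch of the simplest such representation, plain amino acid composition, which maps a protein of any length to 20 numbers. It is an illustration only and is not taken from UltraPse; the function name `amino_acid_composition` and the example sequences are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the vector layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20 relative frequencies of the standard amino acids.

    Hypothetical helper for illustration; non-standard letters are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Two made-up sequences of different lengths yield vectors of the same length.
short_vec = amino_acid_composition("MKV")
long_vec = amino_acid_composition("MKVLAAGIVGLLLAQ")
print(len(short_vec), len(long_vec))
```

Composition alone discards all order information, which is the gap that pseudo-composition style representations are meant to narrow.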

Cited by 15 publications (8 citation statements)
References 62 publications
“…[93], and UltraPse (https://github.com/pufengdu/UltraPse) software [94], which allows all possible sequence representation modes for user-defined sequence types. Here, the UltraPse source code was downloaded and run locally on our Ubuntu platform to extract pseudo amino acid composition (Type I General PseAAC) for each protein sequence in the benchmark datasets.…”
Section: Methods
Citation type: mentioning
confidence: 99%
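
The Type I pseudo amino acid composition mentioned in this statement can be sketched roughly as follows. This is a simplified illustration of the general formulation (20 composition terms plus λ sequence-order correlation terms), not UltraPse's code or command line. The assumptions are: a single property scale (Kyte-Doolittle hydropathy) stands in for whatever property set the citing authors used, and the names `pse_aac_type1`, `lam`, and `weight` are hypothetical.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index, used here as an assumed stand-in property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit spread over the 20 residues."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sigma for aa in AMINO_ACIDS}


def pse_aac_type1(sequence, lam=2, weight=0.05, scales=(KYTE_DOOLITTLE,)):
    """Return a (20 + lam)-dimensional Type I PseAAC-style vector (sketch)."""
    seq = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(s) for s in scales]
    # Ordinary composition: relative frequency of each residue type.
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order terms: mean squared property difference at distance k.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [
            mean([(p[seq[i]] - p[seq[i + k]]) ** 2 for p in props])
            for i in range(n - k)
        ]
        thetas.append(mean(diffs))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Made-up example sequence; the output length depends only on lam.
vec = pse_aac_type1("MKVLAAGIVGLLLAQ", lam=3)
print(len(vec))
```

Because both groups of terms share one denominator, the components sum to 1, and `weight` controls how much the order terms contribute relative to plain composition.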
“…In this section, we describe two lncRNA features and two protein features, based on lncRNA sequences, protein sequences and known lncRNA-protein interactions. On one hand, a great number of features [30–36] can be extracted from lncRNA sequences and protein sequences, and feature-extraction tools such as Pse-in-One [37], BioSeq-Analysis [38], repRNA [39] [40], iMiRNA-PseDPC [41] and UltraPse [42] are available. On the other hand, known lncRNA-protein interactions can provide features to describe lncRNAs and proteins.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…However, complicated strategies are usually needed to integrate knowledge into the models. Now that a huge amount of data from multiple omics, such as transcriptomics and metabonomics, has been accumulated and there are many feature-extraction methods (Iqbal et al., 2014; Liu et al., 2015; Du et al., 2017; Liu et al., 2017; Gao and Wu, 2018; Wang et al., 2020), some researchers have treated the identification of enzyme candidates as a catalytic versus non-catalytic classification problem and built models to classify protein sequences or encoding genes as either catalytic or non-catalytic using machine learning algorithms such as support vector machine (SVM), K-nearest neighbors (KNN), Bayesian, and RF (Teng et al., 2010; Halperin et al., 2008; Ferrari and Mitchell, 2014; Nagao et al., 2014; Amidi et al., 2017). The workflow for classifying protein sequences as catalytic or non-catalytic is illustrated in Figure 1.…”
Section: Identification Of Candidates Of Missing Enzymes
Citation type: mentioning
confidence: 99%