2015
DOI: 10.1109/tcbb.2014.2382127

Software Suite for Gene and Protein Annotation Prediction and Similarity Search

Abstract: In the computational biology community, machine learning algorithms are key instruments for many applications, including the prediction of gene functions based upon the available biomolecular annotations. They may also be employed to compute similarity between genes or proteins. Here, we describe and discuss a software suite we developed to implement and make publicly available some of these prediction methods and a computational technique based upon Latent Semantic Indexing (LSI), which leverages…
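The abstract describes LSI-based annotation prediction and similarity search only at a high level. The sketch below illustrates the general LSI idea and is not the suite's actual code. It assumes a binary gene × annotation matrix `A`, where 1 means the gene is annotated to the term, and computes a rank-`k` truncated SVD. Zero cells whose reconstructed score exceeds a threshold are treated as candidate new annotations. Gene similarity is measured as cosine similarity in the latent space. The function names, `k`, and `threshold` are hypothetical choices.

```python
import numpy as np

def lsi_scores(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction of a gene x annotation matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def predict_annotations(A: np.ndarray, k: int, threshold: float):
    """Return (gene, term) index pairs absent from A whose LSI score exceeds threshold."""
    A_hat = lsi_scores(A, k)
    candidates = (A == 0) & (A_hat > threshold)
    return [tuple(map(int, idx)) for idx in np.argwhere(candidates)]

def gene_similarity(A: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity between genes, using their k-dimensional latent coordinates."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    G = U[:, :k] * s[:k]                      # one row of latent coordinates per gene
    norms = np.linalg.norm(G, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                   # genes with no signal keep a zero vector
    Gn = G / norms
    return Gn @ Gn.T
```

For example, with `A = [[1, 1, 1], [1, 1, 0]]` and `k = 1`, the missing annotation of gene 1 to term 2 gets a reconstructed score of about 0.49. It is therefore reported as a candidate for a threshold of 0.4 but not for 0.5. Lowering `k` smooths the matrix more aggressively and tends to surface more candidates.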

Cited by 29 publications (21 citation statements) · References 24 publications
“…For numerical datasets, in addition, normalization (or scaling) by feature (by column) into the [0; 1] interval is often necessary to put the whole dataset into a common frame before the machine learning algorithm processes it. Latent semantic indexing (LSI), for example, is an information retrieval method that requires this pre-processing when employed for the prediction of gene functional annotations [13]. Normalizing the data into the [min; max] interval, or to a particular mean (for example, 0.0) and a particular standard deviation (for example, 1.0), are also popular strategies [14].…”
Section: Tip 1: Check and Arrange Your Input Dataset Properly (mentioning; confidence: 99%)
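The two normalization strategies mentioned above can be sketched per column with NumPy. The first is linear rescaling into a target interval ([0; 1] by default). The second is standardization to mean 0 and standard deviation 1. The function names are hypothetical. Mapping constant columns to the lower bound (or to 0 after standardization) is an assumption made here to avoid division by zero.

```python
import numpy as np

def minmax_scale(X, new_min=0.0, new_max=1.0):
    """Rescale each column of a 2-D array linearly into [new_min, new_max]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Constant columns: avoid division by zero; they end up at new_min.
    col_range[col_range == 0] = 1.0
    return new_min + (X - col_min) / col_range * (new_max - new_min)

def zscore(X, ddof=0):
    """Standardize each column of a 2-D array to mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=ddof)
    # Constant columns: avoid division by zero; they end up at 0.
    sd[sd == 0] = 1.0
    return (X - mu) / sd
```

For instance, `minmax_scale([[1, 10], [3, 30], [5, 50]])` yields `[[0, 0], [0.5, 0.5], [1, 1]]`. In practice, the column minima, ranges, means and deviations should be computed on the training data only and then reused for the test data.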
“…In the future, we plan to employ alternative approaches for missing-data imputation, such as oversampling through k-nearest neighbors [40] or latent semantic indexing similarity [8]. We also plan to try alternative prediction models, like probabilistic latent semantic analysis [36].…”
Section: Results (mentioning; confidence: 99%)
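The statement above mentions k-nearest-neighbor approaches for filling in missing data. The sketch below shows plain kNN imputation, which is related to, but not the same as, the oversampling method of reference [40]. Each missing entry is replaced by the mean of that column over the `k` closest rows that observe it. Distance is the mean squared difference over the columns both rows observe. The function name, the distance choice and the default `k` are assumptions.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaN entries with the column mean over the k nearest rows observing that column."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    for i in range(X.shape[0]):
        missing_cols = np.flatnonzero(~observed[i])
        if missing_cols.size == 0:
            continue
        # Distance to every other row, over the features both rows observe.
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = observed[i] & observed[j]
            if shared.any():
                d = np.mean((X[i, shared] - X[j, shared]) ** 2)
                dists.append((d, j))
        dists.sort()
        for c in missing_cols:
            # Closest rows (in distance order) that actually have a value in column c.
            donors = [j for _, j in dists if observed[j, c]][:k]
            if donors:
                out[i, c] = X[donors, c].mean()
    return out
```

With `X = [[1, 2, nan], [1, 2, 3], [1, 2, 5], [10, 20, 100]]` and `k = 2`, the missing value becomes 4.0, the mean of its two identical neighbors. The outlying last row is ignored. The double loop is quadratic in the number of rows, which is fine for illustration but not for large datasets.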
“…Future work will also include enhancing the presented machinery by applying alternative techniques to handle data class imbalance [37, 67, 68], applying our algorithm combination to other disease health record datasets (for example, [41]), applying alternative machine learning algorithms (for example, latent Dirichlet allocation [76] or probabilistic latent semantic analysis [77]) to diagnosis prediction, and possibly using semantic similarity measures to incorporate similarity information between features (for example, through latent semantic indexing [78]). We also plan to explore feature dependence in the dataset, to see which features influence which other features and how.…”
Section: Discussion (mentioning; confidence: 99%)
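The statement above lists class-imbalance handling as future work without naming a method. One simple baseline is random oversampling, sketched below. Rows of each minority class are duplicated at random until every class matches the majority count. This is an illustrative assumption and not necessarily what references [37, 67, 68] propose. The function name and the fixed `seed` are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes have equal counts."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        if n < target:
            idx = np.flatnonzero(y == cls)
            # Sample with replacement from the existing rows of this class.
            extra = rng.choice(idx, size=target - n, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Oversampling should be applied only to the training split. Otherwise duplicated rows leak into the evaluation data and inflate the scores.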