2019
DOI: 10.1201/9780429469275
Text Mining with Machine Learning

Cited by 37 publications (27 citation statements) | References 0 publications
“…The numerical channel is used to extract the pre-defined dense features (i.e., the protein Position-specific scoring matrix (PSSM) and the intrinsic disorder tendency of each residue in both the protein and peptide sequences). Each categorical channel contains a self-learning word embedding layer 30, which takes one of the categorical features of the input peptide or protein (i.e., the raw amino acids, secondary structures, polarity, and hydropathy properties). We designed this multi-channel architecture because the input profiles contain multifaceted features of different scales, which may introduce inconsistency if only a simple encoder is used.…”
Section: Results
confidence: 99%
“…1b). Each categorical channel consists of three self-learning word embedding layers 30, taking amino acids, secondary structures, and physicochemical representations as input, respectively. Each numerical channel consists of a fully connected layer that takes dense features as input, i.e., the intrinsic disorder tendency features (ranging between 0 and 1) of peptides and proteins as well as the normalized evolutionary matrices (PSSM) of proteins.…”
Section: Methods
confidence: 99%
“…The second part contains the remaining 20% of the data (1,346 samples) for the validation phase. The hyperparameters for the proposed model were set as follows: (i) learning rate = 0.001, (ii) mini-batch size = 64, (iii) number of iterations = 30, (iv) early stopping = 3 epochs, and (v) optimizer: AdaBoost [46].…”
Section: Results
confidence: 99%
“…Text mining analysis generally includes three main steps after collecting the textual data: preprocessing, text mining, and post-processing [38].…”
Section: Text Mining and Topic Modeling
confidence: 99%
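The three-step pipeline in the quote can be sketched minimally as follows; the tiny corpus, stop-word list, and keyword cutoff are invented for illustration:

```python
from collections import Counter
import re

STOP_WORDS = {"the", "a", "of", "and", "is", "in"}   # assumed tiny list

def preprocess(doc):
    """Step 1: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def mine(docs):
    """Step 2: the mining itself -- here, corpus-wide term counts."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts

def postprocess(counts, top_k=3):
    """Step 3: reduce raw output to a readable keyword list."""
    return [term for term, _ in counts.most_common(top_k)]

corpus = [
    "Text mining is the mining of patterns in text.",
    "Machine learning supports text mining.",
]
print(postprocess(mine(corpus)))   # ['text', 'mining', 'patterns']
```

Each step has a single responsibility, so any stage can be swapped out (e.g., replacing term counting with topic modeling) without touching the other two.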
“…Guidance was taken from medical-science specialists to set the desired number of topics in each category. It is important to note that the number of topics should be chosen proportionally, because too large a number of topics leads to many small and highly similar topics [38,39]. Interpretation of the topics also becomes more challenging because the keywords become dispersed across topics [40].…”
Section: Text Mining and Topic Modeling
confidence: 99%
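The risk the quote describes, too many topics yielding small near-duplicate topics, can be checked directly by comparing topic-word distributions pairwise and flagging highly similar pairs. The toy distributions and the 0.9 similarity threshold below are illustrative assumptions:

```python
import numpy as np

def similar_topic_pairs(topic_word, threshold=0.9):
    """Return index pairs of topics whose word distributions are
    nearly identical (cosine similarity above `threshold`)."""
    normed = topic_word / np.linalg.norm(topic_word, axis=1, keepdims=True)
    sim = normed @ normed.T          # pairwise cosine similarities
    k = topic_word.shape[0]
    return [(i, j) for i in range(k) for j in range(i + 1, k)
            if sim[i, j] > threshold]

# Three toy topic-word distributions over a 4-word vocabulary:
# topics 0 and 1 are essentially one topic split in two.
topics = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.70, 0.20],
])
print(similar_topic_pairs(topics))   # [(0, 1)]
```

Many flagged pairs after fitting a model is a practical signal that the topic count was set too high and should be reduced.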