2020
DOI: 10.1007/s42979-020-00156-5
Using SMOTE to Deal with Class-Imbalance Problem in Bioactivity Data to Predict mTOR Inhibitors

Abstract: Machine learning algorithms give sub-optimal performance on class-imbalanced datasets. Mammalian target of rapamycin (mTOR) is a serine/threonine protein kinase that plays an integral role in the autophagy pathway. Autophagy is a cellular pathway for recycling macromolecules (proteins, lipids, and organelles), which enables eukaryotic cells to adapt their metabolism and survive under adverse growth conditions. Targeting mTOR through therapeutic interventions of the autophagy pathway establishes mT…

Cited by 8 publications (9 citation statements)
References 20 publications (19 reference statements)
“…For solving these problems, Chawla proposed the Synthetic Minority Over-sampling technique (SMOTE), which creates synthetic samples from the minority class. The SMOTE samples are linear combinations of two similar samples from the minority class [29].…”
Section: Data-Level Methods
confidence: 99%
“…To overcome this problem, Chawla [13] proposed the SMOTE technique, which generates synthetic samples from the minority class. Samples created by the SMOTE technique are a linear combination of two similar samples from the minority class [13, 26, 28]. The SMOTE over-sampling algorithm works as follows. Let S be the size of the minority class; for a sample j of the minority class, let x_j denote its feature vector, j ∈ {1, …, S}:
1. Find the k nearest neighbours of x_j among all S minority samples (using, for example, the Euclidean distance), denoted x_j(near), near ∈ {1, …, k}.
2. Select a sample x_j(nn) at random from the k neighbours, and generate a random number β₁ between 0 and 1 to synthesise a new sample x_j1 = x_j + β₁ · (x_j(nn) − x_j).
3. Repeat step 2 M times to synthesise M new samples x_j(new), new ∈ {1, …, M}. …”
Section: Proposed Methods
confidence: 99%
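The interpolation steps quoted above can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the cited authors' implementation; the function name `smote_oversample` and its parameters are hypothetical, and neighbours are computed by brute-force Euclidean distance for clarity:

```python
import numpy as np

def smote_oversample(X_min, k=5, M=1, rng=None):
    """Generate M synthetic SMOTE samples per minority sample.

    X_min : (S, d) array of minority-class feature vectors.
    Returns an (S * M, d) array of synthetic samples, each a convex
    combination x_j + beta * (x_j(nn) - x_j) of a sample and one of
    its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    S = X_min.shape[0]
    # Pairwise squared Euclidean distances among minority samples.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                # exclude the sample itself
    neighbors = np.argsort(d2, axis=1)[:, :k]   # k nearest neighbours per row
    synthetic = []
    for j in range(S):
        for _ in range(M):
            nn = rng.choice(neighbors[j])       # random neighbour x_j(nn)
            beta = rng.random()                 # beta_1 in [0, 1)
            synthetic.append(X_min[j] + beta * (X_min[nn] - X_min[j]))
    return np.array(synthetic)
```

Because each synthetic point lies on the line segment between two real minority samples, the generated data stays inside the convex hull of the minority class, which is what distinguishes SMOTE from naive random over-sampling (duplication).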
“…Over-sampling techniques add more instances of the minority class to the training set. The simplest method is random over-sampling [26], but its drawback is over-fitting [27]. To overcome this problem, Chawla [13] proposed the SMOTE technique, which generates synthetic samples from the minority class. Samples created by the SMOTE technique are a linear combination of two similar samples from the minority class [13, 26, 28].…”
Section: Sampling
confidence: 99%
“…Moreover, Tox21 is severely imbalanced; the volume of the inactive (negative/nontoxic) data set is much larger than that of the active (positive/toxic) data set. As a result, multitask deep learning models are unable to thoroughly explore the essence of the minority-class data set consisting of the positive compounds. To address this issue, several data augmentation approaches have been introduced in previous studies, such as resampling and the synthetic minority oversampling technique (SMOTE). Manual preprocessing is an essential ingredient of these data augmentation strategies, which may impact the objectivity and performance of the models to some extent. Data augmentation technologies are also not directly applicable to chemical data.…”
Section: Introduction
confidence: 99%