K Nearest Neighbor OveRsampling approach: An open source python package for data augmentation

Islam, Ashhadul; Belhaouari, Samir Brahim; Rahman, Atiq Ur; Bensmail, Halima

doi:10.1016/j.simpa.2022.100272

Cited by 8 publications

(7 citation statements)

References 14 publications


“…The Naive Bayes method has the advantage of not requiring a large amount of training data to determine the estimated parameters needed in the classification process, which makes the classification process more effective and efficient [25] [26]. While the K-Nearest Neighbor method is easier to implement, experiments with this method show that it can provide good performance for independent data (which does not have word dependence) [27].…”

Section: Results (mentioning)

confidence: 99%

Sentiment Analysis of Kampus Mengajar 2 Toward the Implementation of Merdeka Belajar Kampus Merdeka Using Naïve Bayes and Euclidean Distence Methods

Rozaq, Yunitasari, Sussolaikah, et al. (2022). Int. J. Adv. Data Inf. Syst.


The Ministry of Education and Culture initiated the Merdeka Belajar Kampus Merdeka (MBKM) program. MBKM includes several programs, such as industrial internships, independent projects, student exchanges, community service projects, and humanitarian programs. Kampus Mengajar 2 is one of the programs that has already run. It received various responses from the public, which were expressed on social media. Supervisors in Kampus Mengajar 2 were also active in the Kampus Mengajar 2 Telegram groups, posting good, bad, and neutral comments. These comments have the potential to shape sentiment among the general public and academics. Based on these issues, the researchers analyzed sentiment in Kampus Mengajar 2 toward the implementation of the Merdeka Belajar Kampus Merdeka program, using comments from the supervisors' Telegram group as the data source. The data obtained from the Telegram group are classified as good, bad, or neutral using the Naive Bayes and K-Nearest Neighbors methods on up to 591 data points. The data are divided into two parts: up to 20 percent of the total serves as testing data, and the remaining 80 percent serves as training data. The sentiment analysis results show that Naive Bayes outperforms KNN, with an accuracy of 99.30 percent for Naive Bayes and 97.20 percent for K-Nearest Neighbors.
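The evaluation setup described in this abstract (an 80/20 train/test split and a K-Nearest Neighbors classifier using Euclidean distance) can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the function names `train_test_split` and `knn_predict`, the toy feature vectors, and the choice of k = 3 are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, query, k=3):
    """Label a query vector by majority vote among its k nearest training points."""
    # math.dist computes the Euclidean distance between two points.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 2-D feature vectors with sentiment labels.
data = [((0.0, 0.1), "bad"), ((0.2, 0.0), "bad"), ((0.1, 0.3), "bad"),
        ((1.0, 0.9), "good"), ((0.9, 1.1), "good"), ((1.2, 1.0), "good")]

# Splitting 591 items 80/20, as in the abstract.
train, test = train_test_split(range(591))
print(len(train), len(test))

print(knn_predict(data, (0.95, 1.0)))
```

With 591 items, a 20 percent test share rounds to 118 test items and leaves 473 for training. Real text data would first need to be converted to numeric feature vectors, a step this sketch leaves out.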


“…The time taken by the KNNOR method is more compared to the SMOTE method, as has been showcased in the works of Islam et al [32]. This is because KNNOR comprises a pre-augmentation step to calculate the optimized distance and proportion of the population used while oversampling.…”

Section: A Limitation (mentioning)

confidence: 99%

“…As multiple neighbors are used to generating a single point, the time taken to create a new point is also impacted by the number of neighbors used. This additional time is justified by the greater accuracy achieved by the proposed method as shown in Table 1 for image data and in the works of [13], [32] for tabular data.…”

Section: A Limitation (mentioning)

confidence: 99%

Fast and Efficient Image Generation Using Variational Autoencoders and K-Nearest Neighbor OveRsampling Approach

Islam, Belhaouari (2023). IEEE Access

Self Cite


Researchers gravitate towards Generative Adversarial Networks (GAN) to create artificial images. However, GANs suffer from convergence issues, mode collapse, and overall complexity in balancing the Nash Equilibrium. Images generated are often distorted, rendering them useless. We propose a combination of Variational Autoencoders (VAEs) and a statistical oversampling method called K-Nearest Neighbor OveRsampling (KNNOR) to create artificial images. This combination of VAE and KNNOR results in more life-like images with reduced distortion. We fine-tune several pre-trained networks on a separate set of real and fake face images to test images generated by our method against images generated by conventional Deep Convolutional GANs (DCGANs). We also compare the combination of VAEs and Synthetic Minority Oversampling Technique (SMOTE) to establish the efficacy of KNNOR against naive oversampling methods. Not only are our methods better able to convince the classifiers that the images generated are authentic, but the models are also half the size of DCGANs. The code is available on GitHub for public use.
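The citation statements above note that KNNOR uses multiple neighbors to generate a single point, which costs more time than SMOTE. The difference can be sketched as below. This is only an illustration under stated assumptions, not the KNNOR package's API or its exact algorithm: the function names are hypothetical, the sequential walk toward each neighbor is an assumed reading of "multiple neighbors", and the pre-augmentation step that tunes the distance and population proportion is omitted entirely.

```python
import math
import random


def nearest_neighbors(point, population, k):
    """Return the k points in population closest to point (excluding itself)."""
    others = [p for p in population if p is not point]
    return sorted(others, key=lambda p: math.dist(p, point))[:k]


def smote_like_point(point, population, k, rng):
    """Interpolate between a point and ONE randomly chosen nearest neighbor."""
    neighbor = rng.choice(nearest_neighbors(point, population, k))
    alpha = rng.random()
    return tuple(a + alpha * (b - a) for a, b in zip(point, neighbor))


def multi_neighbor_point(point, population, k, rng):
    """Walk from a point toward EACH of its k nearest neighbors in turn."""
    current = tuple(point)
    for neighbor in nearest_neighbors(point, population, k):
        alpha = rng.random()  # random step size in [0, 1)
        current = tuple(a + alpha * (b - a) for a, b in zip(current, neighbor))
    return current


# Hypothetical minority-class samples; (5.0, 5.0) is far from the rest.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
rng = random.Random(0)

single = smote_like_point(minority[0], minority, 3, rng)
multi = multi_neighbor_point(minority[0], minority, 3, rng)
print(single, multi)
```

In this sketch the multi-neighbor variant performs one interpolation per neighbor, so the work per synthetic point grows with k, which is consistent with the runtime trade-off the citing paper describes.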


“…Data is available in large volumes, but the problem of imbalanced datasets arises repeatedly, disrupting classifiers and reducing accuracy [12]. Data augmentation is the process of modifying or manipulating an image so that the original image, in its prepared form, changes in shape and position.…”

Section: Data Augmentation (unclassified)

Optimasi Klasifikasi Batik Betawi Menggunakan Data Augmentasi Dengan Metode KNN Dan GLCM

Akbar, Mulyana (2022). jatim


Batik is one of Indonesia's ancestral cultural heritages that continues to be developed, preserved, and adopted as part of the nation's cultural identity. One batik style that has not yet received wide attention is Betawi batik. This study classifies Betawi batik into several classes based on motif, making Betawi batik easier to recognize from digital images. The method uses K-Nearest Neighbor to determine the closeness between test images and training images, and the Gray-Level Co-occurrence Matrix to extract texture features. For the dataset, the authors use a public Kaggle dataset titled “Indonesian Batik Motifs” and several sources from Google. Because the available data was insufficient, the authors augmented the collected dataset to a total of 1,020 images. The highest accuracy, 97%, was obtained for the Burung Hong, Monas, Nusa Kelapa, Pengantin Betawi, Ondel-Ondel, Rasamala, and Salakanagara motifs. The lowest accuracy, 93%, was obtained for the Kali Ciliwung and Topeng Betawi motifs. The remaining motifs, Golok, Penari Ngarojeng, and Pucuk Rebung, reached 95% accuracy. The average accuracy across all Betawi batik motifs was 96%. These results indicate that the study performed very well. Keywords: Classification, Betawi Batik, K-Nearest Neighbor, Gray-Level Co-occurrence Matrix.
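The texture-extraction step named in this abstract, the Gray-Level Co-occurrence Matrix, can be illustrated with a small sketch. The abstract does not give the offsets, number of gray levels, or texture statistics the authors used, so the horizontal offset, the 4-level toy image, and the `contrast` feature below are assumptions chosen only to show the idea.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels for pixel pairs offset by (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: normalized counts weighted by squared level difference."""
    total = sum(sum(row) for row in matrix)
    return sum(count * (i - j) ** 2
               for i, row in enumerate(matrix)
               for j, count in enumerate(row)) / total


# Hypothetical 4x4 image quantized to 4 gray levels (0..3).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]

matrix = glcm(image, levels=4)
for row in matrix:
    print(row)
print(contrast(matrix))
```

Statistics such as this contrast value, computed per image, would form the feature vector that a K-Nearest Neighbor classifier compares between test and training images.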
