Because the future of society depends on its students, methods for selecting suitable careers for students are an important problem to explore. It is assumed that a student who has the required capability and a positive attitude towards a subject will achieve more in that subject. To handle the uncertainty involved in students' career selection, this study proposes approaches based on picture fuzzy sets (PFS) and rough sets, which are well suited to incomplete and imprecise information. For the purpose of selecting a suitable career, the article analyzes students' features in terms of career, memory, interest, knowledge, environment and attitude. We propose two hybridized distance measures using Hausdorff, Hamming and Euclidean distances in a picture fuzzy environment, where the evaluation information regarding students, subjects and students' features is given as picture fuzzy numbers. We then present an algorithmic approach using the proposed distance measures and rough set theory. Rough set theory is applied to determine whether a particular subject is suitable for a student even when the choice of stream is contested. The lower and upper approximations, together with the boundary region, are used to manage inconsistent situations. Finally, two case studies demonstrate the applicability of the proposed idea.
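The role of the rough set approximations mentioned in this abstract can be illustrated with a minimal sketch. The student records, attribute names and target set below are hypothetical and are not taken from the study; the code shows only the standard rough set construction, not the authors' algorithm or their picture fuzzy distance measures.

```python
from collections import defaultdict

def approximations(objects, attributes, target):
    """Return (lower, upper, boundary) approximations of `target`.

    objects: dict mapping object id -> dict of attribute values
    attributes: attribute names that define indiscernibility
    target: set of object ids (e.g. students judged suitable for a subject)
    """
    # Group objects that cannot be told apart on the chosen attributes.
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper, upper - lower

# Hypothetical data: coarse ratings of two features per student.
students = {
    "s1": {"interest": "high", "memory": "good"},
    "s2": {"interest": "high", "memory": "good"},
    "s3": {"interest": "low", "memory": "good"},
    "s4": {"interest": "low", "memory": "poor"},
}
suitable = {"s1", "s3"}  # assumed judgement, for illustration only

lower, upper, boundary = approximations(students, ["interest", "memory"], suitable)
```

Here s1 and s2 are indiscernible but only s1 is in the target set, so both fall in the boundary region: the available features are not enough to decide their case consistently.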
Visual Genome is a dataset connecting structured image information with the English language. We present "Hindi Visual Genome", a multimodal dataset consisting of text and images suitable for the English-Hindi multimodal machine translation task and multimodal research. We selected short English segments (captions) from Visual Genome along with the associated images and automatically translated them to Hindi, with manual post-editing that took the associated images into account. We prepared a set of 31525 segments, accompanied by a challenge test set of 1400 segments. This challenge test set was created by searching for (particularly) ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity. Our dataset is the first for multimodal English-Hindi machine translation and is freely available for noncommercial research purposes. Our Hindi version of Visual Genome also makes it possible to create Hindi image labelers or other practical tools. Hindi Visual Genome is also used in the Workshop on Asian Translation (WAT) 2019 Multi-Modal Translation Task.
In this paper, a parallel genetic algorithm based association rule mining method is proposed to discover interesting rules from a large biological database. The Apriori algorithm and its variants for association rule mining rely on two user-specified threshold parameters, minimum support and minimum confidence, and choosing them is an issue to be resolved. In addition, other issues such as the large search space and local optimality have led many researchers to use heuristic mechanisms. For large biological databases, a genetic algorithm may be a suitable tool to circumvent these problems, but its computational cost is the main bottleneck. Therefore, we choose parallel genetic algorithms to reduce the computational cost. The experimental results are promising and encourage further research, especially in the domain of biological science.
I. INTRODUCTION
A parallel genetic algorithm based association rule mining method is proposed to discover interesting rules from a large biological database or biomedical dataset. Association rule mining depends on two user-specified threshold values known as support and confidence. Apriori algorithms for association rule mining likewise rely on two user-specified threshold parameters, minimum support and minimum confidence. However, there are certain challenges in applying Apriori-like algorithms, e.g., database-dependent minimum support and a large search space. Hence, for large biological databases, it is difficult to guess the threshold value for minimum support. To avoid these problems, a genetic algorithm may be considered a suitable tool, but its computational cost is the main bottleneck. Therefore, we choose parallel genetic algorithms to reduce the computational cost. In our work, the user is not required to specify a minimum support or minimum confidence value; these are generated automatically through the genetic algorithm. If the user supplies these values, some interesting patterns whose support or confidence falls below the thresholds may be missed.
II. PRELIMINARIES
Database-dependent minimum support means that users must specify suitable thresholds for their mining tasks even though they may have no knowledge of their databases. To avoid these problems, in this paper we use an evolutionary mining strategy in which association rule mining based on a genetic algorithm has been implemented. It has been observed that fitness evaluation is usually the most expensive step in a genetic algorithm; hence, to minimize the overall computational complexity of the genetic algorithm, the fitness needs to be computed in parallel. A model is illustrated in Fig. 2.
A. Association Rule Mining
Association rule mining is one of the most important techniques of data mining, used to extract interesting correlations, frequent patterns, and associations among a set of items in the transaction database. Due to its high degree of implementation in areas such as telecom n...
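Support and confidence, the two measures this excerpt refers to, can be computed as in the following minimal sketch. The transactions and item names are hypothetical. A genetic algorithm could score candidate rules by some function of these two quantities, but the fitness function actually used in the paper is not given in this excerpt, so none is assumed here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base

# Hypothetical transaction database (item names are placeholders).
transactions = [
    frozenset({"geneA", "geneB", "geneC"}),
    frozenset({"geneA", "geneB"}),
    frozenset({"geneA", "geneC"}),
    frozenset({"geneB", "geneC"}),
]

# Rule under test: geneA -> geneB
rule_support = support(transactions, {"geneA", "geneB"})
rule_confidence = confidence(transactions, {"geneA"}, {"geneB"})
```

With a fixed minimum support of, say, 0.6, this rule would be discarded even though it holds in two thirds of the transactions containing geneA, which is the kind of loss the excerpt attributes to user-specified thresholds.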
According to recent trends, heart disease has become the major factor in untimely deaths. Huge amounts of clinical data are available from biomedical devices and the various applications used by hospitals. Artificial intelligence is being used extensively to predict the condition of heart patients. This is mainly achieved by machine learning, where a model is trained on sample cases and then used to predict the ailment from the patient's clinical test data. This paper focuses on analyzing the accuracy of various classification algorithms when they are trained on a selected set of features. Feature selection plays an important role in eliminating redundant and irrelevant features, and it reduces the training cost and time of the predictive models. The classification algorithms analyzed are Naive Bayes, Random Forest, Extra Trees and Logistic Regression, each provided with features selected using the least absolute shrinkage and selection operator (LASSO) and Ridge regression. The accuracy of the classifiers shows remarkable improvement after feature selection. On average, the prediction improved by 33.3% using LASSO regression, compared with 30.73% using Ridge regression.
Autism Spectrum Disorder (ASD) is characterized by delayed development of social interaction, repetitive behaviors and narrow interests, and is usually diagnosed with standard diagnostic tools such as the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R). Previous work has applied machine learning methods to the classification of ASD; however, it used different types of datasets, such as brain images from MRI and EEG, risk genes in genetic profiles, and behavior evaluation based on ADOS and ADI-R. Here, a trial was carried out using Autism Spectrum Questions (AQ) to build models with a higher potential to classify ASD. In this research, Chi-square and the Least Absolute Shrinkage and Selection Operator (LASSO) were chosen as feature selection methods to select the most important features for three supervised machine learning algorithms: Random Forest, Logistic Regression and K-Nearest Neighbors, with K-fold cross validation. In the evaluation, Logistic Regression achieved the highest accuracy, 97.541%, using a model with 13 features selected by the Chi-square method.
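Chi-square feature selection of the kind named in this abstract can be sketched as follows. This is an illustration under stated assumptions: it uses the Pearson chi-square statistic on a contingency table of a categorical feature against the class label, the function names `chi_square` and `select_k_best` are hypothetical helpers, and the answers and labels are invented. The abstract does not say which implementation or dataset encoding the authors used.

```python
from collections import Counter

def chi_square(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            stat += (observed[(f, c)] - expected) ** 2 / expected
    return stat

def select_k_best(columns, labels, k):
    """Names of the k features with the largest chi-square statistic."""
    scores = {name: chi_square(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical binary questionnaire answers and class labels (1 = ASD, 0 = not).
labels = [1, 1, 1, 0, 0, 0]
answers = {
    "q1": [1, 1, 1, 0, 0, 0],  # tracks the label exactly
    "q2": [1, 0, 1, 0, 1, 0],  # only weakly related to the label
}

best = select_k_best(answers, labels, k=1)
```

A higher statistic indicates a stronger departure from independence between the answer and the label, so features are ranked by it and the top k are kept for the classifier.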
Breast cancer has been identified as the second leading cause of death among women worldwide after lung cancer; hence, it is crucial to identify it at an early stage, which can considerably increase the chances of survival. The most important part of cancer detection is differentiating between benign and malignant tumors, and this is where Machine Learning comes in. Taking all the dependent features into consideration, Supervised Machine Learning methods allow for classification with a higher degree of accuracy and can reduce misdiagnosis by physicians, which might occur almost 20% of the time. In our paper, we focus on understanding the shortcomings of digital mammograms in the detection of breast cancer and use Machine Learning classifiers to classify benign and malignant tumors through image analysis. We also look into implementing Supervised Machine Learning classifiers such as Decision Tree, K Nearest Neighbour (KNN), Random Forest and Gaussian Naive Bayes to assess the risks involved with breast cancer by analyzing the associated biomarkers. Our aim is to provide a comprehensive view of breast cancer prediction using Machine Learning through both image and data analyses, which can play a pivotal role in preventing misdiagnosis in future. Fig. 1 gives a layout for breast cancer prediction using Supervised Machine Learning classifiers.
The preparation of parallel corpora is a challenging task, particularly for languages that are under-represented in the digital world. In a multilingual country like India, the need for such parallel corpora is pressing for several low-resource languages. In this work, we provide an extended English-Odia parallel corpus, OdiEnCorp 2.0, aimed particularly at Neural Machine Translation (NMT) systems that will help translate English↔Odia. OdiEnCorp 2.0 includes existing English-Odia corpora, which we extended using several other methods of data acquisition: scraping parallel data from many websites, including Odia Wikipedia, and using optical character recognition (OCR) to extract parallel data from scanned images. Our OCR-based data extraction approach for building a parallel corpus is suitable for other low-resource languages that lack online content. The resulting OdiEnCorp 2.0 contains 98,302 sentences and 1.69 million English and 1.47 million Odia tokens. To the best of our knowledge, OdiEnCorp 2.0 is the largest Odia-English parallel corpus covering different domains, and it is freely available for non-commercial and research purposes.
Awareness of fertility is of great importance due to changes in lifestyle habits. Semen analysis is a reliable confirmatory test of fertility in men. The supervised base classifiers considered are Decision Tree, Logistic Regression and Naive Bayes, of which Logistic Regression shows a promising accuracy of 88%. With the bagging ensemble method applied to the weakest classifier, the results show a leap in accuracy from 78.80% to 90.02%. The authors have also attempted to design a novel voting classifier, which votes over the ensemble learners and creates a more complex model, giving an accuracy of 89%. Apart from this, the authors have analyzed the receiver operating characteristic (ROC) curve for the Extra Tree classifier, which shows an area under the curve (AUC) of 66%. The validation procedure used is 5-fold cross-validation. The authors have further analyzed the lifestyle habits contributing to this problem using impurity-based feature selection and have found ‘Age’ to be the most crucial factor in declining seminal quality.
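The idea of a classifier that votes over several learners can be illustrated with a minimal hard-voting sketch. The predictions, labels and tie-breaking rule below are assumptions made for illustration; the abstract does not describe how the authors' voting classifier combines its ensemble learners, and the numbers here are unrelated to the reported accuracies.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting."""
    combined = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Ties go to the tied label that appears first among the votes
        # (an arbitrary choice made for this sketch).
        winner = next(label for label in sample_votes if counts[label] == top)
        combined.append(winner)
    return combined

def accuracy(predicted, actual):
    """Fraction of samples where the predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical labels ("N" and "O" are placeholder class names) and model outputs.
truth = ["N", "N", "O", "N", "O"]
model_a = ["N", "O", "O", "N", "N"]
model_b = ["N", "N", "N", "N", "O"]
model_c = ["O", "N", "O", "N", "O"]

ensemble = majority_vote([model_a, model_b, model_c])
ensemble_accuracy = accuracy(ensemble, truth)
```

In this toy case each model errs on different samples, so the vote corrects every individual mistake. When the base learners make correlated errors, voting helps far less.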