Automatic Text Summarization (ATS) is in great demand as a way to find relevant information faster in the ever-growing volume of text available online. This research proposes an ATS methodology for the Hindi language that applies a Real Coded Genetic Algorithm (RCGA) to a health corpus available as a Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experiments are performed on varied feature sets, in which two distinguishing features, sentence similarity and named entities, are combined with others to compute the evaluation metrics. The top 14 feature combinations are evaluated with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through feature strings, chromosome selection, and two reproduction operators: Simulated Binary Crossover and Polynomial Mutation. Different compression rates are tested to extract the highest-scoring sentences as the corpus summary. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
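The two reproduction operators named above can be sketched as follows. This is a minimal illustration of the textbook forms of Simulated Binary Crossover and Polynomial Mutation, not the authors' implementation: the distribution indices `eta`, the mutation `rate`, the weight bounds `[0, 1]`, and the helper `score_sentence` (a weighted sum of feature values) are all assumptions made for the example.

```python
import random


def sbx_crossover(p1, p2, eta=15.0, rng=random):
    """Simulated Binary Crossover on two real-valued weight vectors."""
    c1, c2 = [], []
    for x1, x2 in zip(p1, p2):
        u = rng.random()
        # Spread factor beta; a larger eta keeps children closer to the parents.
        if u <= 0.5:
            beta = (2.0 * u) ** (1.0 / (eta + 1.0))
        else:
            beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
        c1.append(0.5 * ((1 + beta) * x1 + (1 - beta) * x2))
        c2.append(0.5 * ((1 - beta) * x1 + (1 + beta) * x2))
    return c1, c2


def polynomial_mutation(x, low=0.0, high=1.0, eta=20.0, rate=0.1, rng=random):
    """Polynomial Mutation: perturb each gene with probability `rate`."""
    out = []
    for v in x:
        if rng.random() < rate:
            u = rng.random()
            if u < 0.5:
                delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
            # Scale the perturbation by the range and clip to the bounds.
            v = min(high, max(low, v + delta * (high - low)))
        out.append(v)
    return out


def score_sentence(features, weights):
    """Hypothetical sentence score: weighted sum of feature values."""
    return sum(f * w for f, w in zip(features, weights))
```

In a setup like this, each chromosome is one candidate weight vector, and sentences would be ranked by `score_sentence` under the best weights found. One property of this form of crossover is that the two children preserve the sum of the parents' values gene by gene.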
Rainfall prediction includes forecasting the occurrence of rainfall and projecting the amount of rainfall over the modeled area. Rainfall results from the interaction of natural factors such as temperature, humidity, atmospheric pressure, and wind direction, and this dependence on many factors makes its prediction uncertain. In this work, different machine learning and deep learning models are used to (a) predict the occurrence of rainfall, (b) project the amount of rainfall, and (c) compare the results of the different models for classification and regression purposes. The dataset used for rainfall prediction covers 49 Australian cities over a 10-year period and contains 23 features, including location, temperature, evaporation, sunshine, wind direction, and many more. The dataset contained numerous uncertainties and anomalies that caused the prediction model to produce erroneous projections. We therefore applied several data preprocessing techniques to remove these uncertainties and clean the data for more accurate predictions: outlier removal, class balancing for the classification task using the Synthetic Minority Oversampling Technique (SMOTE), and data normalization for the regression task using Standard Scaler. Classifiers such as XGBoost, Random Forest, Kernel SVM, and Long Short-Term Memory (LSTM) are used for the classification task, while models such as Multiple Linear Regressor, XGBoost, Polynomial Regressor, Random Forest Regressor, and LSTM are used for the regression task. The experimental results show that the proposed approach outperforms several state-of-the-art approaches, with an accuracy of 92.2% for the classification task and, for the regression task, a mean absolute error of 11.7% and an R² score of 76%.
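Two of the preprocessing steps mentioned above, standard scaling and SMOTE-style oversampling, can be illustrated with a small standard-library sketch. It assumes that the scaling step means z-score scaling (zero mean, unit population standard deviation) and uses a simplified SMOTE-like interpolation; the function names, the neighbour count `k`, and the toy data are hypothetical, and a real pipeline would more likely rely on an established library.

```python
import math
import random


def standardize(column):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std if std > 0 else 0.0 for v in column]


def smote_like(minority, n_new, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a sample
    and one of its k nearest minority-class neighbours (simplified SMOTE).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already covered by the minority class rather than duplicating rows.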
A particle swarm optimization (PSO) algorithm is proposed for text summarization in the Punjabi language. PSO is a swarm-intelligence technique that identifies the best solution among a given set of candidate solutions. The search is carried out by fast-moving particles. Each particle's position and velocity are updated at the end of every iteration, so that the personal best and global best solutions are refined over successive generations. Candidate solutions are scored by a fitness function that takes into account various statistical and linguistic features of the Punjabi datasets. Two Punjabi datasets are considered: a monolingual Punjabi corpus from the Indian Languages Corpora Initiative Phase-II and a Punjabi-Hindi parallel corpus. The parallel corpus comprises 1,000 Punjabi sentences from the tourism domain, while the monolingual corpus contains 30,000 Punjabi sentences from the general domain. Summaries are evaluated with ROUGE measures; the highest score, ROUGE-1, is achieved on the parallel corpus, with precision, recall, and F-measure of 0.7836, 0.7957, and 0.7896, respectively.
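The velocity and position updates described above can be sketched as a generic PSO loop. This is the standard global-best formulation with a synchronous update of the global best at the end of each iteration, not the authors' code: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, the initial range `[-1, 1]`, and the choice to maximize an arbitrary `fitness` callable are all assumptions for the example.

```python
import random


def pso_maximize(fitness, dim, n_particles=20, iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that maximizes `fitness` over real-valued vectors."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # personal best positions
    pbest_val = [fitness(p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best][:], pbest_val[best]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Update the global best once all particles have moved.
        best = max(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[best] > gbest_val:
            gbest, gbest_val = pbest[best][:], pbest_val[best]
    return gbest, gbest_val
```

For summarization, the particle vector could encode feature weights or sentence selections, with `fitness` combining the statistical and linguistic features; that encoding is left open here because the abstract does not specify it.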