A survey on pre-processing techniques: Relevant issues in the context of environmental data mining

Gibert, Karina; Sànchez–Marrè, Miquel; Izquierdo, Joaquín

doi:10.3233/aic-160710

Cited by 51 publications

(34 citation statements)

References 175 publications

“…Databases from each centre are harmonized into a single data base by applying the data-cleaning pre-processing techniques. Descriptive statistics and data visualisation methods are used in order to detect outliers, data errors, missing data and influential observations [12]. A double-checking process correcting errors and completing missing information is carried out to minimize incomplete and erroneous data.…”

Section: Methods (mentioning)

confidence: 99%

Childhood-onset of primary Sjogren syndrome: Phenotypic characterization at diagnosis of 158 children

Ramos‐Casals, Acar-Denizli, Vissink, et al. 2020

Preprint

OBJECTIVES. To characterize the phenotypic presentation at diagnosis of childhood-onset primary Sjogren syndrome (SjS). METHODS. The Big Data Sjogren Project Consortium is an international, multicentre registry using worldwide data-sharing cooperative merging of pre-existing clinical SjS databases from the five continents. For this study, we selected those patients in whom the disease was diagnosed below the age of 19 according to the fulfilment of the 2002/2016 classification criteria. RESULTS. Among the 12,083 patients included in the Sjogren Big Data Registry, 158 (1.3%) patients had a childhood-onset diagnosis (136 girls, mean age of 14.2 years): 126 (80%) reported dry mouth, 111 (70%) dry eyes, 52 (33%) parotid enlargement, 118/122 (97%) positive minor salivary gland biopsy and 60/64 (94%) abnormal salivary US study, 140/155 (90%) positive ANA, 138/156 (89%) anti-Ro/La antibodies and 86/142 (68%) positive RF. The systemic ESSDAI domains containing the highest frequencies of active patients included the glandular (47%), articular (26%) and lymphadenopathy (25%) domains. Patients with childhood-onset primary SjS showed the highest mean ESSDAI score and the highest frequencies of systemic disease in 5 (constitutional, lymphadenopathy, glandular, cutaneous and haematological) of the 12 ESSDAI domains, and the lowest frequencies in 4 (articular, pulmonary, peripheral nerve and central nervous system) in comparison with patients with adult-onset disease. CONCLUSIONS. Childhood-onset primary SjS involves around 1% of patients with primary SjS, with a clinical phenotype dominated by sicca features, parotid enlargement and systemic disease. Age at diagnosis plays a key role on modulating the phenotypic expression of the disease.

“…Such iterative and explorative nature of the modeling process is commonly tedious and time-consuming. Moreover, the quality of the ML results is also dependent of data and feature engineering aspects (e.g., feature selection, outlier detection) (Domingos, 2012) that are typically performed on the Data Understanding and Data Preparation CRISP-DM stages (Gibert et al, 2016).…”

Section: CRISP-DM and AutoML (mentioning)

confidence: 99%

Predicting the Tear Strength of Woven Fabrics Via Automated Machine Learning: An Application of the CRISP-DM Methodology

Ribeiro, Pilastri, Moura, et al. 2020

Proceedings of the 22nd International Conference on Enterprise Information Systems

Textile and clothing is an important world industry that is currently being transformed by the adoption of the Industry 4.0 concept. In this paper, we use Data Mining (DM) technology and the CRoss-Industry Standard Process for DM (CRISP-DM) methodology to model the textile testing process, which assures that products are safe and comply with regulations and client needs. Real-world data were collected from a Portuguese textile company, which has the goal to reduce the number of attempts they take in order to produce a woven fabric. Thus, predicting the outcome of a given test is beneficial to the company because it can reduce the number of physical samples that are needed to be produced when designing new fabrics. In particular, we target two important textile regression tasks: the tear strength in warp and weft directions. To better focus on feature engineering and data transformations, we adopt an Automated Machine Learning (AutoML) during the modeling stage of the CRISP-DM. Several iterations of the CRISP-DM methodology were employed, using different data preprocessing procedures (e.g., removal of outliers). The best predictive models were achieved after 2 (for warp) and 3 (for weft) CRISP-DM iterations.
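
The abstract above mentions removal of outliers as one of the preprocessing procedures tried across CRISP-DM iterations, but does not say which rule was used. As a minimal sketch only, the snippet below assumes a Tukey-fence (interquartile range) rule; the function names, the multiplier k=1.5 and the sample values are hypothetical illustrations, not taken from the cited paper.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the Tukey fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical measurements (illustrative only, not data from the paper).
tear_strength = [41.0, 39.5, 40.2, 42.1, 38.9, 40.7, 95.0]
cleaned = remove_outliers(tear_strength)
print(cleaned)
```

In an iterative workflow like the one described, a step of this kind would typically be re-run with a different rule or threshold in each iteration and the resulting models compared.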

“…The correctness of the predictions and their reservations developed by the ML algorithms depend on the data quality, model representativeness and the reliance between the input and target variables in the collected datasets [14,15]. Data with high level of noise, erroneous data, presence of outliers, biases and incomplete datasets may significantly reduce the predictive efficiency of the models [16,17]. To overcome the issues, this research paper designed a DCRN model to predict the crop yield by using rainfall parameter.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis and Prediction of Crop Production in Andhra Region using Deep Convolutional Regression Network

Talasila, Prasad, Reddy, et al. 2020

IJIES

Agriculture planning plays a significant role in economic growth and the food security of agro-based country. Crop yield prediction and selection of crops are the most challenging tasks in agricultural domain and it depends on different parameters such as production rate, market price and government policies. Among the two primary tasks, the crop yield prediction is one of the most demanding tasks for every nation. Due to uncertain climatic changes, farmers are struggling to attain a satisfactory amount of yield from the crops. Many researchers have studied on the prediction of weather, prediction of yield rate of crop, crop classification and soil classification for agriculture planning using statistical methods or machine learning techniques. This study focuses on the prediction of major crops in Andhra Pradesh region and presents an enhanced algorithm known as Deep Convolutional Regression Network (DCRN), which is trained and tested on agricultural data collected from farmers. The experimental results showed that the DCRN method achieved nearly 97% prediction accuracy when compared with existing methods like Decision Tree (DT), Self-Organizing Map (SOM).
