2021
DOI: 10.1515/jisys-2020-0061

Instance Reduction for Avoiding Overfitting in Decision Trees

Abstract: Decision tree learning is one of the most practical classification methods in machine learning, used for approximating discrete-valued target functions. However, decision trees may overfit the training data, which limits their ability to generalize to unseen instances. In this study, we investigated the use of instance reduction techniques to smooth the decision boundaries before training the decision trees. Noise filters such as ENN, RENN, and ALLKNN remove noisy instances while DROP3 and DROP5 may remove gen…
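The noise-filtering step the abstract describes can be illustrated with Wilson's Edited Nearest Neighbour (ENN) rule, one of the filters the paper names: an instance is dropped when its label disagrees with the majority label of its k nearest neighbours. The sketch below is a minimal 1-D illustration, not the authors' implementation; the data, k = 3, and the function name are invented for the example.

```python
from collections import Counter

def edited_nearest_neighbours(X, y, k=3):
    """Wilson's ENN: drop each instance whose label disagrees with the
    majority label of its k nearest neighbours (1-D Euclidean here)."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # distances from instance i to every other training instance
        dists = sorted(
            (abs(xi - xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )
        neighbour_labels = [label for _, label in dists[:k]]
        majority = Counter(neighbour_labels).most_common(1)[0][0]
        if majority == yi:          # agrees with its neighbourhood: keep it
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# One mislabelled point (x=2.1 tagged as class 1) sits deep inside class 0
X = [1.0, 1.5, 2.0, 2.1, 6.0, 6.5, 7.0]
y = [0,   0,   0,   1,   1,   1,   1]
X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
# only the mislabelled point at x=2.1 is removed
```

RENN repeats this editing pass until no further instance is removed, and ALLKNN applies it for several neighbourhood sizes, so both can be built from the same primitive.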

Cited by 24 publications (16 citation statements)
References 19 publications
“…Given the model’s architecture, if the model is allowed to be trained to its full power, the model is practically guaranteed to overfit the training data. Fortunately, overfitting in machine learning algorithms may be avoided and prevented using a number of different methods [35, 36, 37, 38, 39, 40, 41, 42, 43, 44]. Some methods that are frequently employed to prevent overfitting in decision trees are as follows: Pre-pruning.…”
Section: Results
confidence: 99%
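The pre-pruning this statement refers to can be sketched with a toy 1-D threshold tree: instead of growing until every leaf is pure, growth stops once a depth or sample-count limit is hit. Everything below — the splitting criterion, the data, and the names `grow_tree`, `count_leaves`, `predict` — is a hypothetical illustration, not code from any of the cited works.

```python
from collections import Counter

def grow_tree(X, y, depth=0, max_depth=2, min_samples=2):
    """Recursively split 1-D data. Pre-pruning stops growth early via
    max_depth and min_samples instead of growing the full tree."""
    labels = Counter(y)
    majority = labels.most_common(1)[0][0]
    # pre-pruning tests: stop before a split is even attempted
    if depth >= max_depth or len(y) < min_samples or len(labels) == 1:
        return {"leaf": majority}
    xs = sorted(set(X))
    if len(xs) == 1:                       # no possible split point
        return {"leaf": majority}
    # pick the midpoint threshold with the fewest misclassifications
    best = None
    for a, b in zip(xs, xs[1:]):
        t = (a + b) / 2
        left = [(xi, yi) for xi, yi in zip(X, y) if xi <= t]
        right = [(xi, yi) for xi, yi in zip(X, y) if xi > t]
        err = sum(
            yi != Counter(l for _, l in side).most_common(1)[0][0]
            for side in (left, right) for _, yi in side
        )
        if best is None or err < best[0]:
            best = (err, t, left, right)
    _, t, left, right = best
    return {"threshold": t,
            "left": grow_tree([x for x, _ in left], [l for _, l in left],
                              depth + 1, max_depth, min_samples),
            "right": grow_tree([x for x, _ in right], [l for _, l in right],
                               depth + 1, max_depth, min_samples)}

def count_leaves(node):
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def predict(node, x):
    if "leaf" in node:
        return node["leaf"]
    return predict(node["left" if x <= node["threshold"] else "right"], x)

# One noisy label at x=2.5: an unpruned tree carves out a leaf just for it
X = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0]
y = [0,   0,   1,   0,   1,   1,   1]
full   = grow_tree(X, y, max_depth=10)   # grows until leaves are pure
pruned = grow_tree(X, y, max_depth=1)    # pre-pruned: a single split
```

With `max_depth=10` the tree reproduces every training label, noisy point included (four leaves); with `max_depth=1` it stops at one split (two leaves) and simply absorbs the noise — the trade the quoted statement describes.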
“…Though a commonly used tool in data mining for deriving a strategy to reach a particular goal, it is also widely used in machine learning (Dietterich and Kong, 1995; Navada et al., 2011; Somvanshi et al., 2016). A disadvantage of a decision tree model is that the likelihood of overfitting to the data tends to increase as the size and complexity of the tree grows (Al-Akhras et al., 2021). The decision tree model, however, is advantageous in the sense that it can be used for both classification and regression problems.…”
Section: Methods
confidence: 99%
“…The two decision tree regression models (with and without lagged variables) give the same prediction of housing prices for several different actual values, which compromises the accuracy of this method. As the likelihood of overfitting to the data positively correlates with an increase in the size and complexity of the decision tree (Al-Akhras et al., 2021), decision trees can be prone to overfitting, which can undermine model validity. As such, it is found that random forest regression is a better choice in housing pricing modelling.…”
Section: Housing Price Modelling
confidence: 99%
“…Such an algorithm relies on the training data quality and its accuracy decreases around decision boundaries. Noisy instances, a small number of training examples, and over-learning are some of the main reasons that could lead to poor performance [39] and overfitting.…”
Section: Decision Trees
confidence: 99%