2021
DOI: 10.1109/access.2021.3110745

Algorithmic Splitting: A Method for Dataset Preparation

Abstract: Datasets that appear in publications are curated and split into training, testing, and validation sub-datasets by domain experts. Consequently, machine learning models typically perform well on such split-by-hand datasets, whereas preparing real-world datasets into curated splits, i.e., training, testing, and validation sub-datasets, requires extensive effort. Usually, random repetitive splitting is carried out and evaluated until a better score is reached on the evaluation metrics. In this paper, a…
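The "random repetitive splitting" the abstract criticizes can be sketched as follows. This is a minimal illustration, not the paper's algorithm: synthetic data, a least-squares model, and the seed loop are all assumptions made for the example.

```python
# Illustrative sketch of random repetitive splitting: try several random
# train/test splits and keep whichever yields the best evaluation score.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # synthetic features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

def fit_and_score(X_tr, y_tr, X_te, y_te):
    # Ordinary least squares; the score is negative test-set MAE.
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return -np.mean(np.abs(X_te @ w - y_te))

best_score, best_seed = -np.inf, None
for seed in range(10):                              # repeat the split
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(0.8 * len(X))                         # 80/20 split ratio
    tr, te = idx[:cut], idx[cut:]
    score = fit_and_score(X[tr], y[tr], X[te], y[te])
    if score > best_score:                          # keep the "best" split
        best_score, best_seed = score, seed
```

The point of the sketch is that the retained split is selected for its score, which is exactly the hand-tuning effort the paper's algorithmic splitting aims to replace.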

Cited by 31 publications (12 citation statements); references 16 publications.
“…Therefore, relying on a single dataset split to create a model may pose challenges to the model’s representativeness and reliability. The performance of models built on small datasets is strongly affected by dataset size, split ratio, and split strategy, and machine learning may not capture the full range of features and patterns present in the given training set. This underscores the importance of evaluating model performance, and interpreting what a model learned, across various splits.…”
Section: Results
confidence: 99%
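The sensitivity to split choice described in this excerpt can be demonstrated directly. The sketch below is illustrative only (synthetic small dataset, least-squares model assumed): it measures how test MAE varies across many random splits of the same data.

```python
# Illustration: on a small dataset, test MAE varies noticeably with the
# random split, motivating evaluation across multiple splits.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))                        # deliberately small
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=60)

maes = []
for seed in range(30):                              # 30 different splits
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(0.8 * len(X))
    tr, te = idx[:cut], idx[cut:]
    w, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)
    maes.append(np.mean(np.abs(X[te] @ w - y[te])))

spread = np.std(maes)                               # split-to-split spread
```

Reporting the mean and standard deviation of `maes`, rather than a single split's score, gives the more reliable picture the excerpt calls for.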
“…The practice of dimensionality reduction followed by clustering is common for large input data and has been applied to SAR data sets (Van de Kerkhof et al., 2020), and for a wide range of other data types (Fernández Llamas et al., 2019; R. Harrison et al., 2019; Kahloot & Ekler, 2019). T‐SNE is a dimensionality reduction method that can group similarly behaving time series of height measurements of the different reflection points (Van der Maaten & Hinton, 2008).…”
Section: Methods
confidence: 99%
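The dimensionality-reduction-then-clustering pipeline this excerpt describes can be sketched with scikit-learn. The data, perplexity, and cluster count below are illustrative assumptions, not values from the cited SAR studies.

```python
# Sketch: t-SNE embeds short time series in 2-D, then k-means groups the
# embedding — the reduce-then-cluster pattern described in the excerpt.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic groups of similarly behaving "time series" (rows).
group_a = rng.normal(loc=0.0, size=(15, 10))
group_b = rng.normal(loc=5.0, size=(15, 10))
X = np.vstack([group_a, group_b])

# Perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, perplexity=5, init="random",
           random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
```

Clustering in the 2-D embedding rather than the raw 10-D space is the design choice the excerpt highlights: t-SNE places similarly behaving series near each other, which makes the subsequent clustering step cheap and interpretable.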
“…To determine the quality of the trained model, we use the held-out validation datasets. Validation reveals a great deal about the model’s performance [40]. Absolute error and mean absolute error are two metrics used to measure the quality of the model, which allows for testing and comparison of multiple models [41,42].…”
Section: Model
confidence: 99%
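The two metrics named in this excerpt are straightforward to compute on a held-out set. The numbers below are illustrative, not from the cited work.

```python
# Held-out validation with absolute error and mean absolute error (MAE).
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # held-out validation targets
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # model predictions

abs_err = np.abs(y_pred - y_true)           # per-sample absolute error
mae = abs_err.mean()                        # mean absolute error: 0.5 here
```

Because MAE reduces each model's validation performance to a single number in the target's units, it supports the model-to-model comparison the excerpt mentions.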