2021
DOI: 10.1007/s10994-021-06042-2

Special issue on feature engineering editorial

Abstract: To improve the performance of any machine learning model, it is important to focus on the data itself rather than continuously developing new algorithms. This is exactly the aim of feature engineering, which can be defined as the clever engineering of data so as to exploit the intrinsic bias of the machine learning technique to our benefit, ideally in terms of both accuracy and interpretability. Oftentimes it will be applied in combination with simple machine learning techniques su…
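To make this point concrete, here is a minimal sketch (not from the editorial; the synthetic data and the engineered ratio feature are illustrative assumptions) of how a single engineered feature can let a simple linear model capture a relationship it cannot learn from the raw columns:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic data: the target depends on the *ratio* of two raw inputs,
# a nonlinear relationship that a plain linear model cannot express.
mass = rng.uniform(1.0, 10.0, size=500)
volume = rng.uniform(1.0, 5.0, size=500)
y = 3.0 * (mass / volume) + rng.normal(0.0, 0.1, size=500)

X_raw = np.column_stack([mass, volume])
X_eng = np.column_stack([mass, volume, mass / volume])  # engineered "density" feature

for name, X in [("raw features", X_raw), ("with engineered ratio", X_eng)]:
    model = LinearRegression().fit(X, y)
    print(f"{name}: R^2 = {r2_score(y, model.predict(X)):.3f}")

The engineered ratio lets the linear model fit almost perfectly while staying simple and interpretable, which is the trade-off the abstract describes.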

Cited by 46 publications (30 citation statements) · References 34 publications

“…We defined a feature set in the time domain, frequency domain, as well as the time-frequency domain (these are wavelet-related variables) in a process called feature engineering (Verdonck et al., 2021). In addition to standard statistical features, we used the number and energy content of EDA events and storms, as they are known to differ across sleep stages (Sano et al., 2014) and OSA severity (Arnardottir et al., 2010).…”
Section: Feature Extraction and Selection
Confidence: 99%
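As a hedged sketch of this kind of feature set (the sampling rate, frequency band, wavelet choice, and feature names are assumptions for illustration, not the cited study's exact configuration), one could compute time-domain statistics, frequency-band power, and wavelet energies from a single signal window:

import numpy as np
import pywt                       # PyWavelets, for the time-frequency features
from scipy.signal import welch
from scipy.stats import kurtosis, skew

def extract_features(signal: np.ndarray, fs: float = 32.0) -> dict:
    """Time-, frequency-, and time-frequency-domain features for one window."""
    feats = {
        # Time domain: standard statistical summaries
        "mean": float(np.mean(signal)),
        "std": float(np.std(signal)),
        "skew": float(skew(signal)),
        "kurtosis": float(kurtosis(signal)),
    }
    # Frequency domain: power in an assumed low-frequency band, via Welch's PSD
    freqs, psd = welch(signal, fs=fs, nperseg=min(256, len(signal)))
    band = (freqs >= 0.05) & (freqs < 1.0)
    feats["lowband_power"] = float(np.sum(psd[band]) * (freqs[1] - freqs[0]))
    # Time-frequency domain: energy per wavelet decomposition level
    for i, c in enumerate(pywt.wavedec(signal, "db4", level=4)):
        feats[f"wavelet_energy_L{i}"] = float(np.sum(c ** 2))
    return feats

demo = np.sin(2 * np.pi * 0.2 * np.arange(0, 60, 1 / 32.0))  # 60 s toy signal
print(extract_features(demo))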
“…In other contexts, such as materials informatics and natural language processing, it has been shown that the quality of the encoding is critical to the performance of machine learning models. [74][75][76][77][78] Here, we focus on three methods of featurization: sequence vectors, token counting, and implicit feature learning.…”
Section: Supervised Learning
Confidence: 99%
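To make "token counting" concrete, here is a minimal sketch using scikit-learn's CountVectorizer (the toy corpus and default tokenization are assumptions for illustration, not the cited work's setup):

from sklearn.feature_extraction.text import CountVectorizer

# Token counting: each document becomes a vector of token frequencies.
corpus = [
    "feature engineering improves model accuracy",
    "engineering good features beats tuning the model",
]
vectorizer = CountVectorizer()           # default word-level tokenization
X = vectorizer.fit_transform(corpus)     # sparse document-term count matrix

print(vectorizer.get_feature_names_out())
print(X.toarray())

Each row of the resulting matrix is a count-based feature vector for one document; the quote's other two methods would replace this step with order-preserving sequence encodings or learned (implicit) features.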
“…From a machine learning perspective, all used explanatory variables represent "handcrafted" features whose selection is based on domain or expert knowledge [85]. This mainly concerns the determination of multi-scale tuning parameters (cf., Table 1) and scale levels (see Section 2.1.4).…”
Section: Scale-specific Optimization
Confidence: 99%
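As a hedged sketch of such handcrafted multi-scale features (the rolling-mean statistic and the window sizes below are illustrative assumptions; the cited study's tuning parameters and scale levels are domain-specific):

import numpy as np
import pandas as pd

# Handcrafted multi-scale features: the same statistic computed at several
# expert-chosen window sizes (the "scale levels"), so a model sees both
# local and broader context around each observation.
series = pd.Series(np.random.default_rng(1).normal(size=1000))

scales = [5, 25, 125]  # assumed scale levels, chosen from domain knowledge
features = pd.DataFrame({
    f"rolling_mean_w{w}": series.rolling(window=w, center=True).mean()
    for w in scales
})
features["raw"] = series
print(features.dropna().head())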