Abstract: The outstanding performance of deep learning (DL) for computer vision and natural language processing has fueled increased interest in applying these algorithms more broadly in both research and practice. This study investigates the application of DL techniques to classification of large sparse behavioral data, which has become ubiquitous in the age of big data collection. We report on an extensive search through DL architecture variants and compare the predictive performance of DL with that of carefully regula…
“…Third, a critical point arises when analyzing data produced by unbalanced experimental designs [4, 5] where, for example, the number of observations per group or condition is not balanced, or even more, when some variables could not be recorded for all the individuals per group, so there are missing values (often noted as Not available, “NAs”), or when each feature was measured in different individuals that just have in common being members of the same group [6].…”
Background
In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, responses to treatment, or sexual dimorphism. However, such data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters may be recorded in the same test, so correlations must be assessed and a more complex statistical framework is necessary for the analysis. Packages providing the relevant analysis tools already exist, but they are not found together, making both the choice of method and its implementation difficult for non-statisticians.
Result
We present Gdaphen, a fast joint pipeline that identifies the most important qualitative and quantitative predictor variables for discriminating between genotypes, treatments, or sexes. Gdaphen takes behavioral/clinical data as input and uses Multiple Factor Analysis (MFA) to handle groups of variables recorded from the same individuals or to anonymize genotype-based recordings. As optimized input, Gdaphen uses the non-correlated variables with 30% correlation or higher on the MFA-Principal Component Analysis (PCA), increasing the discriminative power and the efficiency of the classifiers' predictive models. Gdaphen can determine the variables that most strongly predict gene dosage effects through its General Linear Model (GLM)-based classifiers, or identify the most discriminative non-linearly distributed variables through its Random Forest (RF) implementation. Moreover, Gdaphen reports the efficacy of each classifier and offers several visualization options to fully understand and support the results, as easily readable plots ready to be included in publications. We demonstrate Gdaphen's capabilities on several datasets and provide easy-to-follow vignettes.
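The filter-then-classify idea described above can be sketched in a few lines. The snippet below is an illustrative sketch only, not Gdaphen's actual R API: it drops predictors that are highly correlated with an already-retained variable, then ranks the survivors by Random Forest feature importance. The 0.9 threshold, the synthetic data, and all names are hypothetical, chosen to make the mechanism visible.

```python
# Illustrative sketch (hypothetical data and threshold; not Gdaphen's API):
# correlation-based variable filtering followed by RF importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                    # informative variable
x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate of x1 (|r| > 0.9)
x3 = rng.normal(size=n)                    # pure noise variable
X = np.column_stack([x1, x2, x3])
y = (x1 > 0).astype(int)                   # genotype-like binary label

# 1) Filter: keep a variable only if it is not strongly correlated
#    with any variable already retained.
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = []
for j in range(X.shape[1]):
    if all(corr[j, k] <= 0.9 for k in keep):
        keep.append(j)

# 2) Classify on the retained variables and rank them by importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[:, keep], y)
ranking = sorted(zip(keep, rf.feature_importances_), key=lambda t: -t[1])
```

Here the redundant copy `x2` is filtered out before classification, and the RF importance then separates the truly informative variable from noise, which is the same two-stage logic the abstract describes.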
Conclusions
Gdaphen makes the analysis of phenotypic data much easier for medical or preclinical behavioral researchers, providing an integrated framework to perform: (1) pre-processing steps such as data imputation or anonymization; (2) a full statistical assessment to identify which variables are the most important discriminators; and (3) state-of-the-art, publication-ready visualizations to support the conclusions of the analyses. Gdaphen is open source and freely available at https://github.com/munizmom/gdaphen, together with vignettes, function documentation, and examples to guide users through their own implementations.
“…CVOA has been used to find the optimal values for the hyperparameters of an LSTM architecture, which is a widely used model for artificial recurrent neural networks (RNN) in the field of deep learning. Data from the Spanish electricity consumption have been used to validate the accuracy. The results achieved verge on 0.45%, substantially outperforming other well-established methods such as random forest (RF), gradient-boosted trees (GBT), linear regression (LR), or deep learning optimized with other metaheuristics.…”
This study proposes a novel bioinspired metaheuristic simulating how the coronavirus spreads and infects healthy people. From a primary infected individual (patient zero), the coronavirus rapidly infects new victims, creating large populations of infected people who will either die or spread infection. Relevant terms such as reinfection probability, super-spreading rate, social distancing measures, or traveling rate are introduced into the model to simulate the coronavirus activity as accurately as possible. The infected population initially grows exponentially over time, but once social isolation measures, the mortality rate, and the number of recoveries are taken into account, the infected population gradually decreases. The coronavirus optimization algorithm has two major advantages compared with other similar strategies. First, the input parameters are already set according to the disease statistics, sparing researchers from initializing them with arbitrary values. Second, the approach is able to terminate on its own after a number of iterations, without this value having to be set either. Furthermore, a parallel multivirus version is proposed, where several coronavirus strains evolve over time and explore wider search space areas in fewer iterations. Finally, the metaheuristic has been combined with deep learning models to find optimal hyperparameters during the training phase. As an application case, the problem of electricity load time series forecasting has been addressed, showing quite remarkable performance.
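The infection-spread mechanism can be illustrated with a toy minimizer. The sketch below is a deliberately simplified caricature, not the published CVOA: the spreading rate, death probability, population cap, elitism step, and the fixed iteration budget are all illustrative choices (the real algorithm terminates when the epidemic dies out rather than on a fixed loop count).

```python
# Toy sketch of the infection-spread idea (simplified; parameter names and
# values are illustrative, NOT those of the published CVOA algorithm).
# Minimizes f(x) = x^2 starting from a single "patient zero".
import random

random.seed(42)

def f(x):
    return x * x  # objective: minimum at x = 0

def spread(x, rate=0.5):
    """An infected solution infects a nearby candidate (mutation)."""
    return x + random.uniform(-rate, rate)

patient_zero = random.uniform(-10, 10)
infected = [patient_zero]
best = patient_zero
SPREADING = 2    # new infections caused by each surviving individual
DEATH_P = 0.3    # probability an individual "dies" and stops spreading
MAX_POP = 50     # cap mimicking social-distancing limits on the epidemic

for _ in range(60):
    new = [best]                          # elitism: the fittest strain persists
    for x in infected:
        if random.random() < DEATH_P:     # dies: does not spread further
            continue
        new.extend(spread(x) for _ in range(SPREADING))
    infected = sorted(new, key=f)[:MAX_POP]  # fittest infections survive
    best = infected[0]
```

Even in this stripped-down form, the two traits highlighted in the abstract are visible: the dynamics are driven by epidemic-style parameters rather than arbitrary tuning knobs, and the population size is self-regulated by deaths and the cap rather than by the user.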
“…Deep learning neural network models are trained to perform specific computations. Larger artificial neural networks can be trained with this approach and are thus very useful for larger data sets (Benke & Benke, 2018; De Cnudde, Ramon, Martens, & Provost, 2019; Hey, Butler, Jackson, & Thiyagalingam, 2020). Nowadays, the deep learning approach is very popular among researchers working on behavioral and neurophysiological data to tap into representations of neural activity in the brain (Phan, Dou, Piniewski, & Kil, 2016; Vahid, Mückschel, Neuhaus, Stock, & Beste, 2018).…”
Neurotoxicity studies are important in the preclinical stages of the drug development process, because exposure to certain compounds that enter the brain across a permeable blood-brain barrier can damage neurons and other supporting cells such as astrocytes. This could, in turn, lead to various neurological disorders such as Parkinson's or Huntington's disease, as well as various dementias. Toxicity assessment after such exposures is often performed by pathologists, who qualitatively or semi-quantitatively grade the severity of neurotoxicity in histopathology slides. Quantifying the extent of neurotoxicity supports qualitative histopathological analysis and provides a better understanding of the global extent of brain damage. Stereological techniques such as the optical fractionator provide an unbiased quantification of neuronal damage; however, the process is time-consuming. The advent of whole slide imaging (WSI) introduced digital image analysis, which made quantification of neurotoxicity automated, faster, and less biased, making statistical comparisons possible. Although automated to a certain level, simple digital image analysis still requires time-consuming manual effort from experts, which limits the analysis of large datasets. Digital image analysis coupled with a deep learning artificial intelligence model provides a good alternative to time-consuming stereological and simple digital analysis, as deep learning models can be trained to identify damaged or dead neurons in an automated fashion. This review focuses on and discusses studies demonstrating the role of deep learning in the segmentation of brain regions, toxicity detection, quantification of degenerated neurons, and estimation of the area/volume of degeneration.