Abstract: The outstanding performance of deep learning (DL) for computer vision and natural language processing has fueled increased interest in applying these algorithms more broadly in both research and practice. This study investigates the application of DL techniques to classification of large sparse behavioral data, which has become ubiquitous in the age of big data collection. We report on an extensive search through DL architecture variants and compare the predictive performance of DL with that of carefully regula…
“…Third, a critical point arises when analyzing data produced by unbalanced experimental designs [4, 5] where, for example, the number of observations per group or condition is not balanced, or even more, when some variables could not be recorded for all the individuals per group, so there are missing values (often noted as Not available, “NAs”), or when each feature was measured in different individuals that just have in common being members of the same group [6].…”
Background
In individuals or animals suffering from genetic or acquired diseases, it is important to identify which clinical or phenotypic variables can be used to discriminate between disease and non-disease states, responses to treatment, or sexual dimorphism. However, such data often suffer from a low number of samples, a high number of variables, or unbalanced experimental designs. Moreover, several parameters may be recorded in the same test, so correlations must be assessed and a more complex statistical framework is necessary for the analysis. Packages providing the relevant analysis tools already exist, but they are not found together, making both the choice of method and its implementation difficult for non-statisticians.
Result
We present Gdaphen, a fast joint pipeline that identifies the most important qualitative and quantitative predictor variables for discriminating between genotypes, treatments, or sexes. Gdaphen takes behavioral/clinical data as input and uses Multiple Factor Analysis (MFA) to handle groups of variables recorded from the same individuals or to anonymize genotype-based recordings. As optimized input, Gdaphen uses the non-correlated variables with 30% correlation or higher on the MFA-Principal Component Analysis (PCA), increasing the discriminative power and the efficiency of the classifiers' predictive models. Gdaphen can determine the variables that most strongly predict gene dosage effects through its General Linear Model (GLM)-based classifiers, or identify the most discriminative non-linearly distributed variables through its Random Forest (RF) implementation. Moreover, Gdaphen reports the efficacy of each classifier and offers several visualization options to fully understand and support the results, as easily readable plots ready to be included in publications. We demonstrate Gdaphen's capabilities on several datasets and provide easy-to-follow vignettes.
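The filter-then-classify idea described above can be sketched in a few lines. The snippet below is an illustrative sketch only, not Gdaphen's actual R API: it drops predictors that are highly correlated with an already-retained variable, then ranks the survivors by Random Forest feature importance. The 0.9 threshold, the synthetic data, and all names are hypothetical, chosen to make the mechanism visible.

```python
# Illustrative sketch (hypothetical data and threshold; not Gdaphen's API):
# correlation-based variable filtering followed by RF importance ranking.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                    # informative variable
x2 = x1 + rng.normal(scale=0.05, size=n)   # near-duplicate of x1 (|r| > 0.9)
x3 = rng.normal(size=n)                    # pure noise variable
X = np.column_stack([x1, x2, x3])
y = (x1 > 0).astype(int)                   # genotype-like binary label

# 1) Filter: keep a variable only if it is not strongly correlated
#    with any variable already retained.
corr = np.abs(np.corrcoef(X, rowvar=False))
keep = []
for j in range(X.shape[1]):
    if all(corr[j, k] <= 0.9 for k in keep):
        keep.append(j)

# 2) Classify on the retained variables and rank them by importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X[:, keep], y)
ranking = sorted(zip(keep, rf.feature_importances_), key=lambda t: -t[1])
```

Here the redundant copy `x2` is filtered out before classification, and the RF importance then separates the truly informative variable from noise, which is the same two-stage logic the abstract describes.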
Conclusions
Gdaphen makes the analysis of phenotypic data much easier for medical or preclinical behavioral researchers, providing an integrated framework to perform: (1) pre-processing steps such as data imputation or anonymization; (2) a full statistical assessment to identify which variables are the most important discriminators; and (3) state-of-the-art, publication-ready visualizations to support the conclusions of the analyses. Gdaphen is open source and freely available at https://github.com/munizmom/gdaphen, together with vignettes, function documentation, and examples to guide users through their own implementations.
“…CVOA has been used to find the optimal values for the hyperparameters of an LSTM architecture, which is a widely used model for artificial recurrent neural networks (RNN) in the field of deep learning. Data from the Spanish electricity consumption have been used to validate the accuracy. The results achieved verge on 0.45%, substantially outperforming other well-established methods such as random forest (RF), gradient-boosted trees (GBT), linear regression (LR), or deep learning optimized with other metaheuristics.…”
This study proposes a novel bioinspired metaheuristic simulating how the coronavirus spreads and infects healthy people. From a primary infected individual (patient zero), the coronavirus rapidly infects new victims, creating large populations of infected people who will either die or spread infection. Relevant terms such as reinfection probability, super-spreading rate, social distancing measures, or traveling rate are introduced into the model to simulate the coronavirus activity as accurately as possible. The infected population initially grows exponentially over time, but once social isolation measures, the mortality rate, and the number of recoveries are taken into account, the infected population gradually decreases. The coronavirus optimization algorithm has two major advantages compared with other similar strategies. First, the input parameters are already set according to the disease statistics, sparing researchers from initializing them with arbitrary values. Second, the approach is able to terminate on its own after a number of iterations, without this value having to be set either. Furthermore, a parallel multivirus version is proposed, where several coronavirus strains evolve over time and explore wider search space areas in fewer iterations. Finally, the metaheuristic has been combined with deep learning models to find optimal hyperparameters during the training phase. As an application case, the problem of electricity load time series forecasting has been addressed, showing quite remarkable performance.
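The infection-spread mechanism can be illustrated with a toy minimizer. The sketch below is a deliberately simplified caricature, not the published CVOA: the spreading rate, death probability, population cap, elitism step, and the fixed iteration budget are all illustrative choices (the real algorithm terminates when the epidemic dies out rather than on a fixed loop count).

```python
# Toy sketch of the infection-spread idea (simplified; parameter names and
# values are illustrative, NOT those of the published CVOA algorithm).
# Minimizes f(x) = x^2 starting from a single "patient zero".
import random

random.seed(42)

def f(x):
    return x * x  # objective: minimum at x = 0

def spread(x, rate=0.5):
    """An infected solution infects a nearby candidate (mutation)."""
    return x + random.uniform(-rate, rate)

patient_zero = random.uniform(-10, 10)
infected = [patient_zero]
best = patient_zero
SPREADING = 2    # new infections caused by each surviving individual
DEATH_P = 0.3    # probability an individual "dies" and stops spreading
MAX_POP = 50     # cap mimicking social-distancing limits on the epidemic

for _ in range(60):
    new = [best]                          # elitism: the fittest strain persists
    for x in infected:
        if random.random() < DEATH_P:     # dies: does not spread further
            continue
        new.extend(spread(x) for _ in range(SPREADING))
    infected = sorted(new, key=f)[:MAX_POP]  # fittest infections survive
    best = infected[0]
```

Even in this stripped-down form, the two traits highlighted in the abstract are visible: the dynamics are driven by epidemic-style parameters rather than arbitrary tuning knobs, and the population size is self-regulated by deaths and the cap rather than by the user.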
“…Deep learning neural network models are trained to perform specific computations. Larger artificial neural networks can be trained with this approach and are thus very useful for larger data sets (Benke & Benke, 2018; De Cnudde, Ramon, Martens, & Provost, 2019; Hey, Butler, Jackson, & Thiyagalingam, 2020). Nowadays, the deep learning approach is very popular among researchers working on behavioral and neurophysiological data to tap into representations of neural activity in the brain (Phan, Dou, Piniewski, & Kil, 2016; Vahid, Mückschel, Neuhaus, Stock, & Beste, 2018).…”
Neurotoxicity studies are important in the preclinical stages of the drug development process, because exposure to certain compounds that enter the brain across a permeable blood-brain barrier can damage neurons and other supporting cells such as astrocytes. This could, in turn, lead to various neurological disorders such as Parkinson's or Huntington's disease, as well as various dementias. Toxicity assessment after such exposures is often performed by pathologists, who qualitatively or semi-quantitatively grade the severity of neurotoxicity in histopathology slides. Quantifying the extent of neurotoxicity supports qualitative histopathological analysis and provides a better understanding of the global extent of brain damage. Stereological techniques such as the optical fractionator provide an unbiased quantification of neuronal damage; however, the process is time-consuming. The advent of whole slide imaging (WSI) introduced digital image analysis, which made quantification of neurotoxicity automated, faster, and less biased, making statistical comparisons possible. Although automated to a certain level, simple digital image analysis still requires time-consuming manual effort from experts, which limits the analysis of large datasets. Digital image analysis coupled with a deep learning artificial intelligence model provides a good alternative to time-consuming stereological and simple digital analysis, as deep learning models can be trained to identify damaged or dead neurons in an automated fashion. This review focuses on and discusses studies demonstrating the role of deep learning in the segmentation of brain regions, toxicity detection, quantification of degenerated neurons, and estimation of the area/volume of degeneration.