Background: Recent advances in biotechnology and computer science have led to an ever-increasing availability of public biomedical data distributed across large databases worldwide. However, these data collections are far from being "big" and "standardized" enough to be integrated, making it impossible to fully exploit the latest machine learning technologies for their analysis. Hence, handling this huge flow of biomedical data is a challenging task for researchers and clinicians due to its complexity and high heterogeneity. An effective strategy to address this issue is the construction of a formal conceptual model, which in general enables the design of semantic tools to collect and explore data for a given pathology. This is the case for neurodegenerative diseases and, in particular, Alzheimer's Disease (AD). Recent years have witnessed the creation of specialized data collections, such as the one maintained by the Alzheimer's Disease Neuroimaging Initiative (ADNI), which contains the largest number of biomedical concepts in the field of AD. For this class of diseases, Big Data and Deep Learning offer hope for the discovery of new biomarkers for early diagnosis. Hence, new ways of collecting and managing biomedical data must be developed.
Results: We developed a detailed ontology for clinical multidimensional datasets from the ADNI repository in order to simplify data access, obtain new diagnostic knowledge about Alzheimer's Disease, and ease the task of data harmonization.
Conclusions: The semantic database, populated with ADNI data, will suggest new queries that allow machine learning techniques to be applied to any possible combination of datasets. The conceptual model could be adopted by any center, giving rise to new databases that, if made publicly available, will simplify data integration and multi-center data collection projects.