2020
DOI: 10.1002/cpe.5814

IPDS: A semantic mediator‐based system using Spark for the integration of heterogeneous proteomics data sources

Abstract: With the constant rise of data volumes in many disciplines, various new Big data management systems have emerged to provide scalable tools for efficient data integration, processing, and analysis. In this article, we provide an overview of biomedical data integration systems focusing on ontology‐based semantic systems and Big data technologies based systems such as Apache Spark. We also propose a new semantic data integration system, called Integrated Proteomics Data System (IPDS), which uses a mediato…

Citations: cited by 7 publications (5 citation statements)
References: 80 publications (96 reference statements)

Citation statements
“…The PRIDE database is available with the aim to archive various types of proteomics mass spectrometry data for reproducible research, facilitate protein-centric integration of MS-based data for variant and modification analysis, and furnish MS-based expression data to the Expression Atlas [10]. An integrated proteomics data system (IPDS), a data integration platform, is developed to collect the expanding heterogeneous proteomic data and its relevant information and to make this information easily accessible to the scientific community [6]. Despite all these and other databases, to our best knowledge, there is no publicly available dataset/repository dedicated to a biomedical data integration system that is curated especially for a machine learning point of view, where all MS files are mapped to its respective patient medical history without personal information disclosure.…”
Section: Discussion
confidence: 99%
“…By utilizing a shared query engine and, in some circumstances, a data processing framework like Spark, these systems provide consolidated access to multiple data stores, such as the Relational Database Management System (RDBMS), NoSQL, and Hadoop Distributed File System (HDFS). In a nutshell, multistore systems enable seamless access to many types of data sources through the use of a common querying and processing approach, making it easier to analyze and extract insights from data stored in different forms and locations [6].…”
Section: Discussion
confidence: 99%
“…The fifth paper, entitled “IPDS: A semantic mediator‐based system using Spark for the integration of heterogeneous proteomics data sources” by Messaoudi et al 5 focuses on facilitating the ingestion of big data, generated on various systems and platforms.…”
Section: Introduction
confidence: 99%
“…The fifth paper, entitled "IPDS: A semantic mediator-based system using Spark for the integration of heterogeneous proteomics data sources" by Messaoudi et al 5 focuses on facilitating the ingestion of big data, generated on various systems and platforms. It presents an overview of biomedical data integration systems focusing on ontology-based semantic systems and big data technologies based systems such as Apache Spark and proposes a semantic data integration system, called Integrated Proteomics Data System (IPDS), which uses a mediator approach.…”
confidence: 99%