2014
DOI: 10.1186/1471-2164-15-s8-s3

High dimensional biological data retrieval optimization with NoSQL technology

Abstract: Background: High-throughput transcriptomic data generated by microarray experiments is the most abundant and frequently stored kind of data currently used in translational medicine studies. Although microarray data is supported in data warehouses such as tranSMART, when querying relational databases for hundreds of different patient gene expression records queries are slow due to poor performance. Non-relational data models, such as the key-value model implemented in NoSQL databases, hold promise to be more perf…

Citations: cited by 35 publications (22 citation statements)
References: 17 publications
“…It's processing and storage methods must adhere to data privacy at a high level and also the accessibility of the data for public disclosure [82][83][84]. One method of ensuring that patient data privacy/security is to use indexes generated from HBase, which can securely encrypt KV stores [36][85][86][87], and HBase can further encrypt with integration with Hive [35]. Scott [62] also stated that Drill is already setup for encryption for HIPPA but we did not find this out-of-the-box and attempting to encrypt was time consuming.…”
Section: Discussion (mentioning)
confidence: 99%
“…Several cloud-based systems are concerned with the management of secondary analysis pipelines [2], [21] and several cloud-based libraries provide the methods for those pipelines [35] [31] [46]. Effective metadata management for selecting samples using key-based NoSQL storage for referring to genomic datasets is described in [50]. We next concentrate on region processing, the most critical aspect of ternary data management, and specifically with the technology used for massive region-based processing, with the libraries supporting low-level region operations and with high-level query languages.…”
Section: Related Work (mentioning)
confidence: 99%
“…7,11,12 Others describe their usage in studies 2,3,[13][14][15] or in connection with other platforms or technologies. [16][17][18] This indicates a substantial need for further research.…”
Section: Motivation and Related Work (mentioning)
confidence: 99%