MIX-RS: A Multi-Indexing System Based on HDFS for Remote Sensing Data Storage

Wu, Jiashu; Xiong, Jingpan; Dai, Hao; Wang, Yan; Xu, Cheng‐Zhong

doi:10.26599/tst.2021.9010082

Cited by 13 publications

(4 citation statements)

References 34 publications

“…Studies on the storage and processing of large datasets have been conducted in various fields. Examples include research on storing large-scale meteorological data [9], storing remotely sensed data that are produced in large quantities [10], using Hadoop for storing resource description framework models for linked data and knowledge graphs [11,12], and research on using Hadoop for processing medical data to predict chronic kidney diseases in the context of large-scale bioscience data [13]. Other studies have explored the advantages of Hadoop in storing and searching proteomic datasets [14], as well as storing large-scale genomic data in FASTA/Q files [15].…”

Section: Related Work (mentioning)

confidence: 99%

Data Lake Conceptualized Web Platform for Food Research Data Collection

An, Oh, Kim et al. 2024. JWE

Food research is uniquely intertwined with everyday life and necessitates the utilization of big data. Within this domain, the research data consist of various forms and formats, encompassing biological experiment results, chemical analysis data, nutritional information, microbiological data, sensor data, images, and videos. This diversity stems from the integration of data from various subdomains within the larger field. With recent advancements in deep learning technology, the importance of data has grown significantly, resulting in increased reliance on data-driven research. Although specialized platforms for sharing and utilizing data have been established at the national level, particularly in the bioscience field, food research lacks a dedicated infrastructure and specialized data-sharing platforms. In this study, we develop a platform that leverages Hadoop-based distributed file systems to create a data lake. This platform enables data storage and sharing through a web-based interface. The distributed file system supports scalability by adding data nodes, making it an effective solution for capacity expansion. In addition, the web-based platform ensures high accessibility, allowing users access from anywhere, at any time, using any device. Finally, we introduce the establishment of a 1.8 PB Hadoop-based physical storage system and present an approach for building a highly accessible web platform with substantial utility.

“…Compared with a single computer, distributed storage (e.g., Hadoop distributed file system (HDFS), HBase) and computational technologies (e.g., MapReduce, Spark) use the storage and computational resources of clusters and show tremendous advantages when data increases dramatically. Therefore, they are extensively used in the storage [19][20][21], calculation [22,23], segmentation [24], and path planning of massive remote sensing data. Wang et al. used the MapReduce-based distributed parallel Dijkstra algorithm to solve the shortest path problem.…”

Section: Introduction (mentioning)

confidence: 99%

A Fast Large-Scale Path Planning Method on Lunar DEM Using Distributed Tile Pyramid Strategy

Hong, Tong et al. 2023. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing

In lunar exploration missions, path planning for lunar rovers using digital elevation models (DEMs) is currently a hot topic in academic research. However, research on path planning using large-scale DEMs has rarely been discussed, owing to the low time efficiency of existing algorithms. Therefore, in this study, we propose a fast path-planning method using a distributed tile pyramid strategy and an improved A* algorithm. The proposed method consists of three main steps. First, the tile pyramid is generated for the large lunar DEM and stored in the Hadoop distributed file system. Second, a distributed path-planning strategy based on tile pyramid (DPPS-TP) is used to accelerate path-planning tasks on large-scale lunar DEMs using Spark and Hadoop. Finally, an improved A* algorithm is proposed to improve the speed of the path-planning task in each tile. The method was tested using lunar DEM images. Experimental results demonstrate that: (1) in a single-machine serial strategy using source DEM generated by the Chang'e-2 CCD stereo camera, the proposed A* algorithm for Open List and Closed List with random access feature (OC-RA-A* algorithm) is 3.59 times faster than the traditional A* algorithm in long-distance path planning tasks; (2) compared to the distributed parallel computation strategy using source DEM generated by the Chang'e-2 CCD stereo camera, the proposed DPPS-TP based on tile pyramid DEM is 113.66 times faster in the long-range path planning task.

“…Thus, a hybrid DFS, which exploits a bunch of SSDs to work as a small cluster (i.e., the SSD cluster in our particular sense) to facilitate the storage system as a whole is more practical in reality [7,8]. For example, the SSD cluster is in practice often used to optimise the storage of small data [9,10], metadata [11,12] or functioning as caches for hot data [13] under common scenarios such as Internet of Things (IoT) [14,15].…”

Section: Introduction (mentioning)

confidence: 99%

How does solid‐state drives cluster perform for distributed file systems: An empirical study

Wang et al. 2023. Concurrency and Computation (self-citation)

As the capacity of Solid-State Drives (SSDs) is constantly being optimised and boosted with gradually reduced cost, the SSD cluster is now widely deployed as part of the hybrid storage system in various scenarios such as cloud computing and big data processing. However, despite its rapid developments, the performance of the SSD cluster remains largely under-investigated, leaving its sub-optimal applications in reality. To address this issue, in this paper we conduct extensive empirical studies for a comprehensive understanding of the SSD cluster in diverse settings. To this end, we configure a real SSD cluster and gather the generated trace data based on some often-used benchmarks, then adopt analytical methods to analyse the performance of the SSD cluster with different configurations. In particular, regression models are built to provide better performance predictability under broader configurations, and the correlations between influential factors and performance metrics with respect to different numbers of nodes are investigated, which reveal the high scalability of the SSD cluster. Additionally, the cluster's network bandwidth is inspected to explain the performance bottleneck. Finally, the knowledge gained is summarised to benefit the SSD cluster deployment in practice.
