Big Data Retrieval using HDFS with LZO Compression

Thangavel, Prasanth; Aarthi, K.; Gunasekaran, M.

doi:10.1109/icacce46606.2019.9079993

Cited by 4 publications

(5 citation statements)

References 21 publications


“…After U 11 and V T 11 are recovered from U 11 and V T 11 , the approximate data matrix A can be reconstructed by replacing S with S , as shown in (6).…”

Section: Icipost-2022 (mentioning)

confidence: 99%

“…Lossless compression can recover data without loss of accuracy, it is implemented by adjusting the encoding at the binary level [5]. However, lossless compression sacrifices compression performance for accuracy, which is not very well for big data [6]. Within the accuracy requirement, lossy compression can achieve a better result compared with lossless compression [7].…”

Section: Introduction (mentioning)

confidence: 99%


Ocean data compression based on block SVD

Wang

Zhou

Zhang

2023

J. Phys.: Conf. Ser.


Fast development of ocean observations and numerical modeling increases the need for data transmission, storage and extraction. This paper presented a new data compression method based on Singular Value Decomposition (SVD) with data matrix divided into different sub-matrices with consideration of odevity and remnant. An automatic matrix-dividing method is applied to divide smartly the data matrix into sub-matrices. These sub-matrices are then compressed based on an improved SVD, which enhances the compression performance by utilizing the orthogonal property of vectors generated by SVD. A dynamic optimization method which is capable of determining the proper scale of retained data under the accuracy requirement of ocean data is also established. Two indices are derived mathematically to search the best block pattern quickly. The performance and reliability of the block-based SVD compression is verified with the successful compression and recovery of the Hybrid Coordinate Ocean Model data.


“…Hbase supports four different compression algorithms, which can be directly applied on the ColumnFamily. That includes SNAPPY [18], LZO [19], LZ4 [20] and GZ [21] compressions. When creating a table every ColumnFamily is defined separately meaning that some families can have a compression algorithm applied to them and some may not.…”

Section: Column-oriented Data Model Properties (mentioning)

confidence: 99%

Impact of Data Compression on the Performance of Column-oriented Data Stores

Mladenova

Kalmukov

Marinov

et al. 2021

IJACSA


Compression of data in traditional relational database management systems significantly improves the system performance by decreasing the size of the data that results in less data transfer time within the communication environment and higher efficiency in I/O operations. The column-oriented database management systems should perform even better since each attribute is stored in a separate column, so that its sequential values are stored and accessed sequentially on the disk. That further increases the compression efficiency as the entire column is compressed/decompressed at once. The aim of this research is to determine if data compression could improve the performance of HBase, running on a small-sized Hadoop cluster, consisting of one name node and nine data nodes. Test scenario includes performing Insert and Select queries on multiple records with and without data compression. Four data compression algorithms are tested since they are natively supported by HBase: SNAPPY, LZO, LZ4 and GZ. Results show that data compression in HBase highly improves system performance in terms of storage saving. It shrinks data 5 to 10 times (depending on the algorithm) without any noticeable additional CPU load. That allows smaller but significantly faster SSD disks to be used as cluster's primary data storage. Furthermore, the substantial decrease in the network traffic is an additional benefit with major impact on big data processing.


“…In addition, we adopt LZO compression to reduce the disk I/O for I/O-intensive tasks [36]. The specific LZO compression principle can be found in Reference 36. The designed algorithm is shown in Algorithm 2.…”

Section: SMOSA Task Scheduling Algorithm (mentioning)

confidence: 99%

“…SMOSA not only optimizes the task scheduling process, but also compresses I/O task data. As the Teragen needs to generate a large amount of disk storage data, the use of LZO compression technology can improve the internal data Shuffle process [36], and a Shuffle process is the most resource-consuming link in task execution. Therefore, when a large amount of data needs to be generated, the data compression method can reduce the time of data transmission and the disk read time, which effectively shortens the execution time of the task.…”

Section: I/O-intensive Tasks (mentioning)

confidence: 99%

SMOSA: Spider monkey optimization‐based scheduling algorithm for heterogeneous Hadoop

Zhang

Guan

et al. 2021

Concurrency and Computation


Hadoop is a typical framework for processing big data. Task scheduling algorithms have a significant impact on the processing performance of Hadoop clusters. Existing scheduling algorithms of Hadoop fail to consider the performance differences between nodes in heterogeneous Hadoop clusters, causing problems such as uneven task allocation and low resource utilization. Aiming to solve this problem, we propose a spider monkey optimization-based scheduling algorithm (SMOSA) for heterogeneous Hadoop. First, the cluster heartbeat mechanism is used to obtain information such as memories and CPUs of nodes to comprehensively consider the actual load capacity of each node. Then, the spider monkey optimization algorithm is adopted to find the optimal mapping relationship between tasks and resources by taking the task completion time as the objective function and updating the position of the spider monkey. Finally, we calculate the remaining rate of node hardware resources, and according to the task type, the node with the higher remaining rate of resource will give priority to the task. Data are compressed for I/O type tasks to reduce disk operations and increase the speed of task execution. Experimental results show that, compared with existing scheduling algorithms, the SMOSA can effectively reduce task execution time and can significantly improve scheduling efficiency and task execution speed especially in heterogeneous Hadoop clusters. For different types of tasks, the execution time can be reduced by up to 19%.

