Exploiting FastDFS Client-based Small File Merging

Chen, Haimeng; Zhang, Hua

doi:10.2991/aiea-16.2016.40

Cited by 4 publications

(3 citation statements)

References 6 publications

(5 reference statements)


“…With the use of the divide-and-conquer strategy in distributed computing, a big data file is partitioned into a number of small files, called data blocks, which are stored in a distributed manner on the disks of cluster nodes to improve I/O performance. A big data file stored in this way is called a distributed data file, which is managed on the cluster with a distributed file system [32,33], e.g., GFS [8], HDFS [34], Taobao file system (TFS) [35,36], and FastDFS [37]. The distributed file systems provide an important technical foundation for big data analysis [38].…”

Section: Distributed File Systems (mentioning)

confidence: 99%

“…TFS is a high-availability, high-performance distributed file system developed by Taobao to meet the storage requirements of unstructured small files (usually no more than 1 MB). FastDFS is a lightweight open-source distributed file system that is especially suitable for online services using files as the carrier [37]. HDFS, which was developed in the Apache Hadoop project, was designed to overcome the challenges of distributed data processing in a large-scale cluster.…”

Section: Distributed File Systems (mentioning)

confidence: 99%


Survey of Distributed Computing Frameworks for Supporting Big Data Analysis

Sun et al. 2023, Big Data Min. Anal.


Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks, which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.


“…Finally, the consensus mechanism is used to realize the consensus of all nodes in the network, and legal blocks are joined in the blockchain so that the information on transactions cannot be tampered with. FastDFS [25] is an open-source lightweight distributed file system developed in the C language, which works well on UNIX-like systems and pursues high performance and high scalability.…”

Section: Related Work and Background (mentioning)

confidence: 99%

Controlled Sharing Mechanism of Data Based on the Consortium Blockchain

Yang et al. 2021, Security and Communication Networks


In the process of sharing data, the costless replication of electric energy data leads to the problem of uncontrolled data and the difficulty of third-party access verification. This paper proposes a controlled sharing mechanism of data based on the consortium blockchain. The data flow range is controlled by the data isolation mechanism between channels provided by the consortium blockchain by constructing a data storage consortium chain to achieve trusted data storage, combining attribute-based encryption to complete data access control and meet the demands for granular data accessibility control and secure sharing; the data flow transfer ledger is built to record the original data life cycle management and effectively record the data transfer process of each data controller. Taking the application scenario of electric energy data sharing as an example, the scheme is designed and simulated on the Linux system and Hyperledger Fabric. Experimental results have verified that the mechanism can effectively control the scope of access to electrical energy data and realize the control of the data by the data owner.
