“…With the use of the divide-and-conquer strategy in distributed computing, a big data file is partitioned into a number of small files, called data blocks, which are stored in a distributed manner on the disks of cluster nodes to improve I/O performance. A big data file stored in this way is called a distributed data file, which is managed on the cluster with a distributed file system [32,33], e.g., GFS [8], HDFS [34], Taobao file system (TFS) [35,36], and FastDFS [37]. The distributed file systems provide an important technical foundation for big data analysis [38].…”
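
The partitioning described in the passage can be illustrated with a minimal Python sketch. It is not the mechanism of GFS, HDFS, TFS, or FastDFS; the function names `split_into_blocks` and `place_blocks`, the node labels, and the round-robin placement are all assumptions made for illustration. Real distributed file systems use far larger blocks and also handle replication and metadata, which are omitted here.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the file contents into consecutive blocks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: List[bytes], nodes: List[str]) -> Dict[str, List[int]]:
    """Assign block indices to nodes round-robin (a hypothetical placement policy)."""
    if not nodes:
        raise ValueError("at least one node is required")
    placement: Dict[str, List[int]] = {node: [] for node in nodes}
    for block_id in range(len(blocks)):
        placement[nodes[block_id % len(nodes)]].append(block_id)
    return placement


# Toy example: a 10-byte "file", 4-byte blocks, two hypothetical nodes.
data = bytes(range(10))
blocks = split_into_blocks(data, block_size=4)
placement = place_blocks(blocks, ["node-a", "node-b"])
```

Because the blocks are consecutive slices, concatenating them in index order recovers the original file, and spreading the indices over several nodes is what allows the blocks to be read from different disks in parallel.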