Data chunking algorithms divide data into many small chunks, so that an operation on the whole data becomes an operation on the individual chunks. They have been widely used in duplicate data detection, parallel computing, and other fields, but seldom in incremental data synchronization. Targeting the characteristics of incremental data synchronization, this paper proposes a novel data chunking algorithm. The two data sets to be synchronized are divided into small chunks and the chunk contents are compared; the chunks that differ are the incremental data to be found. The new algorithm decides whether to set a cut-point based on the number of 1 bits in the binary representation of all bytes in an interval. It thus improves resistance to the byte-shifting problem at the expense of chunk-size stability, which makes it more suitable for incremental data synchronization. Experiments comparing this algorithm with several classical and state-of-the-art algorithms show that, for the same changes between the two data sets, it reduces the incremental data found by 32%∼57%. Experimental results on real-world datasets show that PCI improves the calculation speed of the classic Rsync algorithm by up to 70%, with the drawback of increasing the transmission compression rate by up to 11.8%. INDEX TERMS Data synchronization, chunking algorithm, data backup, increment.
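The cut-point rule described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `chunk_by_popcount`, the window length, the bit-count threshold, and the maximum chunk size are all assumptions chosen for illustration, and the abstract does not specify how the interval or the decision rule is actually defined.

```python
import random

# Number of 1 bits for every possible byte value, precomputed once.
POPCOUNT = [bin(value).count("1") for value in range(256)]


def chunk_by_popcount(data, window=8, threshold=42, max_size=4096):
    """Split data into variable-length chunks.

    A cut-point is placed after byte i when the last `window` bytes of the
    current chunk contain at least `threshold` one-bits, or when the chunk
    reaches `max_size` bytes. All three parameters are illustrative
    assumptions, not values taken from the paper.
    """
    chunks = []
    start = 0   # index of the first byte of the current chunk
    ones = 0    # one-bits inside the current window
    for i, byte in enumerate(data):
        ones += POPCOUNT[byte]
        if i - start >= window:
            # The window is full: drop the byte that just slid out of it.
            ones -= POPCOUNT[data[i - window]]
        size = i - start + 1
        if (size >= window and ones >= threshold) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            ones = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes form the last chunk
    return chunks


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    shifted = b"\x00" + original  # one inserted byte shifts every offset

    old_chunks = chunk_by_popcount(original)
    new_chunks = chunk_by_popcount(shifted)
    known = set(old_chunks)
    changed = [c for c in new_chunks if c not in known]
    print(len(new_chunks), "chunks,", len(changed), "changed after the insertion")
```

Because the cut decision depends only on the bytes inside the window, an insertion near the start of the data should disturb only the chunks around it; later cut-points fall on the same content again. This is the byte-shifting resistance the abstract refers to, here bought with a widely varying chunk size.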
This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for the static summarization of gesture videos. Because gesture videos usually contain uncertain transition postures, unclear movement boundaries, and inexplicable frames, we propose a three-step method: the first step browses the video, the second step applies the MIESW method to select candidate key frames, and the third step removes most of the redundant key frames. In detail, the first step converts the video into a sequence of frames and adjusts the frame size. In the second step, the MIESW key frame extraction algorithm is executed. The inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window so that similar content in the video is grouped together. Then, based on the entropy value of each frame and the average mutual information value of each frame group, a threshold method is applied to optimize the grouping, and the key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames among the candidate key frames. The calculations of Precision, Recall, and F-measure are optimized from the perspective of practicality and feasibility. Experiments demonstrate that the key frames extracted using our method provide high-quality video summaries and cover the main content of the gesture video.
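The two quantities this abstract relies on, frame entropy and inter-frame mutual information, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: frames are taken to be flat sequences of gray levels of equal length, the function names are hypothetical, and `group_frames` is a deliberately simplified fixed-threshold grouping, not the adaptive sliding-window rule of MIESW, which the abstract does not spell out.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def mutual_information(frame_a, frame_b):
    """I(A;B) = H(A) + H(B) - H(A,B) over co-located pixel pairs."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    joint_entropy = entropy(list(zip(frame_a, frame_b)))
    return entropy(frame_a) + entropy(frame_b) - joint_entropy


def group_frames(frames, mi_threshold):
    """Group consecutive frame indices (simplified, assumed rule).

    A new group starts whenever the mutual information between a frame
    and its predecessor falls below `mi_threshold`.
    """
    if not frames:
        return []
    groups = [[0]]
    for i in range(1, len(frames)):
        if mutual_information(frames[i - 1], frames[i]) >= mi_threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    pose_a = [0, 0, 255, 255]   # toy 4-pixel grayscale frames
    pose_b = [0, 255, 0, 255]
    video = [pose_a, pose_a, pose_b, pose_b]
    print(group_frames(video, mi_threshold=0.5))  # prints [[0, 1], [2, 3]]
```

High mutual information between neighbouring frames indicates similar content, so such frames end up in the same group; a candidate key frame could then be picked per group, for example using the frame entropy.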
In data backup systems, incremental backup technology is emerging as a focus of academic and industrial research because it reduces the bandwidth and processing-time overhead that full backup incurs when synchronizing backups with source data. Finding the incremental data between the backups and the source data is a key but poorly solved problem for incremental backup technology. To find the incremental data during the backup process, we propose a novel content-defined chunking algorithm. The source data and the backup data are divided in the same way into small variable-length chunks. Then, by checking whether a chunk of the source data differs from every chunk of the backup data, we can determine whether that chunk is incremental data. The chunking algorithm in this paper is compared experimentally with other classical and state-of-the-art algorithms. The experimental results show that, at the same chunk throughput, the incremental data found by this algorithm is reduced by 13%-34% compared to the others. INDEX TERMS Data synchronization, chunking algorithm, data backup, increment.
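The comparison step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, assuming the chunks have already been produced by some content-defined chunker; the function name `incremental_chunks` and the use of SHA-256 digests as a stand-in for comparing chunk contents are assumptions, not details given in the abstract.

```python
import hashlib


def incremental_chunks(source_chunks, backup_chunks):
    """Return the source chunks whose content matches no backup chunk."""
    # Digests stand in for chunk contents so the lookup stays a set test.
    backup_digests = {hashlib.sha256(c).digest() for c in backup_chunks}
    return [c for c in source_chunks
            if hashlib.sha256(c).digest() not in backup_digests]


if __name__ == "__main__":
    backup = [b"header", b"alpha", b"omega"]
    source = [b"header", b"alpha", b"inserted", b"omega"]
    print(incremental_chunks(source, backup))  # prints [b'inserted']
```

Only the chunks returned here would need to be transferred to bring the backup up to date, which is why a chunker that keeps unchanged regions in identical chunks reduces the incremental data found.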