Automatic reconstruction of cross-cut shredded text documents (RCCSTD) is important in several areas and remains a highly challenging problem. In this work, we propose a novel semi-automatic reconstruction solution archive for RCCSTD. This solution archive consists of five components: preprocessing, row clustering, an error evaluation function (EEF), optimal reconstruction route search, and human mediation (HM). Specifically, we evaluate a row clustering algorithm based on the signal correlation coefficient and cross-correlation sequence, and an improved EEF based on the gradient vector, each both with and without HM. Experimental results show that row clustering is effective for identifying and grouping shreds that belong to the same row of a text document. The proposed EEF improves precision and performs well in RCCSTD whether or not HM is used. Overall, adding HM boosts the performance of both row clustering and shred reconstruction.
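To make the row clustering idea concrete, the following is a minimal sketch and not the algorithm evaluated in this work. It assumes each shred is a binarized NumPy array (1 = ink), summarizes it by its horizontal projection profile, and greedily groups shreds whose profiles have a high Pearson correlation coefficient. The function names, the greedy grouping strategy, and the threshold of 0.8 are illustrative assumptions; the cross-correlation sequence, which could handle vertical offsets between shreds, is omitted.

```python
import numpy as np


def row_profile(shred):
    """Horizontal projection: amount of ink in each pixel row (shred is 2-D, 1 = ink)."""
    return shred.sum(axis=1).astype(float)


def profile_correlation(a, b):
    """Pearson correlation coefficient between two projection profiles."""
    if a.std() == 0 or b.std() == 0:
        return 0.0  # a blank shred carries no row information
    return float(np.corrcoef(a, b)[0, 1])


def cluster_rows(shreds, threshold=0.8):
    """Greedy grouping (assumed strategy): a shred joins the first cluster whose
    representative profile correlates with it at or above the threshold."""
    clusters = []  # each cluster is a list of shred indices
    for i, shred in enumerate(shreds):
        profile = row_profile(shred)
        for cluster in clusters:
            representative = row_profile(shreds[cluster[0]])
            if profile_correlation(profile, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Shreds cut from the same row of the page share text-line positions, so their profiles peak at the same heights even when the characters differ. In this sketch, shreds that end up in small or singleton clusters would be natural candidates for human mediation.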