A lot of research has been devoted to identity document analysis and recognition on mobile devices. However, no publicly available datasets designed for this particular problem currently exist. A few existing datasets are useful for associated subtasks, but more specialized datasets are required to facilitate a more comprehensive scientific and technical approach to identity document recognition. In this paper we present the Mobile Identity Document Video dataset (MIDV-500), consisting of 500 video clips of 50 different identity document types with ground truth, which allows research to be performed in a wide […]
This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03170 and 17-29-03370).
Source images for the MIDV-500 dataset are obtained from Wikimedia Commons.
Recognition of identity documents using mobile devices has become the subject of a wide range of computer vision research. The portfolio of methods and algorithms for tasks such as face detection, document detection and rectification, and text field recognition is growing, and the scarcity of datasets has become an important issue. One of the openly accessible datasets for evaluating such methods is MIDV-500, which contains video clips of 50 identity document types captured in various conditions. However, the variability of capturing conditions in MIDV-500 did not address some key issues, mainly significant projective distortions and varying lighting conditions. In this paper we present the MIDV-2019 dataset, which contains video clips shot with modern high-resolution mobile cameras, with strong projective distortions and in low-light conditions. We describe the added data and present experimental baselines for text field recognition under different conditions.
The paper describes the problem of stopping the text field recognition process in a video stream, a novel problem that is particularly relevant to real-time mobile document recognition systems. A decision-theoretic framework for this problem is provided, and similarities with existing stopping rule problems are explored. Following theoretical work on monotone stopping rule problems, a strategy is proposed based on thresholding an estimate of the expected difference between consecutive recognition results. The efficiency of this strategy is evaluated on an openly accessible dataset. The results show that this method outperforms previously published methods based on thresholding the size of the cluster of identical results. Future work includes incorporating recognition result confidence estimates into the proposed model and evaluating the observation cost more precisely.
Keywords: Recognition in video stream • Mobile OCR • Stopping rules • Decision making • Mobile document recognition • Anytime algorithms
This work was partially financially supported by the Russian Foundation for Basic Research, Projects 17-29-03170 and 17-29-03370.
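The abstract states only that the proposed rule thresholds an estimate of the expected difference between consecutive recognition results, and that it is compared against rules which threshold the size of a cluster of identical results. The sketch below illustrates the shape of both stopping rules under loudly stated assumptions: each recognition result is a string, "difference" is measured by normalized Levenshtein distance, and the expected difference is approximated by a simple moving average over the last `window` consecutive pairs. That estimator, and the names `stop_by_expected_difference`, `stop_by_cluster_size`, `window` and `threshold`, are hypothetical illustrations, not the estimator or code from the paper.

```python
from typing import Dict, List, Optional, Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1]; two empty strings count as identical."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def stop_by_expected_difference(results: Sequence[str], threshold: float,
                                window: int = 3) -> Optional[int]:
    """HYPOTHETICAL stand-in for the expected-difference stopping rule.

    `results[n]` is the (possibly accumulated) recognition result after frame n + 1.
    The expected difference between consecutive results is approximated by the mean
    normalized distance over the last `window` consecutive pairs (an assumption,
    not the paper's estimator). Returns the number of frames processed when the
    rule stops, or None if it never stops.
    """
    diffs: List[float] = []
    for n in range(1, len(results)):
        diffs.append(normalized_distance(results[n - 1], results[n]))
        if len(diffs) >= window:
            estimate = sum(diffs[-window:]) / window
            if estimate <= threshold:
                return n + 1
    return None


def stop_by_cluster_size(results: Sequence[str], cluster_size: int) -> Optional[int]:
    """Baseline: stop once some identical result has been observed `cluster_size` times."""
    counts: Dict[str, int] = {}
    for n, result in enumerate(results, start=1):
        counts[result] = counts.get(result, 0) + 1
        if counts[result] >= cluster_size:
            return n
    return None


if __name__ == "__main__":
    stream = ["J0HN SM1TH", "JOHN SM1TH", "JOHN SMITH", "JOHN SMITH", "JOHN SMITH", "JOHN SMITH"]
    print(stop_by_expected_difference(stream, threshold=0.0, window=2))
    print(stop_by_cluster_size(stream, cluster_size=3))
```

On this toy stream both rules happen to stop after the fifth frame; raising `threshold` makes the expected-difference rule stop earlier, which illustrates the trade-off between the number of processed frames (observation cost) and the stability of the result that a stopping rule has to balance.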