Pool of knowledge available to the mankind depends on the source of learning resources, which can vary from ancient printed documents to present electronic material. The rapid conversion of material available in traditional libraries to digital form needs a significant amount of work if we are to maintain the format and the look of the electronic documents as same as their printed counterparts. Most of the printed documents contain not only characters and its formatting but also some associated non text objects such as tables, charts and graphical objects. It is challenging to detect them and to concentrate on the format preservation of the contents while reproducing them. To address this issue, we propose an algorithm using local thresholds for word space and line height to locate and extract all categories of tables from scanned document images. From the experiments performed on 298 documents, we conclude that our algorithm has an overall accuracy of about 75% in detecting tables from the scanned document images. Since the algorithm does not completely depend on rule lines, it can detect all categories of tables in a range of scanned documents with different font types, styles and sizes to extract their formatting features. Moreover, the algorithm can be applied to locate tables in multi column layouts with small modification in layout analysis. Treating tables with their existing formatting features will tremendously help the reproducing of printed documents for reprinting and updating purposes.
-Plagiarism is one of the growing issues in academia and is always a concern in Universities and other academic institutions. The situation is becoming even worse with the availability of ample resources on the web. This paper focuses on creating an effective and fast tool for plagiarism detection for text based electronic assignments. Our plagiarism detection tool named AntiPlag is developed using the tri-gram sequence matching technique. Three sets of text based assignments were tested by AntiPlag and the results were compared against an existing commercial plagiarism detection tool. AntiPlag showed better results in terms of false positives compared to the commercial tool due to the pre-processing steps performed in AntiPlag. In addition, to improve the detection latency, AntiPlag applies a data clustering technique making it four times faster than the commercial tool considered. AntiPlag could be used to isolate plagiarized text based assignments from non-plagiarised assignments easily. Therefore, we present AntiPlag, a fast and effective tool for plagiarism detection on text based electronic assignments.
Plagiarism is known as illegal use of others' part of work or whole work as one's own in any field such as art, poetry, literature, cinema, research and other creative forms of study. Plagiarism is one of the important issues in academic and research fields and giving more concern in academic systems. The situation is even worse with the availability of ample resources on the web. This paper focuses on an effective plagiarism detection tool on identifying suitable intra-corpal plagiarism detection for text based assignments by comparing unigram, bigram, trigram of vector space model with cosine similarity measure. Manually evaluated, labelled dataset was tested using unigram, bigram and trigram vector. Even though trigram vector consumes comparatively more time, it shows better results with the labelled data. In addition, the selected trigram vector space model with cosine similarity measure is compared with tri-gram sequence matching technique with Jaccard measure. In the results, cosine similarity score shows slightly higher values than the other. Because, it focuses on giving more weight for terms that do not frequently exist in the dataset and cosine similarity measure using trigram technique is more preferable than the other. Therefore, we present our new tool and it could be used as an effective tool to evaluate text based electronic assignments and minimize the plagiarism among students.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.