Houssem Menhour scite author profile

Regardless of industry, the overload of information facing most organizations today is a drain on both individuals and the enterprise itself. The increasing volume of this information, which is stored in different electronic formats, requires new, sophisticated systems to analyse and classify it. In this paper, we attempt to implement a framework, Document Classification and Analysis (DoCA), that can simplify and automate such tasks for different file types, namely office documents (text, spreadsheets, and presentations), scanned documents (images and PDFs), and multimedia files (video and audio). Each file type requires different methods for pre-processing, analysis, and classification. The efficiency and feasibility of DoCA are examined on the HAVELSAN dataset, and the accuracy on the different tasks shows that DoCA is a promising tool for document analysis and classification.

INDEX TERMS: Document analysis, document classification, OCR, video-audio analysis.

Searchable Turkish OCRed historical newspaper collection 1928–1942

Menhour, Şahin, Sarıkaya et al. 2021

Journal of Information Science

The newspaper emerged as a distinct cultural form in early 17th-century Europe and is bound up with the early modern period of history. Historical newspapers are of utmost importance to nations and their people, and researchers from different disciplines rely on these papers to improve our understanding of the past. To meet this need, the Istanbul University Head Office of Library and Documentation provides access to a large database of scanned historical newspapers. To go a step further and make the documents more accessible, we need to run optical character recognition (OCR) and named entity recognition (NER) on the whole database and index the results to enable full-text search. We design and implement a system encompassing the whole pipeline, from scraping the dataset from the original website to providing a graphical user interface for running search queries, and it does so successfully. The proposed system lets users search for keywords related to people, culture, and security and visualise the results.

A reproducible educational plan to teach mini autonomous race car programming

Eken, Şara, Satılmış et al. 2020

The International Journal of Electrical Engineering & Educa

As autonomous cars and their complex features grow in popularity, ensuring that analyses and capabilities are reproducible and repeatable has become important in education plans too. This paper describes a reproducible research plan for mini autonomous race car programming. The educational plan was designed and implemented as part of a summer internship program at Kocaeli University, and it consists of theoretical courses and laboratory assignments. A literate programming approach with the Python language is used for programming the race car. To assess the educational program’s impact on the learning process and to evaluate students’ acceptance and satisfaction levels, the students answered an electronic questionnaire after finishing the program. According to their feedback, the reproducible educational program is useful for learning and consolidating new concepts in mini autonomous car programming.

DoCA: A Content-Based Automatic Classification System Over Digital Documents