Abstract. We propose an approach to the automatic categorization of text documents that combines latent semantic analysis (LSA) with the Mamdani fuzzy inference algorithm. LSA is used for the semantic analysis of information in electronic document management systems: it identifies semantic relationships between the terms of documents and yields a degree of correspondence between the compared vectors. Based on the results of LSA, we propose a rule base for the Mamdani fuzzy inference algorithm that automatically assigns documents to a given set of topical rubrics and supports automated monitoring of documents that are not relevant to the specified topics or that are similar to several thematic categories at once.

Keywords: document rubrication; fuzzy inference; latent semantic analysis; rule base; Mamdani fuzzy inference algorithm.

1. Introduction. The aim of this work is to develop an approach to the automatic rubrication of documents by given thematic rubrics [1, 2]. To this end, we propose the joint use of latent semantic analysis and the Mamdani fuzzy inference algorithm, which constitutes the novelty of the proposed approach.

Automatic rubrication of documents relies on methods of semantic analysis and on the automatic distribution of incoming information among the given rubrics.
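As a rough illustration of the fuzzy inference stage, the following Python sketch applies Mamdani-style inference (min for rule firing, max for aggregation, centroid defuzzification) to two similarity scores that are assumed to come from LSA: the highest and the second-highest similarity of a document to the given rubrics. The membership functions, the four rules, the thresholds, and the names `tri`, `decision_score`, and `rubricate` are illustrative assumptions and are not taken from the rule base proposed in the paper.

```python
# Illustrative Mamdani-style inference. All fuzzy sets, rules and
# thresholds below are assumptions made for this sketch.

def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b.

    When a == b (or b == c) the set acts as a left (or right) shoulder.
    """
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Input fuzzy sets over a similarity score in [0, 1] (assumed shapes).
LOW, MEDIUM, HIGH = (0.0, 0.0, 0.6), (0.2, 0.5, 0.8), (0.4, 1.0, 1.0)
# Output fuzzy sets over a decision score in [0, 1] (assumed shapes).
REJECT, REVIEW, ASSIGN = (0.0, 0.0, 0.5), (0.25, 0.5, 0.75), (0.5, 1.0, 1.0)

def decision_score(best, second, steps=100):
    """Return a crisp decision score in [0, 1] for two similarity values."""
    # Each rule is (firing strength, output set); AND is modelled by min.
    rules = [
        # Best similarity is low: the document fits none of the rubrics.
        (tri(best, *LOW), REJECT),
        # Best similarity is medium: leave the decision to a reviewer.
        (tri(best, *MEDIUM), REVIEW),
        # Two rubrics match strongly: the document is ambiguous.
        (min(tri(best, *HIGH), tri(second, *HIGH)), REVIEW),
        # One clear winner: assign the document automatically.
        (min(tri(best, *HIGH), tri(second, *LOW)), ASSIGN),
    ]
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        # Clip each output set by its rule strength, then aggregate by max.
        mu = max(min(strength, tri(y, *out)) for strength, out in rules)
        num += y * mu
        den += mu
    # Centroid defuzzification; fall back to the midpoint if no rule fires.
    return num / den if den > 0 else 0.5

def rubricate(best, second):
    """Map the crisp score to one of three actions (assumed thresholds)."""
    score = decision_score(best, second)
    if score >= 0.66:
        return "assign"
    if score <= 0.33:
        return "reject"
    return "review"

if __name__ == "__main__":
    for best, second in [(0.9, 0.1), (0.9, 0.85), (0.1, 0.05)]:
        print(best, second, rubricate(best, second))
```

With these assumed sets, a document with one clearly dominant rubric is assigned automatically, a document that matches two rubrics almost equally well is flagged for review, and a document with uniformly low similarity is rejected as not relevant to the specified topics.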