This paper proposes an approach that improves the quality of the original semantic search algorithm for educational course programmes, which is based on vector representations produced by distributional semantics. The approach provides an expert with interpretable topic filtering of the courses in the search results. Probabilistic topic modeling based on additive regularization makes the components of the text vectors interpretable, which allows the expert, during exploratory search, to narrow down the set of relevant documents previously found with the vector model. In our experiments, we study the applied task of searching for educational courses that match current labor market requirements; requirements described in professional standards serve as search queries. Topic filtering is implemented with the open-source library BigARTM. We investigate how the hyperparameters and the choice of regularizers used to build the topic model affect the quality of semantic course search for several vector models: word2vec, fastText, and TF-IDF.
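The two-stage idea described in this abstract (vector search first, interpretable topic filtering second) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the course names, vectors, topic labels, and the `min_prob` threshold are all hypothetical, and the topic distributions are written by hand rather than produced by BigARTM.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query_vec, courses, top_k=3):
    """Stage 1: rank courses by similarity of their dense vectors to the query."""
    ranked = sorted(courses, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def filter_by_topic(results, topic, min_prob=0.3):
    """Stage 2: keep only results whose probability for the chosen topic is high enough."""
    return [c for c in results if c["topics"].get(topic, 0.0) >= min_prob]

# Hypothetical toy data: dense vectors stand in for word2vec/fastText/TF-IDF output,
# and "topics" stands in for the per-document topic distribution of a topic model.
courses = [
    {"name": "Databases", "vec": [0.9, 0.1, 0.0], "topics": {"data": 0.7, "math": 0.1}},
    {"name": "Statistics", "vec": [0.7, 0.3, 0.1], "topics": {"data": 0.2, "math": 0.8}},
    {"name": "Machine Learning", "vec": [0.8, 0.2, 0.1], "topics": {"data": 0.5, "math": 0.5}},
    {"name": "Art History", "vec": [0.0, 0.1, 0.9], "topics": {"culture": 0.9}},
]

query = [1.0, 0.0, 0.0]
found = search(query, courses, top_k=3)
narrowed = filter_by_topic(found, "math", min_prob=0.4)
print([c["name"] for c in narrowed])
```

Keeping the filter separate from the ranking step mirrors the exploratory workflow in the abstract: the expert first sees the vector-model results and only then narrows them by a topic they can read and name.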
In this article, we discuss an approach to information extraction (IE) using neural language models. We provide a detailed overview of modern IE methods, both supervised and unsupervised. The proposed method achieves a high-quality solution to the problem of analyzing relevant labor market requirements without a time-consuming labelling procedure. In our experiment, professional standards act as a knowledge base for the labor domain. By comparing the descriptions of work actions and requirements from professional standards with the elements of job listings, we extract four entity types. The approach is based on classifying vector representations of texts generated by various neural language models: averaged word2vec, SIF-weighted averaged word2vec, TF-IDF-weighted averaged word2vec, and paragraph2vec. In the experiments, the averaged word2vec (CBOW) model showed the best quality.
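As a rough illustration of classifying averaged word vectors against reference texts, the sketch below assigns a job-listing fragment the label of its most similar reference phrase. Everything here is assumed for the example: the two-dimensional `WORD_VECS` table stands in for a trained word2vec model, the reference phrases stand in for professional standards, and the labels `work_action` and `requirement` are placeholders, not the four entity types used in the paper.

```python
import math

# Hypothetical 2-d word vectors standing in for a trained word2vec (CBOW) model.
WORD_VECS = {
    "develop": [1.0, 0.0],
    "test": [0.9, 0.1],
    "software": [0.6, 0.4],
    "knowledge": [0.0, 1.0],
    "python": [0.2, 0.8],
    "sql": [0.1, 0.9],
}

def average_vector(text):
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical reference phrases with placeholder labels.
REFERENCE = [
    ("develop software", "work_action"),
    ("knowledge sql", "requirement"),
]

def classify(text):
    """Return the label of the reference phrase closest to the text."""
    vec = average_vector(text)
    best = max(REFERENCE, key=lambda ref: cosine(vec, average_vector(ref[0])))
    return best[1]

print(classify("test software"))
print(classify("knowledge of python"))
```

Because the labels come from the reference phrases rather than from annotated job listings, no manual labelling of the listings is needed, which is the property the abstract emphasizes.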
In this paper, we review the most popular approaches to a variety of natural language processing (NLP) tasks, primarily those that involve machine learning, from classic methods to state-of-the-art technologies. Most modern approaches fall into three rough categories: those based on the distributional hypothesis, those that extract information from graph-like structures (such as ontologies), and those that look for lexico-syntactic patterns in text documents. We focus mainly on the first of the three. An important step in the preparation stage of NLP, before any analysis can begin, is representing words and documents as numeric vectors. Approaches range from the simple Bag-of-Words to sophisticated machine learning methods such as word embeddings. Today, the best information retrieval quality for both English and Russian is achieved by approaches based on word embedding algorithms trained on carefully selected text corpora, combined with deep syntactic and semantic analysis using various deep neural networks. A wide variety of machine learning algorithms is applied to NLP tasks such as part-of-speech tagging, text summarization, named entity recognition, document classification, topic and relation extraction, and natural language question answering. We also review the possibilities of applying these approaches and methods to educational content analysis, and propose a novel approach to using NLP and machine learning to analyze and synthesize educational content in the form of a decision support system.
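For readers unfamiliar with the simplest representation mentioned above, a Bag-of-Words vector can be sketched in a few lines. The two sample documents are made up for illustration, and tokenization is reduced to lowercasing and whitespace splitting.

```python
from collections import Counter

def build_vocabulary(documents):
    """Collect the sorted set of distinct lowercase tokens across all documents."""
    return sorted({word for doc in documents for word in doc.lower().split()})

def bag_of_words(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical sample documents.
docs = ["machine learning for text", "text search and text mining"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Each document becomes a vector of word counts over a shared vocabulary; word order is discarded, which is the main limitation that embedding-based methods address.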