Purpose
This paper aims to develop a system that enables efficient management and use of electronic documentation related to mining projects, providing information retrieval and information extraction (IE) features built on various language resources and natural language processing.
Design/methodology/approach
The system is designed to integrate textual, lexical, semantic and terminological resources, enabling advanced document search and information extraction. These resources are made available through a set of Web services and applications tailored to different user profiles and use cases.
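To make the idea of integrating lexical and semantic resources concrete, here is a minimal sketch, assuming a toy morphological dictionary (lemma to inflected forms) and a toy semantic network (lemma to related lemmas). The names `MORPH_DICT`, `RELATED`, `expand_query` and `search`, and all dictionary contents, are hypothetical illustrations, not the authors' resources or services. A query lemma is expanded to every inflected form of itself and of its related lemmas, and a document matches if it contains any of those forms.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy resources. A real deployment would load these from a
# morphological e-dictionary and a wordnet-style semantic network.
MORPH_DICT: Dict[str, List[str]] = {
    # lemma -> (abridged) list of inflected forms
    "rudnik": ["rudnik", "rudnika", "rudniku", "rudnikom", "rudnici", "rudnicima"],
    "okno": ["okno", "okna", "oknu", "oknom"],
}
RELATED: Dict[str, List[str]] = {
    # lemma -> related lemmas taken from the semantic network
    "rudnik": ["okno"],
}


def expand_query(lemma: str) -> Set[str]:
    """Collect the inflected forms of a lemma and of its related lemmas."""
    forms: Set[str] = set()
    for lem in [lemma] + RELATED.get(lemma, []):
        # Fall back to the bare lemma when the dictionary has no entry.
        forms.update(MORPH_DICT.get(lem, [lem]))
    return forms


def search(documents: Dict[str, str], lemma: str) -> List[str]:
    """Return ids of documents containing any expanded form, ignoring case."""
    forms = expand_query(lemma)
    hits: List[str] = []
    for doc_id, text in documents.items():
        tokens = {tok.lower() for tok in re.findall(r"\w+", text)}
        if tokens & forms:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    docs = {
        "d1": "Eksploatacija uglja u rudniku",
        "d2": "Izvoz rude kroz okno",
        "d3": "Površinski kop",
    }
    print(search(docs, "rudnik"))  # ['d1', 'd2']
```

Here "d1" matches only through the inflected form "rudniku" and "d2" only through the related lemma "okno", which is the kind of recall gain that dictionary- and network-based expansion is meant to provide for an inflectionally rich language.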
Findings
The use of the system is illustrated by examples of keyword search supported by Web query expansion services; search based on regular expressions; corpus search based on local grammars, followed by information extraction from the search results; and, finally, search with lexical masks that use domain and semantic markers.
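Search with lexical masks can be pictured as filtering dictionary-annotated tokens by grammatical category and markers. The sketch below is a simplified illustration under stated assumptions: the mask notation `CATEGORY+MARKER+...`, the `Token` type, and the markers `Min` (mining domain), `Mat` (material) and `Loc` (location) are all hypothetical, and real corpus-processing tools use their own, richer mask syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    category: str                          # grammatical category, e.g. "N", "V"
    markers: FrozenSet[str] = frozenset()  # hypothetical domain/semantic markers


def parse_mask(mask: str) -> Tuple[str, FrozenSet[str]]:
    """Split an assumed mask such as 'N+Min+Loc' into category and required markers."""
    category, *markers = mask.split("+")
    return category, frozenset(markers)


def match_mask(tokens: List[Token], mask: str) -> List[Token]:
    """Keep tokens of the mask's category that carry every required marker."""
    category, required = parse_mask(mask)
    return [t for t in tokens if t.category == category and required <= t.markers]


# Hypothetical annotated tokens, as a dictionary lookup might produce them.
TOKENS = [
    Token("kopali", "kopati", "V", frozenset({"Min"})),
    Token("su", "biti", "V"),
    Token("bakar", "bakar", "N", frozenset({"Min", "Mat"})),
    Token("u", "u", "PREP"),
    Token("rudniku", "rudnik", "N", frozenset({"Min", "Loc"})),
    Token("selu", "selo", "N", frozenset({"Loc"})),
]

if __name__ == "__main__":
    print([t.form for t in match_mask(TOKENS, "N+Min")])      # ['bakar', 'rudniku']
    print([t.form for t in match_mask(TOKENS, "N+Min+Loc")])  # ['rudniku']
```

The required markers are treated as a subset test, so adding a marker to the mask can only narrow the result, which is what lets domain markers restrict a general noun search to mining terminology.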
Originality/value
The presented system is the first software solution that applies human language technology to the management of documentation in the mining engineering domain, but it is also applicable to other engineering and non-engineering domains. The system works with both Cyrillic and Latin script, which makes it applicable to other languages of the Balkan region related to Serbian, and its support for morphological dictionaries makes it suitable for most morphologically complex languages, such as the Slavic languages. Semantic networks and terminological dictionaries, supported by local grammars, account for significant improvements in search and for the efficiency of IE.
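One simple way to make search independent of the alphabet is to map both Cyrillic and Latin text to a common key before comparison. The sketch below assumes that approach; it is not necessarily how the described system does it, and `normalize` and `same_word` are hypothetical names. It relies on the fact that every Serbian Cyrillic letter has exactly one Latin (Gaj) equivalent, with љ, њ and џ becoming the digraphs lj, nj and dž.

```python
import unicodedata

# Serbian Cyrillic -> Latin (Gaj's alphabet). Each Cyrillic letter has exactly
# one Latin equivalent; љ, њ and џ map to the digraphs lj, nj and dž.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}
_TABLE = str.maketrans(CYR_TO_LAT)


def normalize(text: str) -> str:
    """Return a script-neutral key: NFC-composed, lowercased, Cyrillic mapped to Latin."""
    return unicodedata.normalize("NFC", text).lower().translate(_TABLE)


def same_word(a: str, b: str) -> bool:
    """Compare two words regardless of script and letter case."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(normalize("Рудник"))                  # rudnik
    print(same_word("Инјекција", "injekcija"))  # True
```

Normalizing toward Latin is the safer direction: the reverse mapping is ambiguous, because a Latin digraph may stand for two separate letters (for example, "injekcija" is written инјекција, with н and ј, not with њ).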
When considering the linguistic means of expressing the concept of a border in Serbian, Professor Predrag Piper listed several typical nouns that serve as lexical means in this categorial-semantic complex. This paper investigates other Serbian nouns that serve this function. The starting assumption is that the role of a boundary is most likely performed by those Serbian nouns that are used with prepositions more often than others, that is, nouns that are preceded by a preposition in most of their occurrences. A list of all such nouns was compiled from the electronic corpus SrpKor, and from it only the nouns suggesting some type of borderline that most often occur directly after one of the prepositions do, od, iz, na, oko, pred and u were excerpted. A further selection based on semantic and pragmatic criteria then yielded the final list of delimiting nouns described in this paper.
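The two corpus-based filtering steps can be sketched as follows, assuming sentences that are already lemmatized and tagged. The tags `N` and `PREP`, the function name and the toy sample are hypothetical and do not reproduce the SrpKor query interface; the final semantic and pragmatic selection was a manual step and is not modeled here. A noun is kept if more than half of its occurrences directly follow a preposition and its most frequent preceding preposition is one of the seven listed.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Prepositions named in the text.
TARGET_PREPS = {"do", "od", "iz", "na", "oko", "pred", "u"}

Sentence = List[Tuple[str, str]]  # (lemma, tag); the tags "N"/"PREP" are hypothetical


def preposition_heavy_nouns(sentences: List[Sentence], min_ratio: float = 0.5) -> Dict[str, str]:
    """Map each candidate noun to the preposition that most often precedes it.

    A noun is kept when more than `min_ratio` of its occurrences directly follow
    a preposition and its most frequent preceding preposition is in TARGET_PREPS.
    """
    total: Counter = Counter()
    after_prep: Counter = Counter()
    preceding: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        prev_lemma, prev_tag = None, None  # reset at each sentence boundary
        for lemma, tag in sent:
            if tag == "N":
                total[lemma] += 1
                if prev_tag == "PREP":
                    after_prep[lemma] += 1
                    preceding[lemma][prev_lemma] += 1
            prev_lemma, prev_tag = lemma, tag
    result: Dict[str, str] = {}
    for noun, count in total.items():
        if after_prep[noun] / count > min_ratio:
            top_prep, _ = preceding[noun].most_common(1)[0]
            if top_prep in TARGET_PREPS:
                result[noun] = top_prep
    return result


# Hypothetical toy sample, not SrpKor data.
SAMPLE: List[Sentence] = [
    [("stići", "V"), ("do", "PREP"), ("ivica", "N")],
    [("na", "PREP"), ("ivica", "N"), ("šuma", "N")],
    [("do", "PREP"), ("ivica", "N")],
    [("preko", "PREP"), ("most", "N")],
    [("na", "PREP"), ("kraj", "N")],
    [("kraj", "N"), ("priča", "N")],
]

if __name__ == "__main__":
    print(preposition_heavy_nouns(SAMPLE))  # {'ivica': 'do'}
```

In the sample, "most" passes the ratio test but is dropped because "preko" is not among the listed prepositions, and "kraj" is dropped because only half of its occurrences follow a preposition.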