Cyber-aggression, cyberbullying, and cyber-grooming are distinct but similar phenomena that represent objectionable content appearing on online social media. Timely detection of such content is essential for preventing and reducing it. This article explores and highlights the diversity of definitions of cyber-aggression, cyberbullying, and cyber-grooming; analyzes current categorization systems and taxonomies; identifies the targets, target categories, and subcategories that are the subjects of objectionable content research; analyzes the ambiguity of linguistic terms in the domain; reviews existing databases gathered for research in the field; explores the types of features used in modeling systems for automatic detection; and examines methods for automatic detection and/or prediction of objectionable content. The results point to directions for developing systems that trace transformations of objectionable content over time across different online social platforms.
Purpose
A hybrid approach is presented, which combines linguistic and statistical information to semi-automatically extract multiword term candidates from texts.
Design/methodology/approach
The method is designed to be domain and language independent, focusing on languages with rich morphology. Here, it is used for extracting multiword terms from texts in Serbian, belonging to the agricultural engineering domain, as a use case. Predefined syntactic structures were used for multiword terms. For each structure, a finite state transducer was developed, which recognizes text sequences having that structure and outputs the sequence in a normalized form, so that different inflectional forms of the same multiword term can be counted properly. Term candidates were further filtered by their frequencies and evaluated by two domain experts.
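The recognize, normalize, count, and filter steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: a plain sliding-window matcher over part-of-speech tags stands in for the finite state transducers, normalization is simplified to joining lemmas (which ignores the agreement a real normalized form needs), and the tagged tokens, the tag names, and the frequency threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical hand-tagged tokens: (surface form, lemma, part-of-speech tag).
# In practice these would come from electronic dictionaries, not a literal list.
TOKENS = [
    ("poljoprivrednog", "poljoprivredni", "A"), ("traktora", "traktor", "N"),
    ("i", "i", "CONJ"),
    ("poljoprivrednim", "poljoprivredni", "A"), ("traktorom", "traktor", "N"),
    ("za", "za", "PREP"),
    ("novu", "nov", "A"), ("sejalicu", "sejalica", "N"),
]

# Assumed syntactic structures for multiword terms (adjective+noun, noun+noun).
PATTERNS = [("A", "N"), ("N", "N")]


def extract_candidates(tokens, patterns):
    """Count lemma-normalized sequences whose tags match one of the patterns."""
    counts = Counter()
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if tuple(tag for _, _, tag in window) == pattern:
                # Simplified normalization: join lemmas so that different
                # inflected forms of the same term share one counter key.
                counts[" ".join(lemma for _, lemma, _ in window)] += 1
    return counts


def filter_by_frequency(counts, min_freq):
    """Keep only candidates seen at least min_freq times."""
    return {term: c for term, c in counts.items() if c >= min_freq}


candidates = extract_candidates(TOKENS, PATTERNS)
frequent = filter_by_frequency(candidates, min_freq=2)
print(frequent)
```

Here the two inflected occurrences of the same adjective-noun sequence collapse into one candidate, which is the effect the normalized transducer output is meant to achieve; the surviving candidates would then go to domain experts for evaluation.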
Findings
Using language resources such as electronic dictionaries and grammars, 1,523 multiword term candidates were recognized in a corpus containing 42,260 different simple word forms, and 928 of these candidates were extracted as multiword terms. Of the extracted terms, 870 were new, that is, not already contained in the existing electronic dictionary of compounds for Serbian, and they were used to enrich that dictionary.
Originality/value
The paper presents a methodology that can significantly contribute to the development of terminology lexicons in different areas. In this particular use case, some important agricultural engineering concepts were extracted from the text, but the approach could be used for other domains and languages as well.
This article presents two methods for the automatic generation of application ontologies from the Web Ontology Language (OWL) representation of the multilingual BalkaNet (BN) WordNets (WNs). Both proposed methods are applied to the BalkaNet WordNet ontology for the Serbian language (SerWN). The first method uses only the SerWN, both for generating the class hierarchy and for generating instances of classes, while the second method combines the SerWN with a domain ontology. The first method was used to automatically generate the FoodOntology, whereas the second was used to generate the ontology of rhetorical figures tropes. Preliminary evaluation results corroborate the soundness of the approach. Since BN consists of individual WNs for five Balkan languages and Czech, the methodology presented in this article can also be used for all these languages. The first method can also be used for other domains.
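As a rough illustration of the first method's idea, deriving both a class hierarchy and class instances from a wordnet alone, the sketch below walks a toy hyponym graph. The mapping it uses (synsets with hyponyms become classes, leaf synsets become instances) is an assumption made for this example, not a rule taken from the article, and the `HYPONYMS` data and the `build_ontology` function are hypothetical.

```python
# Hypothetical toy fragment of a wordnet: synset -> list of direct hyponyms.
HYPONYMS = {
    "food": ["fruit", "vegetable"],
    "fruit": ["apple", "plum"],
    "vegetable": ["carrot"],
}


def build_ontology(root, hyponyms):
    """Return (subclass pairs, instance pairs) for the subtree under root.

    Assumed mapping: a synset that has hyponyms becomes a class and a
    subclass of its hypernym; a leaf synset becomes an instance of it.
    """
    subclass_of, instance_of = [], []
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in hyponyms.get(parent, []):
            if child in hyponyms:
                subclass_of.append((child, parent))
                stack.append(child)
            else:
                instance_of.append((child, parent))
    return subclass_of, instance_of


subclasses, instances = build_ontology("food", HYPONYMS)
for child, parent in sorted(subclasses):
    print(f"{child} subClassOf {parent}")
for individual, cls in sorted(instances):
    print(f"{individual} instanceOf {cls}")
```

The pairs could then be serialized as OWL axioms with an ontology library; that step is omitted here because the article does not describe the serialization it used.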