Topic classification of texts is one of the most interesting challenges in Natural Language Processing (NLP). Topic classifiers commonly use a bag-of-words approach, in which the classifier uses (and is trained with) selected terms from the input texts. In this work we present techniques based on graph similarity to classify short texts by topic. In our classifier we build graphs from the input texts, and then use properties of these graphs to classify them. We have tested the resulting algorithm by classifying Twitter messages in Spanish among a predefined set of topics, achieving more than 70% accuracy.
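The abstract does not say how the graphs are built or which graph properties are compared, so the following is only a minimal sketch of the general idea, under assumptions of our own. It assumes word co-occurrence graphs (words within a small window are linked), one merged graph per topic, and edge overlap as the similarity measure. The function names (`text_to_graph`, `build_topic_graphs`, `similarity`, `classify`), the `window` parameter, and the example texts are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def text_to_graph(text, window=2):
    """Build an undirected co-occurrence graph as a Counter of edges.

    Assumption: two words are linked if they appear within `window`
    tokens of each other. The paper may construct its graphs differently.
    """
    tokens = text.lower().split()
    edges = Counter()
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != tok:
                edges[frozenset((tok, other))] += 1
    return edges


def build_topic_graphs(labeled_texts):
    """Merge the graphs of all training texts that share a topic."""
    topic_graphs = {}
    for text, topic in labeled_texts:
        topic_graphs.setdefault(topic, Counter()).update(text_to_graph(text))
    return topic_graphs


def similarity(text_graph, topic_graph):
    """Fraction of the text's edges that also occur in the topic graph."""
    if not text_graph:
        return 0.0
    shared = sum(1 for edge in text_graph if edge in topic_graph)
    return shared / len(text_graph)


def classify(text, topic_graphs):
    """Return the topic whose graph is most similar to the text's graph."""
    graph = text_to_graph(text)
    return max(topic_graphs, key=lambda t: similarity(graph, topic_graphs[t]))


# Hypothetical training data: two short Spanish texts, one per topic.
training = [
    ("el equipo gana el partido de fútbol", "deportes"),
    ("el gobierno aprueba la nueva ley", "política"),
]
topic_graphs = build_topic_graphs(training)
print(classify("gana el partido", topic_graphs))  # prints "deportes"
```

Edge overlap is used here only because it is the simplest graph comparison to show; a real classifier would likely need stop-word handling, normalization of Twitter-specific tokens, and a more robust similarity measure.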
Social interaction technologies (SIT) is a very broad field that encompasses a large list of topics: interactive and networked computing, mobile social services and the Social Web, social software and social media, marketing and advertising, various aspects and uses of blogs and podcasting, corporate value and web-based collaboration, e-government and online democracy, virtual volunteering, different aspects and uses of folksonomies, tagging and the social semantic cloud of tags, blog-based knowledge management systems, systems of online learning, with their ePortfolios, blogs and wikis in education and journalism, legal issues and social interaction technology, dataveillance and online fraud, neogeography, social software usability, social software in libraries and nonprofit organizations, and broadband visual communication technology for enhancing social interaction. The fact is that the daily activities of many businesses are being socialized, as is the case with Yammer (https://www.yammer.com/), the enterprise social network. The leitmotivs of social software are: create, connect, contribute, and collaborate.
Artificial Intelligence (AI), and its branch Natural Language Processing (NLP) in particular, are major contributors to recent advances in classifying documentation and extracting information from assorted fields. Medicine is one field that has attracted a lot of attention because of the amount of information generated in public professional journals and other channels of communication within the medical profession. The typical information extraction task on technical texts is performed by an automatic term recognition extractor. Automatic Term Recognition (ATR) from technical texts is applied to identify key concepts for information retrieval and, secondarily, for machine translation. Term recognition depends on the subject domain and the lexical patterns of a given language, in our case Spanish, Arabic and Japanese. In this article, we present the methods and techniques for creating a biomedical corpus of validated terms, along with several tools for making the best use of the information that corpus contains. This paper also shows how these techniques and tools have been used in a prototype.