Classification of heterogeneous text data for robust domain-specific language modeling

Staš, Ján; Juhár, Jozef; Hládek, Daniel

doi:10.1186/1687-4722-2014-14

Cited by 12 publications

(9 citation statements)

References 16 publications

“…When the number of documents increased, the computational complexity also increased (Stas, Juhar, & Hladek, 2014). ML is often seen as an offshoot of statistics as far as data mining is concerned.…”

Section: Literature Review (mentioning; confidence: 99%)

Text Classification Techniques: A Literature Review

Thangaraj

Sivakami

2018

IJIKM

Aim/Purpose: The aim of this paper is to analyze various text classification techniques employed in practice, together with their strengths and weaknesses, to provide better awareness of the knowledge extraction possibilities in the field of data mining.

Background: Artificial Intelligence is reshaping text classification techniques so that knowledge can be acquired more effectively. However, despite the growth and spread of AI in all fields of research, its role in text mining is not yet well understood.

Methodology: For this study, articles written between 2010 and 2017 on “text classification techniques in AI”, selected from leading computer science journals, were analyzed. Each article was read in full. The research problems related to text classification techniques in AI were identified, and the techniques were grouped according to the algorithms involved. These algorithms were divided according to the learning procedure used. Finally, the findings were plotted as a tree structure to visualize the relationship between learning procedures and algorithms.

Contribution: This paper identifies the strengths, limitations, and current research trends in text classification in an advanced field such as AI. This knowledge is crucial for data scientists, who could use the findings of this study to devise customized data models. It also helps industry understand the operational efficiency of text mining techniques, contributes to reducing project costs, and supports effective decision making.

Findings: Studying and understanding the nature of the data before proceeding to mining was found to be important. Automating the text classification process is necessary given the increasing amount of data and the need for accuracy. Another interesting research opportunity lies in building intricate text data models with deep learning systems, which can execute complex Natural Language Processing (NLP) tasks with semantic requirements.

Recommendations for Practitioners: Frame analysis, deception detection, narrative science (where data expresses a story), healthcare applications for diagnosing illnesses, and conversation analysis are among the recommendations suggested for practitioners.

Recommendations for Researchers: Developing algorithms that are simpler to code and implement, better approaches to knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that researchers could explore.

Impact on Society: Text classification forms the basis of data analytics and acts as the engine behind knowledge discovery. It supports state-of-the-art decision making, for example predicting an event before it actually occurs or classifying a transaction as ‘Fraudulent’. The results of this study could be used to develop applications dedicated to assisting decision-making processes. These informed decisions will help optimize resources and maximize benefits to humankind.

Future Research: In the future, better methods for parameter optimization will be identified by selecting parameters that better reflect effective knowledge discovery. The role of streaming data processing in text classification is still rarely explored.

“…Text classification is a well-studied area in Natural Language Processing, yet it still is a very demanding research subject [50][51][52]. Most of the text classification methods concentrate on the context classification.…”

Section: Methods (mentioning; confidence: 99%)

Automatic Kurdish Dialects Identification

Hassani

Medjedovic

2016

Computer Science & Information Technology (CS & IT)

Automatic dialect identification is a necessary language technology for processing multidialect languages in which the dialects are linguistically far from each other. It becomes particularly crucial where the dialects are mutually unintelligible. Therefore, to perform computational activities on these languages, the system needs to identify the dialect that is the subject of the process. The Kurdish language encompasses various dialects, is written using several different scripts, and lacks a standard orthography. This situation makes Kurdish dialect identification both more interesting and more necessary, from the research as well as from the application perspective. In this research, we have applied a classification method based on supervised machine learning to identify the dialects of Kurdish texts. The research has focused on the two most widely spoken and dominant Kurdish dialects, namely Kurmanji and Sorani. The approach could be applied to the other Kurdish dialects as well. The method is also applicable to languages that are similar to Kurdish in their dialectal diversity and differences.

“…To calculate the numeric value of the features in sentence $S_k$, Eqs. (14) and (15) are introduced, where $NwS_k$ is the number of words in $S_k$. (9) In Fig.…”

Section: Capturing Domain Sensitive Features (DSF) (mentioning; confidence: 99%)

“…Meanwhile, ML-based techniques rely on ML algorithms and see SA as a regular text classification task. Text classification task assigns a piece of text data into several predefined classes involving ML algorithms [15]. In terms of SA task, ML-based techniques classify text document into one out of three classes namely positive class, neutral class, and negative class.…”

(mentioning; confidence: 99%)

Evaluating the performance of sentence level features and domain sensitive features of product reviews on supervised sentiment analysis tasks

2019

The exponential growth of e-commerce has made it a rich source of information. On e-commerce platforms, customers provide a qualitative evaluation in the form of an online review that describes their opinions on a specific product [1]. With a huge number of online product reviews (OPRs), manual processing is inefficient. The sentiment analysis (SA) technique emerged in response to the need to process OPRs quickly [2]. In terms of product review analysis, SA, also called opinion mining, can be defined as the task of recognizing customers' opinions or sentiments toward products or product features [3], which can be categorized as positive, negative, or neutral
