The rapid growth of electronic documents is causing problems such as unstructured data, which requires more time and effort to search for a relevant document. Text Document Classification (TDC) is of great significance in information processing and retrieval, where unstructured documents are organized into predefined classes. Urdu is a favored research language among South Asian languages because of its complex morphology, unique features, and lack of linguistic resources such as standard datasets. Compared to short-text tasks such as sentiment analysis, long-text classification requires more time and effort because of the large vocabulary, greater noise, and redundant information. Machine Learning (ML) and Deep Learning (DL) models have been widely used in text processing. Despite major limitations of ML models, such as their reliance on manually engineered features, they remain the favored methods for Urdu TDC. To the best of our knowledge, this is the first study of Urdu TDC using a DL model. In this paper, we design a large multipurpose, multi-format dataset that contains more than ten thousand documents organized into six classes. We use a Single-layer Multisize Filters Convolutional Neural Network (SMFCNN) for classification and compare its performance with sixteen ML baseline models on three imbalanced datasets of various sizes. Further, we analyze the effects of preprocessing methods on SMFCNN performance. SMFCNN outperformed the baseline classifiers and achieved accuracy scores of 95.4%, 91.8%, and 93.3% on the medium, large, and small datasets, respectively. The designed dataset will be publicly and freely available in different formats for future research in Urdu text processing. INDEX TERMS: Convolutional neural network, deep learning, machine learning, natural language processing, text document classification, Urdu text classification.
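The single-layer, multi-size-filter design described above follows the familiar text-CNN pattern: filters of several widths slide over a sequence of token embeddings, and each filter's responses are max-pooled into one feature. A minimal pure-Python sketch of that idea is below; the dimensions, weights, and filter sizes are illustrative toy values, not the paper's actual SMFCNN configuration.

```python
# Toy sketch of multi-size-filter convolution over token embeddings,
# followed by 1-max pooling (the text-CNN pattern SMFCNN resembles).
# All numbers here are illustrative, not the paper's parameters.

def conv1d_valid(seq, filt):
    """Slide a filter of shape (k, d) over seq of shape (n, d); return n-k+1 scores."""
    k, d = len(filt), len(filt[0])
    out = []
    for i in range(len(seq) - k + 1):
        s = 0.0
        for j in range(k):
            for c in range(d):
                s += seq[i + j][c] * filt[j][c]
        out.append(s)
    return out

def multi_filter_features(seq, filters):
    """Apply each filter, then 1-max pooling; concatenate pooled values."""
    return [max(conv1d_valid(seq, f)) for f in filters]

# Five tokens with 2-dim embeddings; one size-2 and one size-3 filter.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],              # bigram-width filter
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],  # trigram-width filter
]
print(multi_filter_features(seq, filters))  # → [2.0, 2.0]
```

The pooled vector (one value per filter) would then feed a classifier layer over the six document classes.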
In recent years, unethical behavior in the cyber environment has increasingly come to light. The presence of offensive language on social media platforms, and the automatic detection of such language, is becoming a major challenge in modern society. The complexity of natural language constructs makes this task even more challenging. Until now, most research has focused on resource-rich languages like English. Roman Urdu and Urdu are two scripts for writing the Urdu language on social media: the Roman script uses English-language characters, while the Urdu script uses Urdu-language characters. Urdu and Hindi are similar languages that differ only in their writing script, but the Roman scripts of both languages are the same. This study addresses the detection of offensive language in user comments written in the resource-poor Urdu language. We propose the first offensive-language dataset for Urdu, containing user-generated comments from social media. We use individual and combined n-gram techniques to extract features at the character and word levels. We apply seventeen classifiers from seven machine learning techniques to detect offensive language in both Urdu and Roman Urdu text comments. Experiments show that regression-based models using character n-grams perform best for processing the Urdu language. Character-level tri-grams outperform the other word and character n-grams. LogitBoost and SimpleLogistic outperform the other models, achieving F-measure values of 99.2% and 95.9% on the Roman Urdu and Urdu datasets, respectively. Our dataset is publicly available on GitHub for future research.
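The character-level n-gram features mentioned above are straightforward to compute: every overlapping substring of length n becomes a feature, counted per comment. A minimal sketch (the example string and helper names are illustrative, not from the paper's pipeline):

```python
# Minimal character-level n-gram extraction, the kind of feature
# the study feeds to its classifiers; example text is illustrative.
from collections import Counter

def char_ngrams(text, n):
    """Return all overlapping character n-grams of the string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_counts(text, n):
    """Bag-of-n-grams feature vector as a Counter."""
    return Counter(char_ngrams(text, n))

# Character tri-grams of a short Roman Urdu token.
print(char_ngrams("salam", 3))  # → ['sal', 'ala', 'lam']
```

Character tri-grams like these capture subword patterns, which helps with the spelling variation common in user-generated Roman Urdu.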
Human detection in crowded scenes is a core component of crowd-safety analysis, such as emergency warning and security monitoring platforms. Although existing anchor-free methods have fast inference speed, they are not well suited to object detection in crowded scenes because they cannot predict well-refined object bounding boxes. This work proposes an end-to-end anchor-free network, the Multidimensional Weighted Cross-Attention Network (MANet), which performs real-time human detection in crowded scenes. Specifically, a Double-flow Weighted Feature Cascade Module (DW-FCM) is used in the extractor to highlight the contribution of features at different levels. A Triplet Cross Attention Module (TCAM) is used in the detector head to strengthen the association among multi-dimensional features, further improving the discrimination of human boundary features at a fine-grained level. Moreover, an Adaptively Opposite Thrust Mapping (AOTM) ground-truth annotation strategy is proposed to correct erroneous mappings and reduce wasted training iterations. These strategies effectively alleviate the inability of existing anchor-free networks to correctly distinguish and locate individual humans in crowded scenes. Compared with anchor-based detection methods, there is no need to set anchor parameters manually, and the detection speed satisfies real-time requirements. Finally, extensive comparative experiments on the CrowdHuman and WIDER FACE datasets demonstrate that the improved strategy achieves state-of-the-art results among anchor-free methods.
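The difficulty the abstract describes stems partly from standard detection post-processing: greedy suppression of overlapping boxes (NMS) tends to merge or drop heavily overlapping people. The sketch below illustrates that background problem with generic IoU and greedy-NMS code; it is not the paper's MANet modules, and the boxes are made-up examples.

```python
# Generic IoU and greedy suppression (NMS) sketch, showing why heavily
# overlapping humans are hard for detectors; illustrative, not MANet.

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def greedy_nms(boxes, scores, thresh=0.5):
    """Keep highest-scoring boxes, dropping any box overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thresh for j in kept):
            kept.append(i)
    return kept

# Two heavily overlapping people plus one isolated person: greedy NMS
# keeps only one of the overlapping pair, losing a true detection.
boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, thresh=0.4))  # → [0, 2]
```

Avoiding such missed detections without hand-tuned anchor or threshold parameters is the motivation for the finer-grained boundary features described above.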