“…Choosing the appropriate supervised learning algorithm is essential for achieving accurate content classification and categorization. The task involves evaluating various algorithms such as support vector machines (SVM), random forests, or deep learning models like convolutional neural networks (CNNs) or transformer-based architectures (e.g., BERT, GPT) [4][5][6]. Additionally, model optimization techniques such as hyperparameter tuning, cross-validation, and regularization need to be applied to enhance the model's performance [16,17].…”
Section: Statement For the Task
confidence: 99%
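The model-selection step described in the snippet above (evaluating candidate algorithms with cross-validation) can be sketched in a few lines. This is a minimal, standard-library illustration of k-fold cross-validation; the `train_fn`/`score_fn` names are placeholders for whichever classifier and metric are being compared, not part of the cited work.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(train_fn, score_fn, X, y, k=5):
    """Average held-out score of a model over k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        X_tr = [x for i, x in enumerate(X) if i not in held]
        y_tr = [t for i, t in enumerate(y) if i not in held]
        model = train_fn(X_tr, y_tr)
        scores.append(score_fn(model,
                               [X[i] for i in held_out],
                               [y[i] for i in held_out]))
    return mean(scores)
```

In practice this loop would wrap a real estimator (e.g., an SVM or random forest) and a grid of hyperparameters; the averaged fold score is what lets the candidates be compared fairly.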
“…Transformer-based architectures such as BERT (Bidirectional Encoder Representations from Transformers) have gained significant attention in natural language processing tasks due to their ability to capture contextual information effectively [4][5][6]. In the context of content management, these architectures can be leveraged to enhance the accuracy and efficiency of content classification and categorization.…”
Section: Main Part
confidence: 99%
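The "contextual information" that BERT-style models capture comes from self-attention: each token's output vector is a weighted mixture of all token vectors in the sequence. The following is a deliberately tiny sketch of scaled dot-product self-attention with identity Q/K/V projections, just to make the mechanism concrete; real transformer layers add learned projections, multiple heads, and feed-forward sublayers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention over token vectors X.
    Q = K = V = X here, to keep the sketch minimal."""
    d = len(X[0])
    out = []
    for q in X:
        # similarity of this token's query to every token's key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        w = softmax(scores)
        # output = attention-weighted mixture of all value vectors
        out.append([sum(wi * v for wi, v in zip(w, col))
                    for col in zip(*X)])
    return out
```

Each output row blends information from the whole sequence, which is why these architectures represent a word differently depending on its context.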
“…By utilizing unsupervised learning techniques such as clustering or dimensionality reduction, content management systems can automatically group similar documents, images, or videos together based on their inherent similarities. This can be particularly useful when dealing with large volumes of unstructured data where manually labeling each piece of content is impractical or infeasible [4][5][6].…”
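The automatic grouping described above is what a clustering algorithm such as k-means does over feature vectors of the content. A naive, standard-library k-means sketch (in practice one would use an optimized implementation over real embeddings):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive k-means: group feature vectors into k clusters
    by repeatedly assigning points to the nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its cluster
        centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters
```

No labels are needed: items end up grouped purely by similarity of their feature vectors, which is exactly the property that makes this useful for large unlabeled collections.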
The object of research is the processes of data filtering and machine learning in content management systems. The subject of research is the development of a hybrid approach to data filtering based on a combination of supervised and unsupervised machine learning. The article explores machine learning approaches to content management and how they can change the way we organize, categorize, and derive value from vast amounts of data. The main goal is to develop and apply a hybrid approach to data filtering and training that optimizes resource consumption and supports supervised training for better categorization in the future. The approach combines elements of supervised and unsupervised learning around the BERT architecture in a processing flow that reduces resource usage and adapts the algorithm to perform better in a specific domain. As a result, the intelligent system was able to optimize itself for a specific field of use and reduce resource costs. Conclusion: after applying the hybrid approach of data filtering and machine learning to existing data streams, we obtain a performance increase of up to 5%, and this percentage grows with the running time of the application.
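The hybrid flow the abstract describes (unsupervised grouping feeding a cheaper supervised step) can be sketched abstractly. This is an illustration of the general pattern, not the article's actual pipeline: cluster similar items first, label only one representative per cluster, and propagate that label, which is where the resource saving comes from.

```python
def hybrid_filter(items, cluster_fn, label_fn):
    """Hybrid sketch: unsupervised clustering groups similar items,
    then one representative per cluster is labeled (the supervised step)
    and the label is propagated, cutting labeling/inference cost."""
    labeled = {}
    for cluster in cluster_fn(items):
        representative = cluster[0]
        label = label_fn(representative)   # the only supervised call
        for item in cluster:
            labeled[item] = label
    return labeled
```

With n items in c clusters, the expensive supervised model runs c times instead of n, which is consistent with the resource-consumption argument made above.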
“…The most important achievement of this model is that it is pre-trained on corpora in 104 different languages and performs quite well even in low-resource languages. In addition, the M-BERT model is trained taking into account the structures of all these languages [37]. In this study, a pre-trained M-BERT model supporting 104 languages, including Turkish, with 12 stacked Transformer blocks, a hidden dimension of 768, 12 self-attention heads, and about 110,000,000 parameters overall was used.…”
Homophobic expressions are a form of insulting the sexual orientation or personality of people. Severe psychological traumas may occur in people who are exposed to this type of communication. It is important to develop automatic classification systems based on language models to examine social media content and distinguish homophobic discourse. This study aims to present a pre-trained Multilingual Bidirectional Encoder Representations from Transformers (M-BERT) model that can successfully detect whether Turkish comments on social media contain homophobic or related hate comments (i.e., sexist, severe humiliation, and defecation expressions). Comments in the Homophobic-Abusive Turkish Comments (HATC) dataset were collected from Instagram to train the detection models. The HATC dataset was manually labeled at the sentence level and combined with the Abusive Turkish Comments (ATC) dataset that was developed in our previous study. The HATC dataset was balanced using the resampling method, and two forms of the dataset (i.e., the resampled resHATC and the original HATC) were used in the experiments. Afterward, the M-BERT model was compared with DL-based models (i.e., Long Short-Term Memory, Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Unit), traditional machine learning (TML) classifiers (i.e., Support Vector Machine, Naive Bayes, Random Forest), and ensemble classifiers (i.e., Adaptive Boosting, eXtreme Gradient Boosting, Gradient Boosting) for the best model selection. The performance of the detection models was evaluated using the F1-score, precision, and recall metrics. Results showed the best performance (homophobic F1-score: 82.64%, hateful F1-score: 91.75%, neutral F1-score: 96.08%, average F1-score: 90.15%) was achieved with the M-BERT model on the HATC dataset. The M-BERT detection model can increase the effectiveness of filters in detecting Turkish homophobic and related hate speech in social networks.
It can be used to detect homophobic and related hate speech for different languages since the M-BERT model has multilingual pre-trained data.
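The per-class F1-scores reported above (homophobic, hateful, neutral) follow directly from one-vs-rest precision and recall. For reference, this is how such a per-class score is computed from predictions; the label strings are illustrative:

```python
def per_class_f1(y_true, y_pred, label):
    """Precision, recall, and F1 for one class (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Averaging these per-class F1 values over the three classes yields the macro-style "average F1-score" quoted in the abstract.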
“…Comparative investigations demonstrate that the bidirectional LSTM-connected conditional random field (CRF) model outperforms the LSTM-connected conditional random field (CRF) model. Existing event extraction methods [16,17], usually designed for news and similar corpora, mainly rely on trigger words to detect certain events and then extract the relevant event parameters, which makes them unsuitable for unstructured personnel resume texts [18]. The author of [19] proposed that event types can be detected through the key parameters of the event, without relying on trigger words, and the event parameters can then be extracted.…”
Current methods for extracting information from user resumes do not work well with unstructured resumes in economic announcements, nor with documents that mention the same users. In this study, unstructured user information is turned into structured user-information templates, and a way to build person-relationship graphs in the field of economics is proposed. First, the lightweight blockchain-based BERT model (B-BERT) is trained. The trained B-BERT pre-training model is then used to obtain the event instance vector, categorize it appropriately, and populate the hierarchical user-information templates with accurate user characteristics. The research investigates the approach of creating character-connection graphs in the Chinese financial system and suggests a framework for doing so in the economic sector. Furthermore, the relationships between users are found through the filled-in user-information templates, and a graph of user relationships is built. Finally, the approach is validated on a manually annotated dataset. In tests, the method can extract text information from unstructured economic user resumes and build a relationship map of people in the financial field. The experimental results show that the proposed approach is capable of efficiently retrieving information from unstructured financial personnel resume text and generating a character-relationship graph in the economic sphere.
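The template-filling step in the abstract above, populating a hierarchical user-information template with extracted event parameters, can be sketched as a simple merge. The section and field names here are hypothetical illustrations, not taken from the paper:

```python
def fill_template(template, extracted_events):
    """Populate a hierarchical user-information template from extracted
    event parameters. An event carries a target section and a dict of
    argument values; only empty (None) fields are filled."""
    filled = {section: dict(fields) for section, fields in template.items()}
    for event in extracted_events:
        section = event.get("section")
        if section in filled:
            for field, value in event.get("args", {}).items():
                if field in filled[section] and filled[section][field] is None:
                    filled[section][field] = value
    return filled
```

Once templates are filled this way, shared field values (e.g., the same company across two people's templates) are what link nodes in the person-relationship graph.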