Opinion leaders are influential people who can shape the opinions and attitudes of others in their society. Finding opinion leaders is an important task in domains ranging from marketing to politics. In this paper, we introduce a new, effective algorithm for finding opinion leaders in a given domain of an online social network. The proposed algorithm, named OLFinder, detects the main topics of discussion in the given domain, calculates a competency score and a popularity score for each user in that domain, combines the two scores into a probability of being an opinion leader in the domain, and finally ranks the users of the social network by that probability. Our experimental results show that OLFinder outperforms other methods on precision-recall, average precision, and P@N measures.
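The final scoring-and-ranking step described above can be sketched as follows. The abstract does not give the exact formulas, so the normalized product of the competency and popularity scores below, and the `rank_opinion_leaders` helper itself, are illustrative assumptions rather than OLFinder's actual combination rule:

```python
def rank_opinion_leaders(scores):
    """scores: {user: (competency, popularity)} -> users sorted by
    descending probability of being an opinion leader."""
    # Combine the two scores; a simple product is an assumption here.
    prob = {u: c * p for u, (c, p) in scores.items()}
    # Normalize so the combined scores form a probability distribution.
    total = sum(prob.values()) or 1.0
    prob = {u: v / total for u, v in prob.items()}
    return sorted(prob.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_opinion_leaders({
    "alice": (0.9, 0.8),   # competent and popular
    "bob":   (0.4, 0.9),   # popular but less competent
    "carol": (0.7, 0.2),   # competent but little reach
})
```

Any monotone combination of the two scores would fit the abstract's description; the product simply rewards users who are strong on both axes.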
We present a neural semi-supervised learning model termed Self-Pretraining. Our model is inspired by the classic self-training algorithm. However, as opposed to self-training, Self-Pretraining is threshold-free, can update its belief about previously labeled documents, and can cope with the semantic drift problem. Self-Pretraining is iterative and consists of two classifiers. In each iteration, one classifier draws a random set of unlabeled documents and labels them. This set is used to initialize the second classifier, which is then further trained on the set of labeled documents. The algorithm proceeds to the next iteration with the classifiers' roles reversed. To improve the flow of information across iterations and to cope with the semantic drift problem, Self-Pretraining employs an iterative distillation process, transfers hypotheses across iterations, utilizes a two-stage training model, uses an efficient learning rate schedule, and employs a pseudo-label transformation heuristic. We evaluated our model on three publicly available social media datasets. Our experiments show that Self-Pretraining outperforms existing state-of-the-art semi-supervised classifiers across multiple settings. Our code is available at https://github.com/p-karisani/self_pretraining.
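The alternating loop described above might look as follows in outline. The `CentroidClassifier` is a toy 1-D stand-in for the paper's neural classifiers, and the two-stage training (initialize on pseudo-labels, then train on gold labels) is collapsed into a single fit, so this is a simplified sketch rather than the published implementation:

```python
import random

class CentroidClassifier:
    """Toy 1-D nearest-centroid model standing in for a neural classifier."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            sums[label] = sums.get(label, 0.0) + x
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {l: sums[l] / counts[l] for l in sums}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda l: abs(x - self.centroids[l]))
                for x in X]

def self_pretraining(labeled_X, labeled_y, unlabeled_X, iters=4, sample=8, seed=0):
    """Alternate two classifiers: one pseudo-labels a random draw of
    unlabeled documents to seed the other; roles reverse each round."""
    rng = random.Random(seed)
    current = CentroidClassifier().fit(labeled_X, labeled_y)
    for _ in range(iters):
        # The current classifier labels a random draw of unlabeled documents
        # (note: no confidence threshold is applied)...
        draw = rng.sample(unlabeled_X, min(sample, len(unlabeled_X)))
        pseudo = current.predict(draw)
        # ...which seeds a fresh classifier in the other role; the paper's
        # two-stage training is collapsed here into one fit over both sets.
        current = CentroidClassifier().fit(draw + labeled_X, pseudo + labeled_y)
    return current
```

Because each round starts from a fresh classifier fit on a new random draw, earlier pseudo-labels can effectively be revised, which mirrors the threshold-free, belief-updating behavior the abstract claims.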
The bioCADDIE dataset retrieval challenge brought together different approaches to the retrieval of biomedical datasets relevant to a user's query, expressed as a text description of a needed dataset. We describe a series of experiments applying both probabilistic and machine learning-driven information retrieval techniques to this challenge. Our experiments with probabilistic methods, such as query term weight optimization, automatic query expansion, and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than the other methods. We also show that, although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine providing access to biomedical datasets. Retrieval performance is expected to improve further with additional training data created by expert annotation or gathered from usage logs, clicks, and other signals during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie
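The keyword-boosting idea found most effective above can be sketched as follows. The IDF-based importance score, the top-k selection, and the boost factor are all illustrative assumptions, not the exact scheme used in the experiments:

```python
import math

def boost_query_weights(query_terms, doc_freq, n_docs, top_k=3, boost=2.0):
    """Weight each term of a verbose query by IDF, then multiply the
    weights of the top_k most important terms by a boost factor."""
    weights = {t: math.log(n_docs / (1 + doc_freq.get(t, 0)))
               for t in set(query_terms)}
    # Boost the terms that are rarest in the collection, i.e. the ones
    # most likely to discriminate between relevant and irrelevant datasets.
    for t in sorted(weights, key=weights.get, reverse=True)[:top_k]:
        weights[t] *= boost
    return weights

w = boost_query_weights(
    "find gene expression dataset for the study".split(),
    {"the": 900, "for": 800, "gene": 10, "expression": 20,
     "dataset": 50, "find": 100, "study": 60},
    n_docs=1000)
```

In practice the resulting weights would be passed to the retrieval model's scoring function (e.g. as per-term boosts in a weighted query).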
Abstract. In this article we propose a supervised method for expanding tweet content to improve the recall of the tweet filtering task in online reputation management systems. Our method does not use any external resources. It builds a K-NN classifier in three steps: the tweets labeled related and unrelated in the training set are expanded by extracting and adding the most discriminative terms, calculating and adding the most frequent terms, and re-weighting the original tweet terms. Our experiments on the RepLab 2013 data set show that our method improves the performance of the filtering task, in terms of the F measure, by up to 13% over state-of-the-art classifiers such as SVM. The data set consists of 61 entities from the automotive, banking, university, and music domains.

Introduction

Twitter is one of the most widely used social networks in the world. According to reports,1 as of February 2015 Twitter had 288 million users. This large user base has made it one of the most studied social networks in computer science [1][2][3]. On Twitter, users can post messages of up to 140 characters, which their followers can read and re-tweet. A huge amount of information spreads through Twitter and other social networks every day, and this has led to the emergence of Online Reputation Management (ORM) systems. ORM is about monitoring Internet users' opinions regarding organizations, products, or celebrities [4]. The main tasks of ORM systems are retrieving the messages posted by users, analyzing them, and visualizing the results [3]. An important step in ORM is detecting the messages that are related to a specific entity; in other words, classifying messages based on their context. This step is known as the filtering task. If it is carried out properly, it reduces noise and yields higher-quality results.
This task is quite challenging due to the ambiguity of entity names and the short length of the messages. For instance, if an ORM system wants to analyze users' impressions of the BMW company, it must be able to recognize the tweets that contain this name (or other related names). However, this is not easy, because users may also abbreviate other phrases to BMW. For example, the 90s TV series "Boy Meets World" is also abbreviated to BMW in tweets because of the constraint on message length. Therefore, methods more sophisticated than simple keyword matching are required to carry out this step correctly. The short length of the messages is the main challenge in applying regular classification and disambiguation techniques to tweet filtering [3]. In this research, we propose a supervised method to address this problem through tweet expansion. We expand the content of each tweet with related words in order to increase the accuracy of matching tweets with keywords. Although we onl...

1 http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
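The first of the three expansion steps mentioned in the abstract above, extracting the most discriminative terms from the labeled tweets, might be sketched as follows. The frequency-ratio score and the `discriminative_terms` helper are illustrative assumptions, not the paper's exact formulation:

```python
from collections import Counter

def discriminative_terms(related_docs, unrelated_docs, k=5):
    """Return the k terms whose relative frequency in related tweets
    most exceeds their frequency in unrelated tweets."""
    rel = Counter(t for d in related_docs for t in d.split())
    unrel = Counter(t for d in unrelated_docs for t in d.split())
    n_rel, n_unrel = sum(rel.values()) or 1, sum(unrel.values()) or 1
    # Add-one smoothing in the denominator keeps unseen terms finite.
    score = {t: (rel[t] / n_rel) / ((unrel.get(t, 0) + 1) / n_unrel)
             for t in rel}
    return [t for t, _ in
            sorted(score.items(), key=lambda kv: kv[1], reverse=True)[:k]]

terms = discriminative_terms(
    ["bmw car engine", "bmw car drive"],          # related to the company
    ["bmw boy meets world", "boy meets world show"])  # the TV-series sense
```

The selected terms would then be appended to each training tweet of the corresponding class before building the K-NN classifier, which is the expansion effect described in the abstract.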
We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADRs) in social media data. Our model exploits the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. A classifier is then trained on each view and used to label a set of unlabeled documents, which serves to initialize a new classifier in the other view. Finally, the initialized classifier in each view is further trained on the initial training examples. We evaluated our model on the largest publicly available ADR dataset. The experiments show that our model significantly outperforms transformer-based models pretrained on domain-specific data.
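The cross-view initialization described above can be outlined as follows. The nearest-mean classifier is a toy 1-D stand-in for the transformer in each view, and collapsing initialization and further training into one fit is a simplification of the procedure the abstract describes:

```python
def fit_means(X, y):
    """Toy 1-D nearest-mean 'classifier' standing in for a transformer."""
    by_label = {}
    for x, label in zip(X, y):
        by_label.setdefault(label, []).append(x)
    return {label: sum(v) / len(v) for label, v in by_label.items()}

def predict_means(means, X):
    return [min(means, key=lambda label: abs(x - means[label])) for x in X]

def cross_view_round(view_a, view_b, y, unlab_a, unlab_b):
    """One round: each view's classifier pseudo-labels the unlabeled pool,
    and those labels seed a new classifier in the *other* view."""
    pseudo_from_a = predict_means(fit_means(view_a, y), unlab_a)
    pseudo_from_b = predict_means(fit_means(view_b, y), unlab_b)
    # Initialization on pseudo-labels plus further training on the gold
    # labels is collapsed into a single fit in this sketch.
    clf_a = fit_means(unlab_a + view_a, pseudo_from_b + y)
    clf_b = fit_means(unlab_b + view_b, pseudo_from_a + y)
    return clf_a, clf_b
```

The benefit of the two-view scheme is that each classifier's errors are partly independent, so the pseudo-labels one view passes to the other act as a regularizing signal rather than reinforcing the same mistakes.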