Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including that 1) they occur in short bursts of time and 2) fraudulent user accounts have skewed rating distributions. However, these signs may not both be present in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these two signals in a principled manner, allowing successful detection even when one of them is absent. To combine the two, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on this model, we formulate a likelihood-based suspiciousness metric, the Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference with our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated and all identified as fraudulent by domain experts at Flipkart.
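As a toy illustration of the likelihood-based idea behind a metric like NEST (this is not the paper's exact formulation; all names and numbers are illustrative), one can score a user by the average negative log-likelihood of their rating histogram under the global rating distribution, so that users with improbably skewed ratings stand out:

```python
# Toy surprise score: how unlikely is this user's rating behavior
# under the global rating distribution? Higher = more suspicious.
import math
from collections import Counter

def surprise(user_ratings, global_probs):
    """Per-rating negative log-likelihood of a user's ratings
    under the global rating distribution."""
    counts = Counter(user_ratings)
    n = len(user_ratings)
    nll = -sum(c * math.log(global_probs[r]) for r, c in counts.items())
    return nll / n

# Assumed global distribution (5-star ratings common, 1-star rare).
global_probs = {1: 0.05, 2: 0.05, 3: 0.15, 4: 0.30, 5: 0.45}
typical = [5, 4, 3, 5, 4]   # mix close to global behavior
skewed = [1] * 5            # all 1-star: improbable under the model
print(surprise(skewed, global_probs) > surprise(typical, global_probs))
```

A full BIRDNEST-style approach would additionally model inter-rating times, so that bursty behavior raises the same kind of surprise score as rating skew.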
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are commonplace in the real world. For example, edges in e-commerce networks often indicate how users rated products and services in terms of number of stars, and edges in online social and phone-call networks contain temporal information about when friendships were formed and when users communicated with each other. In such cases, edge attributes capture information about how the adjacent nodes interact with other entities in the network. In this paper, we aim to utilize exactly this information to discern suspicious from typical node behavior. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EDGECENTRIC, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EDGECENTRIC successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.
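A minimal sketch of a compression-style score in the spirit of EDGECENTRIC (not the paper's exact formulation; the model and numbers are assumptions): measure the number of bits needed to encode a node's edge-attribute distribution under a global model. Nodes whose edge attributes deviate from the norm cost more bits per edge and are flagged as anomalous:

```python
# Compression-based anomaly sketch: code length (in bits) of a node's
# ratings under a global rating model; higher cost per edge = anomalous.
import math
from collections import Counter

def encoding_cost(node_ratings, model_probs):
    """Total code length, in bits, of a node's edge attributes
    under the given probability model."""
    counts = Counter(node_ratings)
    return sum(c * -math.log2(model_probs[r]) for r, c in counts.items())

model = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.3}   # assumed global model
normal_node = [4, 5, 3, 4, 5, 3]
odd_node = [1, 1, 1, 1, 1, 1]   # all 1-star edges: expensive to encode
cost_normal = encoding_cost(normal_node, model) / len(normal_node)
cost_odd = encoding_cost(odd_node, model) / len(odd_node)
print(cost_odd > cost_normal)
```

The real method operates over clusters of behavior rather than a single global model, but the intuition is the same: anomalies are the nodes a good compressor cannot encode cheaply.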
Health insurance costs across the world have increased alarmingly in recent years. A major cause of this increase is payment errors made by insurance companies while processing claims. These errors often require extra administrative effort to re-process (or rework) the claim, which can occupy up to 30% of the administrative staff at a typical health insurer. We describe a system that helps reduce these errors using machine learning techniques: it predicts claims that will need to be reworked and generates explanations to help auditors correct these claims, and we experiment with feature selection, concept drift, and active learning to collect feedback from the auditors and improve over time. We describe our framework, problem formulation, evaluation metrics, and experimental results on claims data from a large US health insurer. We show that our system achieves an order of magnitude better precision (hit rate) than existing approaches, which is accurate enough to potentially result in over $15-25 million in savings for a typical insurer. We also describe interesting research problems in this domain as well as design choices made to make the system easily deployable across health insurance companies.
Many practical data mining systems, such as those for fraud detection and surveillance, deal with building classifiers that are not autonomous but part of a larger interactive system with an expert in the loop. The goal of these systems is not just to maximize the performance of the classifier but to make the experts more efficient at performing their task, thus maximizing the overall return on investment of the system. This paper describes an interactive system for detecting payment errors in insurance claims with claim auditors in the loop. We describe an interactive claims prioritization component that uses an online cost-sensitive learning approach (more-like-this) to make the system efficient. Our interactive prioritization component is built on top of a batch classifier that has been trained to detect payment errors in health insurance claims, and it optimizes the interaction between the classifier and the domain experts who consume the results of the system. The goal is to make these auditors more efficient and effective while also improving the classification performance of the system. The result is both a reduction in the time it takes for auditors to review and label claims and an improvement in the precision of the system in finding payment errors. We show results obtained from applying this system at two major US health insurance companies, indicating significant reductions in claim audit costs and potential savings of $20-$26 million per year, making the insurance providers more efficient and lowering their operating costs. Our system reduces the money wasted by providers and insurers dealing with incorrectly processed claims and makes the healthcare system more efficient.
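The "more-like-this" re-ranking step described above can be sketched as follows (a hypothetical illustration, not the deployed system: the similarity measure, feature encoding, and boost weight are all assumptions). When an auditor confirms a claim as a payment error, unreviewed claims that resemble it are boosted so they surface earlier in the audit queue:

```python
# More-like-this sketch: after a confirmed hit, similar claims in the
# queue get a score boost proportional to their feature overlap.
def similarity(a, b):
    """Jaccard overlap between two claims' categorical feature sets."""
    fa, fb = set(a["features"]), set(b["features"])
    return len(fa & fb) / len(fa | fb)

def more_like_this(queue, confirmed, boost=0.5):
    """Re-rank the audit queue after an auditor confirms a hit."""
    for claim in queue:
        claim["score"] += boost * similarity(claim, confirmed)
    return sorted(queue, key=lambda c: c["score"], reverse=True)

queue = [
    {"id": 1, "score": 0.6, "features": ["cardiology", "out-of-network"]},
    {"id": 2, "score": 0.7, "features": ["dental", "in-network"]},
]
hit = {"id": 9, "features": ["cardiology", "out-of-network"]}
queue = more_like_this(queue, hit)
print(queue[0]["id"])  # claim 1 now outranks claim 2
```

The key design choice is that feedback takes effect online, within the auditor's session, rather than waiting for the batch classifier to be retrained.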
Rating platforms enable large-scale collection of user opinion about items (e.g., products or other users). However, fraudulent users give fake ratings for monetary gain. In this paper, we present Rev2, a system to identify such fraudulent users. We propose three interdependent intrinsic quality metrics: fairness of a user, reliability of a rating, and goodness of a product. The fairness and reliability quantify the trustworthiness of a user and a rating, respectively, and goodness quantifies the quality of a product. Intuitively, a user is fair if it provides reliable scores that are close to the goodness of products. We propose six axioms to establish the interdependency between the scores, and then formulate a mutually recursive definition that satisfies these axioms. We extend the formulation to address the cold-start problem and incorporate behavioral properties. We develop the Rev2 algorithm to calculate these intrinsic scores for all users, ratings, and products by combining network and behavior properties. We prove that this algorithm is guaranteed to converge and has linear time complexity. By conducting extensive experiments on five rating datasets, we show that Rev2 outperforms nine existing algorithms in detecting fraudulent users. We reported the 150 most unfair users in the Flipkart network to their review fraud investigators, and 127 users were identified as being fraudulent (84.6% accuracy). The Rev2 algorithm is being deployed at Flipkart.
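The mutually recursive structure can be sketched in a simplified variant (not the paper's exact update rules, axioms, or weights, which are assumptions here): fairness of a user is the mean reliability of its ratings; reliability of a rating depends on its user's fairness and how close the rating is to the product's goodness; and goodness is the reliability-weighted mean rating. The scores are iterated toward a fixed point:

```python
# Simplified REV2-style mutual recursion. Ratings are assumed
# rescaled to [-1, 1]; the 0.5/0.5 weights are illustrative.
def rev2_sketch(ratings, iters=50):
    """ratings: list of (user, product, score) with score in [-1, 1]."""
    users = {u for u, _, _ in ratings}
    prods = {p for _, p, _ in ratings}
    fairness = {u: 1.0 for u in users}
    goodness = {p: 0.0 for p in prods}
    reliability = {(u, p): 1.0 for u, p, _ in ratings}
    for _ in range(iters):
        for p in prods:  # goodness: reliability-weighted mean rating
            rs = [(reliability[(u, q)], s) for u, q, s in ratings if q == p]
            goodness[p] = (sum(r * s for r, s in rs)
                           / max(sum(r for r, _ in rs), 1e-9))
        for u, p, s in ratings:  # reliability: fair user + close to goodness
            reliability[(u, p)] = (0.5 * fairness[u]
                                   + 0.5 * (1 - abs(s - goodness[p]) / 2))
        for u in users:  # fairness: mean reliability of the user's ratings
            rs = [reliability[(v, p)] for v, p, _ in ratings if v == u]
            fairness[u] = sum(rs) / len(rs)
    return fairness, reliability, goodness

# Two users agree with the consensus; user "c" rates against it.
ratings = [("a", "p", 1.0), ("b", "p", 1.0), ("c", "p", -1.0)]
fairness, _, goodness = rev2_sketch(ratings)
print(fairness["c"] < fairness["a"])  # the dissenting user looks less fair
```

The full algorithm additionally uses Bayesian priors for cold-start users and behavioral features, and is formulated so that the iteration provably converges in linear time.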
Owing to the latest advancements in networking devices and functionalities, there is a need to build future intelligent networks that provide intellectualization, activation, and customization. Software-defined networking (SDN) is one of the latest and most trusted technologies, providing a method of network management based on network virtualization. Although traditional networks still have a strong presence in industry, software-defined networks have begun to replace them at increasing rates. As network technologies continue to mature, SDN will be adopted at higher rates across all fields in the coming years. Although SDN removes the complexity of tying the control and data planes together found in traditional networks, certain aspects such as security, controllability, and economy of network resources remain vulnerable. Among these aspects, security is one of the main concerns and must be taken seriously wherever SDN is applied. This paper presents the most recent security issues in the SDN environment, followed by preventive mechanisms. This study focuses on Internet control message protocol (ICMP) attacks in SDN networks and proposes a security policy protocol (SPP) to detect attacks that target devices such as switches and the SDN controller. The mechanism is based on ICMP attacks, which are a main source of flooding attacks in SDN networks. The proposed model focuses on two aspects: security policy process verification and client authentication verification. Experimental results show that the proposed model can effectively defend against flooding attacks in SDN network environments.
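As an illustrative sketch only (this is not the paper's SPP mechanism; the class, thresholds, and window size are all assumptions), a minimal rate-based ICMP flood detector of the kind an SDN controller application might run could look like this, flagging any source that sends more echo requests per time window than a threshold allows:

```python
# Minimal per-source rate detector for ICMP floods: count echo
# requests per fixed time window and flag sources over the limit.
from collections import defaultdict

class IcmpFloodDetector:
    def __init__(self, window_s=1.0, max_per_window=100):
        self.window_s = window_s
        self.max_per_window = max_per_window
        self.counts = defaultdict(int)
        self.window_start = defaultdict(float)

    def observe(self, src_ip, timestamp):
        """Record one ICMP echo request; return True if src is flooding."""
        if timestamp - self.window_start[src_ip] >= self.window_s:
            self.window_start[src_ip] = timestamp  # start a new window
            self.counts[src_ip] = 0
        self.counts[src_ip] += 1
        return self.counts[src_ip] > self.max_per_window

det = IcmpFloodDetector(window_s=1.0, max_per_window=100)
# A burst of 500 pings in half a second trips the detector.
flagged = any(det.observe("10.0.0.99", 0.001 * i) for i in range(500))
print(flagged)
```

In an actual SDN deployment, the controller would follow such a detection with a mitigation action, e.g., installing a flow rule that drops or rate-limits ICMP traffic from the offending source; the paper's approach additionally verifies the security policy process and client authentication.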