Crowdfunding sites with recent explosive growth are equally attractive platforms for swindlers or scammers. Though the growing number of articles on crowdfunding scams indicate that the fraud threats are accelerating, there has been little knowledge on the scamming practices and patterns. The key contribution of this research is to discover the hidden clues in the text by exploring linguistic features to distinguish scam campaigns from non-scams. Our results indicate that by providing less information and writing more carefully (and less informally), scammers deliberately try to deceive people; (i) they use less number of words, verbs, and sentences in their campaign pages. (ii) scammers make less typographical errors, 4.5-4.7 times lower than non-scammers.(iii) Expressivity of scams is 2.6-8.5 times lower as well.
Crowdfunding has seen an enormous rise, becoming a new alternative funding source for emerging companies or new startups in recent years. As crowdfunding prevails, it is also under substantial risk of the occurrence of fraud. Though a growing number of articles indicate that crowdfunding scams are a new imminent threat to investors, little is known about them primarily due to the lack of measurement data collected from real scam cases. This paper fills the gap by collecting, labeling, and analyzing publicly available data of a hundred fraudulent campaigns on a crowdfunding platform. In order to find and understand distinguishing characteristics of crowdfunding scams, we propose to use a broad range of traits including project-based traits, project creator-based ones, and content-based ones such as linguistic cues and Named Entity Recognition features, etc. We then propose to use the feature selection method called Forward Stepwise Logistic Regression, through which 17 key discriminating features (including six original and hitherto unused ones) of scam campaigns are discovered. Based on the selected 17 key features, we present and discuss our findings and insights on distinguishing characteristics of crowdfunding scams, and build our scam detection model with 87.3% accuracy. We also explore the feasibility of early scam detection, building a model with 70.2% of classification accuracy right at the time of project launch. We discuss what features from which sections are more helpful for early scam detection on day 0 and thereafter.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.