Over the past decade, we have witnessed the rise of misinformation on the Internet, with online users constantly falling victim to fake news. Many past studies have analyzed how fake news spreads and how it can be detected and mitigated. However, open questions remain about the operational behavior of fake news websites, such as: How old are fake news websites? Do they typically stay online for long periods of time? Do they synchronize their uptime and downtime with each other? Do they share similar content over time? Which third parties support their operations? How much user traffic do they attract compared to mainstream (real) news websites? In this paper, we perform a first-of-its-kind investigation to answer such questions about the online presence of fake news websites and to characterize their behavior in comparison to real news websites. Based on our findings, we build a content-agnostic ML classifier for the automatic detection of fake news websites that are not yet included in manually curated blacklists, achieving an F1 score of up to 0.942 and an AUC-ROC of up to 0.976.
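The classifier's performance above is reported as an F1 score and as the area under the ROC curve (AUC-ROC). As a minimal, illustrative sketch (not the authors' evaluation code), both metrics can be computed from ground-truth labels and classifier scores as below. The label convention (1 = fake news website, 0 = real), the toy labels and scores, and the 0.5 decision threshold are all hypothetical and unrelated to the paper's data.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 fake (1) and 3 real (0) websites with model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(round(f1_score(y_true, y_pred), 3))  # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.889
```

Note the design difference: F1 depends on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two are often reported together.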
INTRODUCTION
Much of what we read online may appear to be true, but often it is not. False information consists of news, stories, or hoaxes created to deliberately misinform or deceive readers. Usually, these stories are created either (i) to lure users and become a profitable business for the publishers (i.e., clickbait) or (ii) to influence people's views, push a political agenda, or cause confusion. False information can deceive people by mimicking trusted websites or by using names and web addresses similar to those of reputable news organisations.
The most important types of fake news include:
(a) Clickbait: stories carefully fabricated to attract more website visitors and drive advertising revenue for publishers. Such stories use sensationalist headlines to grab attention and increase click-through rates, normally at the expense of truth or accuracy.
(b) Propaganda: stories created to deliberately mislead audiences and promote a biased point of view or a particular political agenda.
(c) Sloppy Journalism: stories with unreliable information or unverified facts, which can mislead audiences.
(d) Misleading Headings: stories that are not completely false but are distorted by misleading or sensationalist headlines, so that they spread quickly on social media sites, where only headlines and short snippets of the full article appear in audience newsfeeds.
(e) Satire: stories written for entertainment and parody (e.g., The Onion, The Daily Mash).
An analysis [1] found that on Facebook, the top 20 fake news stories about the 2016 U.S. presidential election received more engagement than the top 20 election stories from 19 major media outlets. According to a different study [2], US citizens rate fake news as a larger problem than racism, climate change, or terrorism. According to the same study, more than making people believe false things, the rise of fake news is making it harder for people to recognize the truth, thus making them especial...