Catalina Vajiac scite author profile

Given a million escort advertisements, how can we spot near-duplicates? Such micro-clusters of ads are usually signals of human trafficking. How can we summarize them to convince law enforcement to act? Spotting micro-clusters of near-duplicate documents is useful in multiple, additional settings, including spam-bot detection in Twitter ads, plagiarism, and more. We present InfoShield, which makes the following contributions: (a) Practical, being scalable and effective on real data; (b) Parameter-free and Principled, requiring no user-defined parameters; (c) Interpretable, finding a document to be the cluster representative, highlighting all the common phrases, and automatically detecting “slots”, i.e. phrases that differ in every document; and (d) Generalizable, beating or matching domain-specific methods in Twitter bot detection and human trafficking detection respectively, as well as being language-independent. Interpretability is particularly important for the anti-human-trafficking domain, where law enforcement must visually inspect ads. Our experiments on real data show that InfoShield correctly identifies Twitter bots with an F1 score over 90% and detects human-trafficking ads with 84% precision. Moreover, it is scalable, requiring about 8 hours for 4 million documents on a stock laptop. Our incremental version, DeltaShield, allows for fast, incremental updates, with minor loss of accuracy.

Catalina Vajiac

INFOSHIELD: Generalizable Information-Theoretic Human-Trafficking Detection

Synchronous Hyperedge Replacement Graph Grammars

VisPaD: Visualization and Pattern Discovery for Fighting Human Trafficking

TRAFFICVIS: Visualizing Organized Activity and Spatio-Temporal Patterns for Detecting and Labeling Human Trafficking

DeltaShield: Information Theory for Human-Trafficking Detection
