“…AI/ML via general crypto or MPC [34,76,155,198,284,315,323,370,383] (total: 9); AI/ML or matching fully clientside [4,86,128,138,207,214,352,366,377] (total: 9); Metadata-based [58,176,262,368,384] (total: 5); Other [269,329,351] (total: 3)…”
Section: Spam Filtering (mentioning)
confidence: 99%
“…PHFs appear only in Reis et al's 2020 work for misinformation in WhatsApp that provides full client privacy [308] and the two 2021 partially client private proposals for matching CSAM [33,212]. A few more papers we examined use locality-sensitive hashes for identifying spam similar to previously seen spam [86,351,380], however aside from these PHFs are rare in the literature we examined. We hope to see both improved PHFs and improved scrutiny of PHFs in the future.…”
Popular messaging applications now enable end-to-end encryption (E2EE) by default, and E2EE data storage is becoming common. These important advances for security and privacy create new content moderation challenges for online services, because services can no longer directly access plaintext content. While ongoing public policy debates about E2EE and content moderation in the United States and European Union emphasize child sexual abuse material and misinformation in messaging and storage, we identify and synthesize a wealth of scholarship that goes far beyond those topics. We bridge literature that is diverse in both content moderation subject matter, such as malware, spam, hate speech, terrorist content, and enterprise policy compliance, and intended deployments, including not only privacy-preserving content moderation for messaging, email, and cloud storage, but also private introspection of encrypted web traffic by middleboxes. In this work, we systematize the study of content moderation in E2EE settings. We set out a process pipeline for content moderation, drawing on a broad interdisciplinary literature that is not specific to E2EE. We examine cryptography and policy design choices at all stages of this pipeline, and we suggest areas of future research to fill gaps in the literature and better understand possible paths forward.