Summary
Google, Facebook, OpenAI, and others have released access to versions of chatbots built on neural-network language models trained on massive amounts of text. Using an approach similar to security penetration testing, this paper investigates and compares three such chatbots, assessing their potential strengths and limitations. The findings include a comparison of the systems' answers to common questions; an analysis of how names and activities are used to guide discussion in two of the systems; an analysis of how much responses differ when a question is "regenerated"; the identification of a weakness in one system's knowledge of "who" invented something; the proposal of a potential new subfield, sensitive topic classifiers; and a discussion of the implications of these findings. In the course of this analysis, I identify emerging topics in chatbot research, including "topic stalemate" and the use of sensitive topic classifiers.