As concerns about social bots online increase, studies have attempted to explore the discourse they produce and its effects on individuals and on the public at large. We argue that the common reliance on aggregated scores from binary classifiers for bot detection may have yielded biased or inaccurate results. To test this possibility, we systematically compare non-bots and bots using both binary and non-binary classifiers, the latter sorting bots into the categories of astroturf, self-declared, spammers, fake followers, and Other. We use two Twitter corpora, one about COVID-19 vaccines (N = 1,697,280) and one about climate change (N = 1,062,522). We find that, in terms of both volume and thematic content, the use of binary classifiers may hinder, distort, or mask differences between humans and bots that can only be discerned when observing specific bot types. We discuss the theoretical and practical implications of these findings.