“…Much of this work relies on the ability to extract entities accurately, including work focused on modeling (Bamman et al., 2014; Iyyer et al., 2016; Chaturvedi et al., 2017). And yet, with notable exceptions (Vala et al., 2015; Brooke et al., 2016), nearly all of this work uses NER models trained on non-literary data, for the simple reason that labeled data exists for domains like news through standard datasets such as ACE (Walker et al., 2006), CoNLL (Tjong Kim Sang and De Meulder, 2003), and OntoNotes (Hovy et al., 2006), and even for historical non-fiction (DeLozier et al., 2016; Rayson et al., 2017), but not for literary texts. This is naturally problematic for two reasons: models trained on out-of-domain data degrade in performance when applied to a very different domain, especially for NER, as Augenstein et al. (2017) have shown; and without in-domain test data, it is difficult to directly estimate the severity of this degradation.…”