In this work we identify interactions of stopwords -noisewords-in text corpora as a fundamental feature to effect author classification. It is convenient to view such interactions as graphs wherein nodes are stopwords and the interaction between a pair of stopwords are represented as edge-weights. We define the interaction in terms of the distances between pairs of stopwords in text documents. Given a list of authors, graphs for each author is computed based on their undisputed writings. Authorship of a test document is attributed based on the closeness of the graph derived from it to the above graphs. Towards this, we define a closeness measure to compare such graphs based on the Kullback-Leibler divergence. We illustrate the accuracy of our approach by applying it on examples drawn from the Gutenberg archives. Our results show that the proposed approach is effective not only in binary author classification but also performs multiclass author classification for as many as 10 authors at a time and compares favourably with the state-of-the-art in author identification.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.