Imagine having the ability to search the entire canon of Western literature quickly and easily for the use of a specific metaphor, references to a particular place or instances of an exact sequence of words or phrases. Last year's publication in the journal Science of research using preliminary results from Google's book digitization project drew attention to the potential of such data mining for exploring a variety of fields in the social sciences and the humanities [1]. At its simplest, data mining is the process of extracting new knowledge (usually in the form of previously unknown patterns) from sets of data already in existence. For instance, Shakespeare scholars have used data mining techniques to identify patterns of word usage in his plays, the texts of which have already been digitized. Similarly, there is a long history of researchers making use of U.S. census data to identify demographic trends or correlations with other datasets. Data mining is inherently an exercise in quantitative analysis, the results of which are subject to qualitative analyses that link the newly discovered patterns back to particular, representative examples from the original set of data.

In the humanities, data mining necessarily entails an interdisciplinary and collaborative practice because it combines tools, techniques and methodologies from computer science and the humanities. As a consequence, data mining is often associated with the term digital humanities, which includes using cutting-edge technology both to present the results of research and to conduct the research itself. Data mining is one example of the latter, and at its best a data mining project involves active collaboration between humanities scholars and information professionals to design and carry out the research program. In addition, because data mining is a relatively recent practice, the research project is often as novel for the computer and information sciences as it is for the humanities. Therefore, data mining projects are driven as much by the information professional as by the humanities scholar.
Special Section

EDITOR'S SUMMARY

Data mining offers the capability to view data in a new light, discovering associations and patterns not appreciated before. For the humanities domain, it exemplifies the interdisciplinary efforts of digital humanities. The technique provides answers and prompts further questions from new discoveries. Part of knowledge discovery in databases, data mining involves identifying relevant n-grams, classifying and reclassifying results, modeling the interdependence of variables and clustering results into meaningful subgroups. From designing research questions to determining how best to display and communicate results, the process requires collaboration between information professionals and humanities scholars. A selection of data mining projects illustrates how the technique is being applied for humanities research. Tools for data mining are readily avai...