We present Document Explorer, a data mining system that searches for patterns in document collections. These patterns provide knowledge about the application domain represented by the collection. A pattern can also be seen as a query that retrieves a set of documents. Thus, the data mining tools can be used to identify interesting queries, which can in turn be used to browse the collection. The main pattern types the system can search for are frequent sets of concepts, association rules, concept distributions, and concept graphs. To enable the user to specify some explicit bias, the system provides several types of constraints for searching the vast implicit spaces of patterns that exist in the collection. The patterns that have been verified as interesting are structured and presented in a visual user interface, allowing the user to operate on the results to refine and redirect search tasks or to access the associated documents. The system offers preprocessing tools to construct or refine a knowledge base of domain concepts and to create an internal representation of the document collection that will be used by all subsequent data mining operations. In this paper, we give an overview of the Document Explorer system. We summarize our methodological approaches and solutions for the special requirements of this document mining area.
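To make the idea of a pattern as a query concrete, the following is a minimal sketch, not the Document Explorer implementation. It assumes each document has already been reduced to a set of concepts, and it uses a hypothetical toy collection, hypothetical function names (`retrieve`, `frequent_concept_sets`, `confidence`), and plain brute-force enumeration rather than whatever search strategy the system actually employs.

```python
from itertools import combinations

# Hypothetical toy collection: each document is reduced to its set of concepts.
documents = {
    "d1": {"iran", "oil", "usa"},
    "d2": {"iran", "oil"},
    "d3": {"oil", "usa"},
    "d4": {"iran", "oil", "usa"},
}


def retrieve(concepts, docs):
    """Treat a concept set as a query: ids of documents containing all concepts."""
    return {doc_id for doc_id, doc_concepts in docs.items() if concepts <= doc_concepts}


def frequent_concept_sets(docs, min_support, max_size=3):
    """Brute-force enumeration of concept sets matched by at least min_support documents."""
    vocabulary = sorted(set().union(*docs.values()))
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocabulary, size):
            matched = retrieve(frozenset(combo), docs)
            if len(matched) >= min_support:
                frequent[frozenset(combo)] = matched
    return frequent


def confidence(antecedent, consequent, docs):
    """Confidence of the association rule antecedent => consequent."""
    base = retrieve(antecedent, docs)
    if not base:
        return 0.0
    return len(retrieve(antecedent | consequent, docs)) / len(base)


frequent = frequent_concept_sets(documents, min_support=3)
for concept_set, matched in frequent.items():
    print(sorted(concept_set), "->", sorted(matched))
```

Each frequent concept set carries the documents it retrieves, so a discovered pattern can be used directly as a browsing query. A support threshold such as `min_support` is one simple example of a constraint that narrows the pattern space.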
We analyze the influence of hearth location and smoke dispersal on potential activity areas at Lower Paleolithic Lazaret Cave, France, focusing on archaeostratigraphic unit UA25, where a single hearth was unearthed, and GIS and activity area analysis were performed by the excavators. We simulated smoke dispersal from 16 hypothetical hearth locations and analyzed their effect on potential working spaces. Four activity zones were defined, according to the average smoke exposure recommendations from the World Health Organization (WHO) and Environmental Protection Agency (EPA). We found that the size of the low smoke density area and its distance from the hearth are the main parameters for choosing hearth location. The simulation results show an optimal hearth location zone of about 5 × 5 m², and it is precisely in this zone that the Lower Paleolithic humans of Lazaret Cave placed their hearth. We demonstrate that the optimal hearth location zone correlates not only with the archaeological hearth in UA25 but also with the locations of hearths in other layers. In addition, our smoke density analysis confirmed the detailed GIS and activity area reconstruction conducted by the excavators, strongly reinforcing their interpretation regarding the spatial organization of human behavior at Lazaret Cave.