Forensic analysis requires a keen detective mind, but the human mind has neither the ability nor the time to process the millions of bytes on a typical computer hard disk. Digital forensic investigators need powerful tools that can automate many of the analysis tasks that are currently being performed manually. This paper argues that forensic analysis can greatly benefit from research in knowledge discovery and data mining, which has developed powerful automated techniques for analyzing massive quantities of data to discern novel, potentially useful patterns. We use the term "evidence mining" to refer to the application of these techniques in the analysis phase of digital forensic investigations. This paper presents a novel approach involving the specialization of CRISP-DM, a cross-industry standard process for data mining, to CRISP-EM, an evidence mining methodology designed specifically for digital forensics. In addition to supporting forensic analysis, the CRISP-EM methodology offers a structured approach for defining the research gaps in evidence mining.
Most actionable evidence is identified during the analysis phase of digital forensic investigations. Currently, the analysis phase uses expression-based searches, which assume a good understanding of the evidence; but latent evidence cannot be found using such methods. Knowledge discovery and data mining (KDD) techniques can significantly enhance the analysis process. A promising KDD technique is topic modeling, which infers the underlying semantic context of text and summarizes the text using topics described by words. This paper investigates the application of topic modeling to forensic data and its ability to contribute to the analysis phase. Also, it highlights the challenges that forensic data poses to topic modeling algorithms and reports on the lessons learned from a case study.
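To make the idea of "topics described by words" concrete, the following is a minimal sketch of topic modeling using a collapsed Gibbs sampler for latent Dirichlet allocation (LDA). The abstract does not state which topic modeling algorithm the paper uses, so LDA is an assumption here; the function names (`lda_gibbs`, `top_words`), the hyperparameter values, and the toy documents are all hypothetical and are not taken from the case study.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not the paper's method).

    docs is a list of token lists. Returns one Counter of word counts per topic.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # z[d][i] is the topic currently assigned to token i of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Initialise the count tables from the random assignment.
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return topic_word


def top_words(topic_word, n=3):
    """Summarise each topic by its n most frequent words (positive counts only)."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


# Hypothetical, already-tokenised text fragments standing in for recovered documents.
docs = [
    "invoice payment bank transfer account".split(),
    "bank account wire transfer payment".split(),
    "meeting schedule flight hotel booking".split(),
    "flight booking hotel travel schedule".split(),
]

topic_word = lda_gibbs(docs)
summary = top_words(topic_word)
for k, words in enumerate(summary):
    print(f"topic {k}: {', '.join(words)}")
```

Unlike an expression-based search, this requires no query terms up front: the investigator reads the word lists and decides which topics merit a closer look. On real forensic data, tokenisation, stop-word removal, and the choice of the number of topics would need considerably more care than this sketch shows.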