User requirement specification (URS) documents written as free-form natural language text contain system use-case descriptions as one of their elements. In some application domains, certain system use-cases in the URS define services and functionality that must comply with the laws, rules and regulations of that domain. In this paper, we present a multi-step approach to automatically extract system use-cases from URS documents and construct traceability links between system use-cases and the relevant regulations in regulatory documents. We define lexicon-based, syntactic and semantic features to discriminate system use-cases from other elements in the URS. We investigate the application of five semantic similarity methods implemented in the SEMILAR semantic similarity toolkit to compute the similarity between a given system use-case and the regulations in a regulatory document. We conduct a series of experiments on real-world data obtained from software projects of a large global Information Technology (IT) services company to validate the proposed approach. Experimental results demonstrate both the effectiveness of the proposed approach (accuracy of 83.3% for system use-case extraction and 72% for constructing traceability links) and its limitations.
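The abstract delegates the similarity computation to SEMILAR. To make the linking step concrete, here is a minimal sketch in Python that swaps in plain bag-of-words cosine similarity. It is an illustrative stand-in and is not claimed to be one of the five SEMILAR methods evaluated in the paper. The sketch ranks the regulations in a regulatory document against one extracted use-case and keeps those above a threshold as candidate traceability links. The stop-word list, the 0.2 threshold, the function names and the sample texts (`REG-1`, `REG-2`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual preprocessing is not described here.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "with",
              "must", "shall", "be", "is", "are", "by", "on", "any"}


def tokens(text: str) -> Counter:
    """Lower-cased word counts with stop words removed (a bag of words)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_use_case(use_case: str, regulations: dict[str, str],
                  threshold: float = 0.2) -> list[tuple[str, float]]:
    """Rank regulations by similarity to one use-case; keep those >= threshold.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    uc = tokens(use_case)
    scored = [(reg_id, cosine(uc, tokens(text))) for reg_id, text in regulations.items()]
    kept = [(reg_id, s) for reg_id, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical example data for illustration only.
regulations = {
    "REG-1": "A covered entity must obtain patient consent before disclosing health records.",
    "REG-2": "Financial reports must be certified by the chief executive officer.",
}
use_case = "The clerk discloses patient health records to an insurer after recording patient consent."

result = link_use_case(use_case, regulations)
print(result)  # REG-1 is linked; REG-2 shares no terms and is dropped
```

A lexical measure like this cannot match synonyms such as "discloses" and "disclosing", which is one reason the terminology mismatch between software and legal text motivates the semantic methods the paper evaluates.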
RESEARCH MOTIVATION AND AIM
Software applications and information systems that provide services to users and support business processes need to comply with the regulations governing those services and processes [2][6][11][10][7][8][9][12][13]. For example, information systems in the healthcare domain need to comply with the Health Insurance Portability and Accountability Act (HIPAA)¹, and applications in certain financial domains need to comply with the Sarbanes-Oxley Act². Compliance of software applications with regulations requires eliciting and addressing regulation-related functional and non-functional requirements, and also maintaining traceability between specific laws and specific elements of the software artifacts as regulations change [2][6][11][10][7][8][9][12][13]. Identifying the elements of a software system that relate to specific regulations, and maintaining these traceability links as the system evolves (the focus of the work presented in this paper), is a non-trivial problem for large and complex software systems. Manually uncovering traceability links between software artifacts and regulatory documents is tedious, error-prone and does not scale, owing to the size and complexity of both the software and the regulations. Automatic traceability link recovery (compliance checking between software artifacts and regulatory documents) poses several technical challenges, arising from factors such as natural language text, terminology mismatches between the software and legal domains, and the need to adapt to regular amendments and revisions of regulations. Compliance checking and verification, and traceability link recovery between software artifacts and regulatory documents, is an area that has attrac...