A Software Requirements Specification (SRS) describes a software system to be developed, capturing the functional, non-functional, and technical aspects of stakeholders' requirements. Retrieval and extraction of software information from SRS documents are essential to the development of a software product line (SPL). Although Natural Language Processing (NLP) techniques, such as information retrieval and standard machine learning, have been advocated in the recent past as semi-automatic means of optimising requirements specifications, they have not been widely embraced. The complexity of an organisation's information makes requirements analysis a challenging task, and the interdependence of subsystems within an organisation drives this complexity. A plain multi-class classification framework may not address this issue, since a single requirement can belong to several categories at once. Hence, this paper proposes an automated non-exclusive (multi-label) approach to the classification of functional requirements from SRS documents, using a deep learning framework. Specifically, Word2Vec and FastText word embeddings are used to represent documents for training a convolutional neural network (CNN). The study was carried out on a compilation of manually categorised enterprise data (AUTomotive Open System ARchitecture, AUTOSAR), which was also employed for model training. Over the CNN, the impact of Word2Vec and FastText embeddings trained on SRS documentation was compared with that of pre-trained word embedding models available online.
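The non-exclusive classification idea can be sketched as a text CNN whose output layer uses independent sigmoids rather than a softmax, so one requirement may receive several labels. The following is a minimal numpy forward-pass sketch, not the paper's actual architecture; all dimensions and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, embed_dim = 12, 8      # toy "requirement" of 12 word vectors
n_filters, kernel = 4, 3        # 4 convolution filters of width 3
n_classes = 5                   # 5 non-exclusive requirement categories

X = rng.normal(size=(seq_len, embed_dim))           # word-embedding matrix
W_conv = rng.normal(size=(n_filters, kernel, embed_dim))
W_out = rng.normal(size=(n_filters, n_classes))

# Slide each filter over the sequence and apply ReLU.
feats = np.array([
    [np.maximum(0.0, np.sum(X[t:t + kernel] * W_conv[f]))
     for t in range(seq_len - kernel + 1)]
    for f in range(n_filters)
])

pooled = feats.max(axis=1)                 # max-pooling over time
logits = pooled @ W_out
probs = 1.0 / (1.0 + np.exp(-logits))      # independent sigmoids per class

labels = (probs > 0.5).astype(int)         # a requirement can get
print(labels)                              # several labels at once
```

Because each class probability is thresholded independently, the label vector can contain any number of ones, which is what distinguishes this multi-label setting from plain multi-class classification.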
Bilingual dictionaries are essential resources for many natural language processing tasks, but resource-scarce and less popular language pairs rarely have them. Efficient automatic methods for inducing bilingual dictionaries are needed, as manual resources and effort are scarce for low-resourced languages. In this paper, we induce word translations using bilingual embeddings, and we use the Apache Spark® framework for parallel computation. Further, to validate the quality of the generated bilingual dictionary, we use it in a phrase-table-aided Neural Machine Translation (NMT) system. The system performs moderately well with a manual bilingual dictionary; we then replace it with our induced dictionary. The corresponding translated outputs are compared using the Bilingual Evaluation Understudy (BLEU) and Rank-based Intuitive Bilingual Evaluation Score (RIBES) metrics.
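The core induction step can be sketched as a cosine nearest-neighbour search once source and target embeddings share a space: each source word is translated as the closest target word. The toy vocabularies and vectors below are illustrative, not the paper's data, and the source vectors are constructed to lie near their intended translations.

```python
import numpy as np

def normalize(M):
    # Row-normalise so dot products equal cosine similarities.
    return M / np.linalg.norm(M, axis=1, keepdims=True)

src_vocab = ["water", "fire", "tree"]
tgt_vocab = ["arbre", "eau", "feu"]

# Toy unit vectors for the 3 target words.
tgt_emb = np.eye(3, 4)

# Place each source vector near its "translation" to mimic a shared space.
src_emb = normalize(tgt_emb[[1, 2, 0]] + 0.05 * np.ones((3, 4)))

# Induce a translation per source word: cosine nearest neighbour.
sims = src_emb @ tgt_emb.T
dictionary = {src_vocab[i]: tgt_vocab[j]
              for i, j in enumerate(sims.argmax(axis=1))}
print(dictionary)   # {'water': 'eau', 'fire': 'feu', 'tree': 'arbre'}
```

At realistic vocabulary sizes this all-pairs similarity computation is what motivates a parallel framework such as Spark: the similarity matrix can be computed in independent row blocks.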
Linguists have long focused on qualitative comparison of semantics across languages. Evaluating semantic interpretation between disparate language pairs such as English and Tamil is an even more formidable task than between Slavic languages. The concept of word embeddings in Natural Language Processing (NLP) has provided a felicitous opportunity to quantify linguistic semantics. Multilingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of another. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, employing three popular embedding algorithms: Word2Vec, GloVe and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We show empirically that, with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in several NLP use cases, such as text summarisation, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), noting that these are not the only possible applications.
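The transfer-function idea can be illustrated with its classic linear baseline: fit a matrix W mapping seed-dictionary source vectors onto their target counterparts by least squares, then project unseen source words through W. The paper explores deep (non-linear) variants; this sketch uses synthetic data throughout.

```python
import numpy as np

rng = np.random.default_rng(2)
dim_src, dim_tgt, n_pairs = 6, 4, 50

# Synthetic "ground-truth" mapping and seed-dictionary embedding pairs.
W_true = rng.normal(size=(dim_src, dim_tgt))
X_src = rng.normal(size=(n_pairs, dim_src))              # source-word vectors
Y_tgt = X_src @ W_true + 0.01 * rng.normal(size=(n_pairs, dim_tgt))

# Fit the transfer function by minimising ||X_src @ W - Y_tgt||^2.
W_fit, *_ = np.linalg.lstsq(X_src, Y_tgt, rcond=None)

# An unseen source word is projected into the target space via W_fit.
x_new = rng.normal(size=(dim_src,))
y_pred = x_new @ W_fit
print(np.linalg.norm(W_fit - W_true))   # small: the mapping is recovered
```

With only a small seed dictionary (here 50 pairs; the paper reports useful results from about a thousand words), the mapping is recoverable because the number of free parameters in W is modest relative to the number of constraints.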