Traditional software engineering programming paradigms are mostly object or procedure oriented, driven by deterministic algorithms. With the advent of deep learning and cognitive sciences there is an emerging trend for data-driven programming, creating a shift in the programming paradigm among the software engineering communities. Visualizing and interpreting the execution of a current large scale data-driven software development is challenging. Further, for deep learning development there are many libraries in multiple programming languages such as TensorFlow (Python), CAFFE (C++), Theano (Python), Torch (Lua), and Deeplearning4j (Java), driving a huge need for interoperability across libraries.We propose a model driven development based solution framework, that facilitates intuitive designing of deep learning models in a platform agnostic fashion. This framework could potentially generate library specific code, perform program translation across languages, and debug the training process of a deep learning model from a fault localization and repair perspective.Further we identify open research problems in this emerging domain, and discuss some new software tooling requirements to serve this new age data-driven programming paradigm.
In Sanskrit, small words (morphemes) are combined to form compound words through a process known as Sandhi. Sandhi splitting is the process of splitting a given compound word into its constituent morphemes. Although rules governing word splitting exists in the language, it is highly challenging to identify the location of the splits in a compound word. Though existing Sandhi splitting systems incorporate these pre-defined splitting rules, they have a low accuracy as the same compound word might be broken down in multiple ways to provide syntactically correct splits.In this research, we propose a novel deep learning architecture called Double Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95% accuracy, and (ii) predicts the constituent words (learning the Sandhi splitting rules) with 79.5% accuracy, outperforming the state-of-art by 20%. Additionally, we show the generalization capability of our deep learning model, by showing competitive results in the problem of Chinese word segmentation, as well.1 Different syntactic splits given by one of the popular Sandhi splitters:
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.