Deep Neural Networks (DNNs) have powerful recognition abilities for classifying different objects. Although DNN models can reach very high accuracy, even beyond the human level, they are regarded as black boxes that lack interpretability. During training, DNNs automatically extract abstract features from high-dimensional data such as images. However, the extracted features are usually mapped into a representation space that is not aligned with human knowledge. In some cases, such as medical diagnosis, interpretability is necessary. To align the representation space with human knowledge, this paper proposes a class of DNNs, termed Conceptual Alignment Deep Neural Networks (CADNNs), which produce interpretable representations in their hidden layers. In CADNNs, some hidden neurons are selected as conceptual neurons to extract human-formed concepts, while the other hidden neurons, called free neurons, are trained without constraints. All hidden neurons contribute to the final classification result. Experiments demonstrate that CADNNs keep up with the accuracy of standard DNNs despite the extra constraints on the conceptual neurons. Experiments also reveal that, in some cases, the free neurons can learn concepts aligned with human knowledge.
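The abstract does not give the training objective, but a minimal sketch of the described split between conceptual and free neurons might look like the following (in PyTorch). The layer sizes, sigmoid activation, and the weighted auxiliary alignment loss are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class CADNNSketch(nn.Module):
    """Hidden layer split into n_concept conceptual neurons (supervised to
    match human-labeled concept targets) and n_free free neurons (trained
    without concept supervision). All hidden units feed the classifier,
    as the abstract describes."""

    def __init__(self, n_in, n_concept, n_free, n_classes):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_concept + n_free)
        self.classifier = nn.Linear(n_concept + n_free, n_classes)
        self.n_concept = n_concept

    def forward(self, x):
        h = torch.sigmoid(self.hidden(x))
        concepts = h[:, :self.n_concept]   # interpretable units
        logits = self.classifier(h)        # all hidden units contribute
        return logits, concepts

# Assumed joint objective: classification loss plus an alignment term
# pulling the conceptual neurons toward human-annotated concept labels.
def cadnn_loss(logits, concepts, y, concept_targets, alpha=1.0):
    ce = nn.functional.cross_entropy(logits, y)
    align = nn.functional.binary_cross_entropy(concepts, concept_targets)
    return ce + alpha * align
```

Under this reading, training minimizes cadnn_loss over batches that carry both class labels and concept annotations, so the conceptual units are pulled toward the human-formed concepts while the free units remain unconstrained.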
Traditional Chinese medicine (TCM) is founded on long-term medical practice in China. Few human brains can fully grasp the deep TCM knowledge derived from this tremendous body of experience, but in the big data era a "big electronic brain" might become competent via deep learning techniques. For this prospect, the electronic brain needs to process heterogeneous data such as images, texts, audio signals, and other sensory data. Analyzing such heterogeneous data with a computer-aided system remained a challenge until the advent of powerful deep learning tools. We propose a multimodal deep learning framework that mimics a TCM practitioner diagnosing a patient on the basis of the multimodal perceptions of seeing, listening, smelling, asking, and touching. The framework learns common representations from various high-dimensional sensory data and fuses the information for the final classification. We propose conceptual alignment deep neural networks to embed prior knowledge and obtain interpretable latent representations, and we implement a multimodal deep architecture that processes tongue images and description texts for TCM diagnosis. Experiments illustrate that the multimodal deep architecture can extract effective features from heterogeneous data, produce interpretable representations, and achieve higher accuracy than either of the corresponding unimodal architectures.
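A minimal sketch of such a two-branch fusion architecture (again in PyTorch) is given below. The encoder shapes, the bag-of-embeddings text branch, and fusion by concatenation are assumptions for illustration; the abstract does not specify the exact architecture:

```python
import torch
import torch.nn as nn

class MultimodalTCMSketch(nn.Module):
    """Two unimodal encoders mapped into a shared representation space,
    then fused (here by concatenation) for the final diagnosis.
    Input sizes and layer widths are illustrative assumptions."""

    def __init__(self, vocab_size, n_classes, dim=128):
        super().__init__()
        # Image branch: small CNN over tongue images (3 x 64 x 64 assumed).
        self.image_enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim),
        )
        # Text branch: mean bag-of-embeddings over the description text.
        self.text_enc = nn.EmbeddingBag(vocab_size, dim)
        self.fuse = nn.Linear(2 * dim, n_classes)

    def forward(self, image, token_ids):
        zi = self.image_enc(image)       # image representation
        zt = self.text_enc(token_ids)    # text representation
        return self.fuse(torch.cat([zi, zt], dim=1))
```

Here token_ids is assumed to be a (batch, seq_len) LongTensor of padded word indices; replacing the final fusion layer with a CADNN-style hidden layer would combine this fusion scheme with the conceptual alignment described above.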