Multimodal learning is a framework for building models that make predictions based on different types of modalities. Important challenges in multimodal learning are the inference of shared representations from arbitrary modalities and cross-modal generation via these representations; achieving this, however, requires taking the heterogeneous nature of multimodal data into account. In recent years, deep generative models, i.e., generative models whose distributions are parameterized by deep neural networks, have attracted much attention. Variational autoencoders in particular are well suited to the above challenges because they can account for heterogeneity and infer good representations of data. Accordingly, various multimodal generative models based on variational autoencoders, called multimodal deep generative models, have been proposed in recent years. In this paper, we provide a categorized survey of studies on multimodal deep generative models.
Lacto-N-tetraose (Galβ1-3GlcNAcβ1-3Galβ1-4Glc, LNT) and lacto-N-neotetraose (Galβ1-4GlcNAcβ1-3Galβ1-4Glc, LNnT) were enzymatically synthesized by consecutive additions of GlcNAc and Gal residues to lactose. Lacto-N-triose II (GlcNAcβ1-3Galβ1-4Glc) was first prepared by transfer of GlcNAc from UDP-GlcNAc to lactose, catalyzed by β-1,3-N-acetylglucosaminyltransferase from bovine serum. The resulting lacto-N-triose II was then converted into LNT and LNnT via two kinds of β-D-galactosidase-mediated transglycosylation: β-D-galactosidase from Bacillus circulans ATCC31382 induced regioselective galactosyl transfer from o-nitrophenyl β-D-galactoside to the OH-3″ position of lacto-N-triose II, whereas commercially available β-D-galactosidase from B. circulans directed transfer to the OH-4″ position. These convenient processes are suitable for large-scale preparations of LNT and LNnT. Alternatively, LNT was synthesized directly from lactose as the initial substrate, utilizing lacto-N-biosidase (Aureobacterium sp. L-101)-mediated transglycosylation with a Galβ1-3GlcNAcβ-pNP donor.
This paper describes a framework for the development of an integrative cognitive system based on probabilistic generative models (PGMs), called Neuro-SERKET. Neuro-SERKET is an extension of SERKET: it can compose elemental PGMs developed in a distributed manner and provides a scheme that allows the composed PGMs to learn across the entire system in an unsupervised way. In addition to the head-to-tail connection supported by SERKET, Neuro-SERKET supports tail-to-tail and head-to-head connections, as well as neural-network-based modules, i.e., deep generative models. As an example of a Neuro-SERKET application, an integrative model was developed by composing a variational autoencoder (VAE), a Gaussian mixture model (GMM), latent Dirichlet allocation (LDA), and automatic speech recognition (ASR). The model is called VAE + GMM + LDA + ASR. The performance of VAE + GMM + LDA + ASR and the validity of Neuro-SERKET were demonstrated through a multimodal categorization task using image data and speech signals of numerical digits.
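The head-to-tail composition described above, in which one module's inferred latent variable becomes the next module's observation (as in the VAE + GMM pairing), can be illustrated with a deliberately minimal sketch. This is plain Python with no deep-learning libraries: the one-line "encoder" and the 1-D k-means step are toy stand-ins for a real VAE and GMM, not the Neuro-SERKET implementation, and the data are synthetic.

```python
def encode(x):
    """Toy stand-in for a VAE encoder: collapse each
    observation to a 1-D 'latent' (its mean)."""
    return sum(x) / len(x)

def cluster_1d(latents, k=2, iters=20):
    """Toy stand-in for a GMM over the latent space:
    1-D k-means, initialized at the extreme latents,
    returning one cluster label per input."""
    centers = [min(latents), max(latents)]
    labels = [0] * len(latents)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(z - centers[j]))
                  for z in latents]
        for j in range(k):
            members = [z for z, l in zip(latents, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

# Two synthetic observation groups with well-separated means,
# mimicking two underlying categories.
data = [[0.0, 0.1, 0.2]] * 5 + [[5.0, 5.1, 5.2]] * 5

# Head-to-tail: module 1 (encoder) feeds module 2 (clusterer).
latents = [encode(x) for x in data]
labels = cluster_1d(latents)
print(labels)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In the actual framework, the two modules would additionally exchange messages so that the clustering feeds back into the encoder's training; this sketch shows only the forward, head-to-tail direction of that communication.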