We present an approach named JSFusion (Joint Sequence Fusion) that can measure semantic similarity between any pair of multimodal sequence data (e.g. a video clip and a language sentence). Our multimodal matching network consists of two key components. First, the Joint Semantic Tensor composes a dense pairwise representation of the two sequences into a 3D tensor. Then, the Convolutional Hierarchical Decoder computes their similarity score by discovering hidden hierarchical matches between the two sequence modalities. Both modules leverage hierarchical attention mechanisms that learn to promote well-matched representation patterns while pruning out misaligned ones in a bottom-up manner. Although JSFusion is a universal model applicable to any multimodal sequence data, this work focuses on video-language tasks including multimodal retrieval and video QA. We evaluate the JSFusion model on three retrieval and VQA tasks in LSMDC, for which our model achieves the best performance reported so far. We also perform multiple-choice and movie retrieval tasks for the MSR-VTT dataset, on which our approach outperforms many state-of-the-art methods.
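The idea of composing two sequences into a dense pairwise 3D tensor can be illustrated with a minimal sketch. This is not the JSFusion implementation: the abstract does not say how each pair is combined, so the element-wise product used here is an assumption, and the function name `joint_pairwise_tensor` and the toy feature values are hypothetical.

```python
from typing import List

Vector = List[float]


def joint_pairwise_tensor(seq_a: List[Vector], seq_b: List[Vector]) -> List[List[Vector]]:
    """Build a 3D nested list T where T[i][j] combines seq_a[i] with seq_b[j].

    Assumption: each pair is combined by an element-wise product. The abstract
    only states that a dense pairwise representation is formed.
    """
    if not seq_a or not seq_b:
        raise ValueError("both sequences must be non-empty")
    dim = len(seq_a[0])
    if any(len(v) != dim for v in seq_a + seq_b):
        raise ValueError("all vectors must share the same dimension")
    # One cell per (element of seq_a, element of seq_b) pair.
    return [[[x * y for x, y in zip(a, b)] for b in seq_b] for a in seq_a]


# Hypothetical toy inputs: 3 frame features and 2 word features, each of size 2.
frames = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
words = [[2.0, 1.0], [-1.0, 4.0]]
tensor = joint_pairwise_tensor(frames, words)
```

The result has shape (number of frames) x (number of words) x (feature size), so every frame-word pair has its own cell that a later decoder could score.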
We propose a high-level concept word detector that can be integrated with any video-to-language model. It takes a video as input and generates a list of concept words as useful semantic priors for language generation models. The proposed word detector has two important properties. First, it does not require any external knowledge sources for training. Second, the proposed word detector is trainable in an end-to-end manner jointly with any video-to-language model. To effectively exploit the detected words, we also develop a semantic attention mechanism that selectively focuses on the detected concept words and fuses them with the word encoding and decoding in the language model. In order to demonstrate that the proposed approach indeed improves the performance of multiple video-to-language tasks, we participate in all four tasks of LSMDC 2016 [22]. Our approach has won three of them, including fill-in-the-blank, multiple-choice test, and movie retrieval.
In this paper, we address the problem of jointly summarizing large sets of Flickr images and YouTube videos. Starting from the intuition that the characteristics of the two media types are different yet complementary, we develop a fast and easily-parallelizable approach for creating not only high-quality video summaries but also novel structural summaries of online images as storyline graphs. The storyline graphs can illustrate various events or activities associated with the topic in the form of a branching network. The video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we collect datasets of 20 outdoor activities, consisting of 2.7M Flickr images and 16K YouTube videos. Due to the large-scale nature of our problem, we evaluate our algorithm via crowdsourcing using Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other baselines and our own methods using videos or images only.
Pain-related neuropeptides released from synovial fibroblasts, such as substance P, have been implicated in joint destruction. Substance P-induced inflammatory processes are mediated via signaling through a G-protein-coupled receptor, that is, neurokinin-1 tachykinin receptor (NK(1)-R). We determined the pathophysiological link between substance P and its receptor in human adult articular cartilage homeostasis. We further examined if catabolic growth factors such as basic fibroblast growth factor (bFGF or FGF-2) or IL-1beta accelerate matrix degradation via a neural pathway upregulation of substance P and NK(1)-R. We show here that substance P stimulates the production of cartilage-degrading enzymes, such as matrix metalloproteinase-13 (MMP-13), and suppresses proteoglycan deposition in human adult articular chondrocytes via NK(1)-R. Furthermore, we have demonstrated that substance P negates proteoglycan stimulation promoted by bone morphogenetic protein-7, suggesting the dual role of substance P as both a pro-catabolic and anti-anabolic mediator of cartilage homeostasis. We report that bFGF-mediated stimulation of substance P and its receptor NK(1)-R is, in part, through an IL-1beta-dependent pathway.
Variational autoencoders (VAE) combined with hierarchical RNNs have emerged as a powerful framework for conversation modeling. However, they suffer from the notorious degeneration problem, where the decoders learn to ignore latent variables and reduce to vanilla RNNs. We empirically show that this degeneracy occurs mostly due to two reasons. First, the expressive power of hierarchical RNN decoders is often high enough to model the data using only its decoding distributions without relying on the latent variables. Second, the conditional VAE structure, whose generation process is conditioned on a context, makes the range of training targets very sparse; that is, the RNN decoders can easily overfit to the training data, ignoring the latent variables. To solve the degeneration problem, we propose a novel model named Variational Hierarchical Conversation RNNs (VHCR), involving two key ideas: (1) using a hierarchical structure of latent variables, and (2) exploiting an utterance drop regularization. With evaluations on two datasets, Cornell Movie Dialog and Ubuntu Dialog Corpus, we show that our VHCR successfully utilizes latent variables and outperforms state-of-the-art models for conversation generation. Moreover, it can perform several new utterance control tasks, thanks to its hierarchical latent structure.
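A regularizer of the utterance drop kind can be sketched as follows. The abstract does not define the mechanism, so this sketch assumes it means randomly replacing an utterance's encoding with a shared placeholder vector during training, which deprives the decoder of some context and pushes it to rely on the latent variables. The names `utterance_drop`, `unk`, and `drop_prob` are hypothetical.

```python
import random
from typing import List

Vector = List[float]


def utterance_drop(
    encodings: List[Vector],
    unk: Vector,
    drop_prob: float,
    rng: random.Random,
) -> List[Vector]:
    """Return a copy of `encodings` with some entries replaced by `unk`.

    Assumption: each utterance encoding is dropped independently with
    probability `drop_prob`; the abstract does not specify the scheme.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must lie in [0, 1]")
    # rng.random() is in [0, 1), so drop_prob=0 keeps all and drop_prob=1 drops all.
    return [list(unk) if rng.random() < drop_prob else list(vec) for vec in encodings]


# Hypothetical toy encodings for a three-utterance conversation.
conversation = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
placeholder = [0.0, 0.0]
kept_all = utterance_drop(conversation, placeholder, 0.0, random.Random(0))
dropped_all = utterance_drop(conversation, placeholder, 1.0, random.Random(0))
```

In a training loop the drop probability would be a tunable hyperparameter, and the drop would be disabled at inference time, as with ordinary dropout.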
Bardet-Biedl syndrome (BBS) is a multisystem genetically heterogeneous ciliopathy that most commonly leads to obesity, photoreceptor degeneration, digit anomalies, genito-urinary abnormalities, as well as cognitive impairment with autism, among other features. Sequencing of a DNA sample from a 17-year-old female affected with BBS did not identify any mutation in the known BBS genes. Whole-genome sequencing identified a novel loss-of-function disease-causing homozygous mutation (K102*) in C8ORF37, a gene coding for a cilia protein. The proband was overweight (body mass index 29.1) with a slowly progressive rod-cone dystrophy, a mild learning difficulty, high myopia, three-limb post-axial polydactyly, horseshoe kidney, abnormally positioned uterus and elevated liver enzymes. Mutations in C8ORF37 were previously associated with severe autosomal recessive retinal dystrophies (retinitis pigmentosa RP64 and cone-rod dystrophy CORD16) but not BBS. To elucidate the functional role of C8ORF37 in a vertebrate system, we performed gene knockdown in Danio rerio and assessed the cardinal features of BBS and visual function. Knockdown of c8orf37 resulted in impaired visual behavior and BBS-related phenotypes, specifically, defects in the formation of Kupffer's vesicle and delays in retrograde transport. Specificity of these phenotypes to BBS knockdown was shown with rescue experiments. Over-expression of human missense mutations in zebrafish also resulted in impaired visual behavior and BBS-related phenotypes. This is the first functional validation and association of C8ORF37 mutations with the BBS phenotype, which identifies BBS21. The zebrafish studies hereby show that C8ORF37 variants underlie clinically diagnosed BBS-related phenotypes as well as isolated retinal degeneration.