Video representation learning is a vital problem for classification tasks. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which exploits inherent supervisory signals implied in massive data for feature learning by solving auxiliary tasks. However, existing methods in this regard suffer from two limitations when extended to video classification. First, they focus on only a single task, ignoring the complementarity among different task-specific features and thus yielding suboptimal video representations. Second, high computational and memory costs hinder their application in real-world scenarios. In this paper, we propose a graph-based distillation framework to address these problems: (1) We propose a logits graph and a representation graph to transfer knowledge from multiple self-supervised tasks, where the former distills classifier-level knowledge by solving a multi-distribution joint matching problem, and the latter distills internal feature knowledge from pairwise ensembled representations while tackling the challenge of heterogeneity among different features; (2) By adopting a teacher-student framework, the proposal can dramatically reduce the redundancy of knowledge learnt from the teachers, leading to a lighter student model that solves the classification task more efficiently. Experimental results on 3 video datasets validate that our proposal not only helps learn better video representations but also compresses the model for faster inference.
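To make the classifier-level distillation idea concrete, the following is a minimal sketch of a student matching the softened output distributions of several teachers. It is not the logits graph described in the abstract: it swaps in a plain weighted sum of KL divergences with temperature scaling, a common baseline for multi-teacher distillation. The function names, the `temperature` parameter, and the uniform default `weights` are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    temperature=2.0, weights=None):
    """Weighted sum of KL(teacher || student) over all teachers.

    Assumption: each teacher is a network trained on one self-supervised
    task, and all teachers share the student's label space.
    """
    if weights is None:
        # Hypothetical default: every teacher contributes equally.
        weights = [1.0 / len(teacher_logits_list)] * len(teacher_logits_list)
    student_probs = softmax(student_logits, temperature)
    loss = 0.0
    for weight, teacher_logits in zip(weights, teacher_logits_list):
        teacher_probs = softmax(teacher_logits, temperature)
        loss += weight * kl_divergence(teacher_probs, student_probs)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return loss * temperature ** 2
```

The loss is zero when the student reproduces every teacher exactly and grows as the distributions diverge. A learned or graph-derived weighting over teachers would replace the uniform `weights` used here.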
Hepatocellular carcinoma (HCC) is one of the most prevalent human malignancies worldwide and has high morbidity and mortality. Elucidating the molecular mechanisms underlying HCC recurrence and metastasis is critical for identifying new therapeutic targets. This study aimed to determine the roles of aminopeptidase N (APN, also known as CD13) in HCC proliferation and metastasis and the underlying mechanisms. We detected APN expression in clinical samples and HCC cell lines using immunohistochemistry, flow cytometry, real-time PCR, and enzyme activity assays. The effects of APN on HCC metastasis and proliferation were verified in both in vitro and in vivo models. RNA-seq, phosphoproteomic, western blot, point mutation, co-immunoprecipitation, and proximity ligation assays were performed to reveal the potential mechanisms. We found that APN was frequently upregulated in HCC tumor tissues and high-metastatic cell lines. Knockout of APN inhibited HCC cell metastasis and proliferation in vitro and in vivo. Functional studies suggested that loss of APN impedes the ERK signaling pathway in HCC cells. Mechanistically, we found that APN might mediate the phosphorylation of BCKDK at serine 31 (BCKDKS31), which promotes the interaction of BCKDK with ERK1/2 and the phosphorylation of ERK1/2 by BCKDK, thereby activating the ERK signaling pathway in HCC cells. Collectively, our findings indicate that APN mediates the phosphorylation of BCKDKS31 and activates its downstream pathway to promote HCC proliferation and metastasis. Therefore, the APN/BCKDK/ERK axis may serve as a new therapeutic target for HCC, and these findings may help identify new biomarkers of HCC progression.
Zero-Shot Learning (ZSL) in video classification is a promising research direction, which aims to tackle the challenge posed by the explosive growth of video categories. Most existing methods exploit seen-to-unseen correlation by learning a projection between the visual and semantic spaces. However, such projection-based paradigms cannot fully utilize the discriminative information implied in the data distribution, and commonly suffer from the information degradation issue caused by the "heterogeneity gap". In this paper, we propose a GAN-based visual data synthesis framework to address these problems. Specifically, both semantic knowledge and visual distribution are leveraged to synthesize video features of unseen categories, so that ZSL can be turned into a typical supervised problem using the synthetic features. First, we propose multi-level semantic inference to boost video feature synthesis, which captures the discriminative information implied in the joint visual-semantic distribution via feature-level and label-level semantic inference. Second, we propose Matching-aware Mutual Information Correlation to overcome the information degradation issue, which captures seen-to-unseen correlation in matched and mismatched visual-semantic pairs by mutual information, providing the zero-shot synthesis procedure with robust guidance signals. Experimental results on four video datasets demonstrate that our approach can significantly improve zero-shot video classification performance.
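The step in which ZSL becomes a supervised problem once features for unseen categories are synthesized can be illustrated with a toy sketch. Everything here is an assumption for illustration: `synthesize_features` is a stand-in that merely adds Gaussian noise to a class's semantic vector (it is not a GAN and not the framework's generator), the two class names and their 2-D embeddings are made up, and a nearest-centroid rule stands in for whatever supervised classifier would actually be trained.

```python
import random


def synthesize_features(class_embedding, n_samples, noise_scale, rng):
    """Toy stand-in for a trained generator: semantic vector plus Gaussian noise."""
    return [[v + rng.gauss(0.0, noise_scale) for v in class_embedding]
            for _ in range(n_samples)]


def centroid(features):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def fit_centroids(synthetic_by_class):
    """'Train' a nearest-centroid classifier on synthetic features only."""
    return {label: centroid(feats) for label, feats in synthetic_by_class.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], feature))


# Hypothetical unseen categories with made-up 2-D semantic embeddings.
rng = random.Random(0)
unseen_embeddings = {"surfing": [1.0, 0.0], "archery": [0.0, 1.0]}
synthetic = {label: synthesize_features(embedding, 50, 0.1, rng)
             for label, embedding in unseen_embeddings.items()}
centroids = fit_centroids(synthetic)

# A real video feature of an unseen class is classified without any real
# training examples from that class.
prediction = predict(centroids, [0.9, 0.1])
```

The point of the sketch is only the data flow: no real sample of an unseen category is needed at training time, because the classifier is fitted entirely on synthesized features.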