This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a Graph Convolutional Neural Network (Graph CNN) based method to reconstruct a full 3D mesh of the hand surface, which contains richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. When fine-tuning the networks on real-world datasets without 3D ground truth, we propose a weakly supervised approach that leverages the depth map as weak supervision during training. Through extensive evaluations on our proposed new datasets and two public datasets, we show that our method produces accurate and plausible 3D hand meshes and achieves superior 3D hand pose estimation accuracy compared with state-of-the-art methods.
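For illustration only: a minimal first-order graph-convolution layer in PyTorch, sketching how vertex features can be mixed along the edges of a fixed hand-mesh topology. This is a simplified sketch, not the authors' implementation; the adjacency normalization, dimensions, and ReLU are assumptions.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph-convolution layer: vertex features are aggregated along
    mesh edges via a fixed normalized adjacency matrix, then linearly
    transformed. Simplified first-order sketch (assumption)."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        # adj: (V, V) normalized adjacency of the hand-mesh template
        self.register_buffer("adj", adj)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):               # x: (batch, V, in_dim)
        x = torch.matmul(self.adj, x)   # mix each vertex with its neighbors
        return torch.relu(self.linear(x))
```

Stacking several such layers over coarse-to-fine mesh resolutions is one plausible way to regress per-vertex 3D coordinates from an image feature vector.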
The ability to learn from noisy labels is very useful in many visual recognition tasks, as vast amounts of data with noisy labels are relatively easy to obtain. Traditionally, label noise has been treated as statistical outliers, and techniques such as importance re-weighting and bootstrapping have been proposed to alleviate the problem. According to our observation, real-world noisy labels exhibit multimodal characteristics like the true labels, rather than behaving as independent random outliers. In this work, we propose a unified distillation framework that uses "side" information, including a small clean dataset and label relations in a knowledge graph, to "hedge the risk" of learning from noisy labels. Unlike traditional approaches evaluated on simulated label noise, we propose a suite of new benchmark datasets, in the Sports, Species, and Artifacts domains, to evaluate the task of learning from noisy labels in a practical setting. The empirical study demonstrates the effectiveness of our proposed method in all the domains.
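As a sketch of the distillation idea, one common form blends the noisy annotation with the soft prediction of a "teacher" trained on the small clean set, so a wrong label is hedged by the teacher's belief. The blending weight `lam` and the soft cross-entropy below are illustrative assumptions, not necessarily the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distilled_target(noisy_onehot, teacher_probs, lam=0.5):
    """Blend the (possibly wrong) noisy label with the soft prediction of a
    teacher trained on the clean set. lam is a hypothetical trade-off weight."""
    return lam * noisy_onehot + (1.0 - lam) * teacher_probs

def distillation_loss(student_logits, noisy_onehot, teacher_probs, lam=0.5):
    """Soft cross-entropy of the student against the blended pseudo-label."""
    target = distilled_target(noisy_onehot, teacher_probs, lam)
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(target * log_probs).sum(dim=1).mean()
```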
Rationale: To date, explorations of clinical features in severe COVID-19 patients have come mostly from a single center in Wuhan, China; clinical data from other centers are limited. This study aims to identify parameters that can be used in clinical practice to predict the prognosis of hospitalized patients with severe coronavirus disease 2019 (COVID-19). Methods: In this case-control study, patients with severe COVID-19 admitted to this newly established isolation center between 27 January 2020 and 19 March 2020 were divided into a discharge group and a death group. Clinical information was collected and analyzed for the following objectives: 1. comparison of baseline characteristics between the two groups; 2. identification of risk factors for death on admission using logistic regression; 3. dynamic changes of radiographic and laboratory parameters between the two groups over the disease course. Results: 124 patients with severe COVID-19 on admission were included and divided into the discharge group (n=35) and the death group (n=89). Sex, SpO2, respiratory rate, diastolic pressure, neutrophil count, lymphocyte count, C-reactive protein (CRP), procalcitonin (PCT), lactate dehydrogenase (LDH), and D-dimer were significantly correlated with death events in bivariate logistic regression. Multivariate logistic regression demonstrated a significant model fit with a C-index of 0.845 (p<0.001), in which SpO2≤89%, lymphocyte≤0.64×10⁹/L, CRP>77.35 mg/L, PCT>0.20 μg/L, and LDH>481 U/L were independent risk factors, with ORs of 2.959, 4.015, 2.852, 3.554, and 3.185, respectively (p<0.04). Over the disease course, persistently lower lymphocyte counts with higher levels of CRP, PCT, IL-6, neutrophils, LDH, D-dimer, cardiac troponin I (cTnI), and brain natriuretic peptide (BNP), together with an increased CD4+/CD8+ T-lymphocyte ratio, were observed in the death group, while these parameters stayed stable or improved in the discharge group. Conclusions: On admission, the levels of SpO2, lymphocytes, CRP, PCT, and LDH could predict the prognosis of severe COVID-19 patients. Systemic inflammation with induced cardiac dysfunction, in addition to acute respiratory distress syndrome, was likely a primary cause of death in severe COVID-19.
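A hypothetical sketch of the kind of multivariate logistic regression described above, using statsmodels. The data file, column names, and binary threshold coding below mirror the abstract but are assumptions, not the study's actual records.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumed data frame with one row per patient and a binary death outcome.
df = pd.read_csv("severe_covid_admission.csv")  # hypothetical file

# Code each admission parameter at the cut-offs reported in the abstract.
X = pd.DataFrame({
    "SpO2_le_89":     (df["SpO2"] <= 89).astype(int),
    "lymph_le_0_64":  (df["lymphocyte"] <= 0.64).astype(int),
    "CRP_gt_77_35":   (df["CRP"] > 77.35).astype(int),
    "PCT_gt_0_20":    (df["PCT"] > 0.20).astype(int),
    "LDH_gt_481":     (df["LDH"] > 481).astype(int),
})
X = sm.add_constant(X)

model = sm.Logit(df["death_event"], X).fit()
print(model.summary())           # coefficients on the log-odds scale
print(np.exp(model.params))      # exponentiate to get odds ratios (ORs)
```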
Composing fashion outfits involves a deep understanding of fashion standards combined with creativity in choosing multiple fashion items (e.g., jewelry, bag, pants, dress). On fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on appearance and metadata. We propose to leverage outfit popularity on fashion-oriented websites to supervise the scoring component. The scoring component is a multi-modal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. To train and evaluate the proposed composition system, we have collected a large-scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although fashion outfit scoring and composition are rather challenging, we achieve an AUC of 85% for the scoring component and an accuracy of 77% for a constrained composition task.
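A minimal sketch of a multi-instance outfit scorer of the kind described: each item's visual and metadata features are embedded, pooled over the variable-size outfit, and mapped to a single quality score. Feature dimensions and mean pooling are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class OutfitScorer(nn.Module):
    """Score a variable-size set of fashion items with one scalar:
    per-item embedding -> permutation-invariant pooling -> score."""
    def __init__(self, img_dim=2048, meta_dim=128, hid=256):
        super().__init__()
        self.item_net = nn.Sequential(
            nn.Linear(img_dim + meta_dim, hid), nn.ReLU())
        self.score_net = nn.Linear(hid, 1)

    def forward(self, img_feats, meta_feats):
        # img_feats: (n_items, img_dim); meta_feats: (n_items, meta_dim)
        items = self.item_net(torch.cat([img_feats, meta_feats], dim=1))
        outfit = items.mean(dim=0)       # pooling makes order irrelevant
        return self.score_net(outfit)    # scalar outfit score
```

Training such a scorer against popular-versus-unpopular outfit pairs would be one plausible way to use popularity as supervision.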
With the recent popularity of animated GIFs on social media, there is a need for ways to index them with rich metadata. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high-quality dataset, we developed a series of novel quality controls to validate free-form text input from crowdworkers. We show that there is an unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned on our animated GIF description dataset can be helpful for automatic movie description.
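The nearest-neighbor baseline mentioned above can be sketched in a few lines: describe a query GIF with the caption of its visually closest training GIF. Precomputed pooled frame features and cosine similarity are assumptions for illustration.

```python
import numpy as np

def nn_caption(query_feat, train_feats, train_captions):
    """Nearest-neighbor captioning baseline: return the caption of the
    training GIF whose (precomputed) visual feature is most similar to
    the query's, under cosine similarity."""
    q = query_feat / np.linalg.norm(query_feat)
    T = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    return train_captions[int(np.argmax(T @ q))]
```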
Learning to rank has recently emerged as an attractive technique for training deep convolutional neural networks on various computer vision tasks. Pairwise ranking, in particular, has been successful in multi-label image classification, achieving state-of-the-art results on various benchmarks. However, most existing approaches use the hinge loss to train their models, which is non-smooth and thus difficult to optimize, especially with deep networks. Furthermore, they employ simple heuristics, such as top-k or thresholding, to determine which labels from a ranked list to include in the output, which limits their use in real-world settings. In this work, we propose two techniques to improve pairwise ranking based multi-label image classification: (1) a novel loss function for pairwise ranking that is smooth everywhere and thus easier to optimize; and (2) a label decision module incorporated into the model that estimates the optimal confidence threshold for each visual concept. We provide theoretical analyses of our loss function within the Bayes consistency and risk minimization framework and show its benefit over existing pairwise ranking formulations. We demonstrate the effectiveness of our approach on three large-scale datasets, VOC2007, NUS-WIDE, and MS-COCO, achieving the best reported results in the literature.
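A sketch of a smooth pairwise ranking loss in the spirit described: the hinge over (positive, negative) label pairs is replaced with a log-sum-exp softening that is differentiable everywhere. The exact form below is an assumption, not necessarily the paper's formulation.

```python
import torch

def smooth_pairwise_rank_loss(scores, labels):
    """Smooth pairwise ranking loss over label pairs.
    scores: (batch, n_labels) raw model outputs.
    labels: (batch, n_labels) binary {0, 1} ground truth.
    For each example, penalizes every negative label scored close to or
    above a positive label, via log(1 + sum exp(s_neg - s_pos))."""
    pos = labels.bool()
    losses = []
    for s, p in zip(scores, pos):
        diff = s[~p].unsqueeze(1) - s[p].unsqueeze(0)   # (n_neg, n_pos)
        losses.append(torch.log1p(diff.exp().sum()))    # smooth everywhere
    return torch.stack(losses).mean()
```

A label decision module as described would then sit on top of `scores`, predicting a per-concept threshold instead of a fixed top-k cutoff.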
Action recognition from 3D skeleton sequences is becoming popular due to its speed and robustness. Recently proposed Convolutional Neural Network (CNN) based methods have shown good performance in learning spatio-temporal representations of skeleton sequences. Despite the good recognition accuracy achieved by previous CNN-based methods, two problems potentially limit their performance. First, previous skeleton representations are generated by chaining joints in a fixed order; the corresponding semantic meaning is unclear, and the structural information among the joints is lost. Second, previous models lack the ability to focus on informative joints. An attention mechanism is important for skeleton-based action recognition because there exist spatio-temporal key stages, while joint predictions can be inaccurate. To solve these two problems, we propose a novel CNN-based method for skeleton-based action recognition. We first redesign the skeleton representation with a depth-first tree traversal order, which enhances the semantic meaning of skeleton images and better preserves the associated structural information. We then propose a two-branch attention architecture that focuses on spatio-temporal key stages and filters out unreliable joint predictions. A base attention model with the simplest structure is first introduced to illustrate the two-branch attention architecture. By improving the structures of both branches, we further propose a Global Long-sequence Attention Network (GLAN). Furthermore, to adjust the kernel's spatio-temporal aspect ratios and better capture long-term dependencies, we propose a Sub-Sequence Attention Network (SSAN) that takes sub-image sequences as inputs. We show that the two-branch attention architecture can be combined with the SSAN to further improve performance. Our experimental results on the NTU RGB+D dataset and the SBU Kinect Interaction dataset outperform the state of the art. The model is further validated on noisy estimated poses from the UCF101 and Kinetics datasets.
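The depth-first traversal idea can be sketched directly: walk the skeleton tree and revisit each joint when the walk returns through it, so adjacent columns of the resulting "skeleton image" always correspond to physically connected joints. The toy topology below is an assumed example, not a real dataset's joint layout.

```python
def dfs_joint_order(children, root=0):
    """Depth-first traversal over a skeleton tree, revisiting a joint each
    time the walk returns through it, so neighboring entries in the order
    are always physically connected joints."""
    order = [root]
    for c in children.get(root, []):
        order += dfs_joint_order(children, c)
        order.append(root)   # come back through the parent joint
    return order

# Toy 5-joint skeleton: 0=spine, 1=neck, 2=head, 3/4=arms (assumed)
children = {0: [1, 3, 4], 1: [2]}
print(dfs_joint_order(children))  # -> [0, 1, 2, 1, 0, 3, 0, 4, 0]
```

Each frame's joints, laid out in this order, would form one column block of the 2D skeleton image that the CNN consumes.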
Proliferating cancer cells preferentially use anaerobic glycolysis rather than oxidative phosphorylation for energy production. Hexokinase 2 (HK2) is highly expressed in many malignant cells and is necessary for anaerobic glycolysis. The role of HK2 in laryngeal squamous cell carcinoma (LSCC) is unknown. In this study, the expression of HK2 in LSCC was investigated, together with the effect of inhibiting HK2 expression with small hairpin RNA (shRNA) on tumor growth. Using immunohistochemistry, HK2 expression was assessed in LSCC tissues. Human laryngeal carcinoma Hep-2 cells were stably transfected with a plasmid expressing HK2 shRNA (pGenesil-1.1-HK2) and were compared to control cells with respect to the cell cycle, cell viability, apoptosis, and their ability to form xenograft tumors. HK2 expression was significantly higher in LSCC than in papilloma or glottic polyps. Tumor samples of higher T, N, and TNM stage often showed stronger HK2 staining. HK2 shRNA reduced HK2 mRNA and protein levels and HK activity in Hep-2 cells. Hep-2 cells expressing HK2 shRNA demonstrated a higher G0-G1 ratio, increased apoptosis, and reduced viability. Xenograft tumors derived from cells expressing HK2 shRNA were smaller and had lower proliferation than those from untransfected or control-plasmid-transfected cells. In conclusion, depletion of HK2 expression reduced xenograft tumor development, likely by reducing proliferation, altering the cell cycle, reducing cell viability, and activating apoptosis. These data suggest that HK2 plays an important role in the development of LSCC and represents a potential therapeutic target for LSCC.