We show that for human-object interaction detection a relatively simple factorized model with appearance and layout encodings constructed from pre-trained object detectors outperforms more sophisticated approaches. Our model includes factors for detection scores, human and object appearance, and coarse (box-pair configuration) and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (1) eliminating a train-inference mismatch; (2) rejecting easy negatives during mini-batch training; and (3) using a ratio of negatives to positives that is two orders of magnitude larger than existing approaches. We conduct a thorough ablation study to understand the importance of different factors and training techniques using the challenging HICO-Det dataset [4].
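The factorized scoring described above can be illustrated with a minimal numpy sketch. This is a toy illustration, not the paper's implementation: the function name, the specific factor values, and the gating of the sigmoid term by the two detector confidences are assumptions chosen to show how independent factor logits combine log-linearly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interaction_score(det_human, det_object, factor_logits):
    # Detector confidences gate the interaction term; the factor logits
    # (appearance, box-pair layout, optionally pose) are summed before
    # the sigmoid, so the model is a simple log-linear combination of
    # independent factors.
    return det_human * det_object * sigmoid(np.sum(factor_logits))

# Hypothetical example: confident detections, mildly positive factors.
score = interaction_score(0.9, 0.8, [1.2, 0.4, -0.1])
```

Because the factors only interact through a sum of logits, each one can be ablated independently, which is what makes the ablation study in the abstract straightforward.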
Our goal is to recover a complete 3D model from a depth image of an object. Existing approaches rely on user interaction or apply only to a limited class of objects, such as chairs. We aim to reconstruct a 3D model of an object from any category, fully automatically. We take an exemplar-based approach: we retrieve similar objects from a database of 3D models using view-based matching and transfer the symmetries and surfaces from the retrieved models. We investigate completion of 3D models in three cases: novel view (model in database); novel model (models for other objects of the same category in database); and novel category (no models from the category in database).
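The view-based retrieval step can be sketched as nearest-neighbor search over view descriptors. This is a hypothetical illustration: the descriptor, the cosine-similarity metric, and the function name are assumptions; the paper's actual matching features may differ.

```python
import numpy as np

def retrieve_exemplars(query, database, k=3):
    # Cosine similarity between the query view descriptor and every
    # database view descriptor; the k best-matching models are returned
    # so their symmetries and surfaces can be transferred.
    q = query / np.linalg.norm(query)
    D = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = D @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Toy database of three 2-D view descriptors.
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
idx, sims = retrieve_exemplars(np.array([0.6, 0.8]), db, k=2)
```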
Imagining a scene described in natural language with realistic layout and appearance of entities is the ultimate test of spatial, visual, and semantic world knowledge. Towards this goal, we present the Composition, Retrieval and Fusion Network (Craft), a model capable of learning this knowledge from video-caption data and applying it while generating videos from novel captions. Craft explicitly predicts a temporal layout of mentioned entities (characters and objects), retrieves spatio-temporal entity segments from a video database, and fuses them to generate scene videos. Our contributions include sequential training of the components of Craft while jointly modeling layout and appearance, and losses that encourage learning compositional representations for retrieval. We evaluate Craft on semantic fidelity to the caption, composition consistency, and visual quality. Craft outperforms direct pixel-generation approaches and generalizes well to unseen captions and to unseen video databases with no text annotations. We demonstrate Craft on Flintstones, a new richly annotated video-caption dataset with over 25,000 videos. For a glimpse of videos generated by Craft, see https://youtu.be/688Vv86n0z8.
Phrase grounding, the problem of associating image regions with caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on the mutual information between images and caption words. Given pairs of images and captions, we maximize the compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language-model-guided word substitutions. Training with our negatives yields a ∼10% absolute gain in accuracy over randomly sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of 5.7% to achieve 76.7% accuracy on the Flickr30K Entities benchmark.
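The word-region attention and contrastive objective can be sketched as follows. This is a simplified toy version, not the paper's model: the feature shapes, the mean-over-words pooling, and the function names are assumptions; it only shows the structure of scoring a caption via attention-weighted regions and ranking the true caption against substituted negatives with an InfoNCE-style loss.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def caption_score(regions, words):
    # Each word attends over image regions; a word's score is its dot
    # product with the attention-weighted region, and the caption score
    # is the mean over words.
    attn = softmax(words @ regions.T, axis=1)   # (W, R) attention
    attended = attn @ regions                   # (W, D) attended regions
    return float(np.mean(np.sum(words * attended, axis=1)))

def grounding_loss(regions, pos_words, neg_word_sets):
    # InfoNCE-style lower bound: the matching caption competes against
    # negatives built by substituting words in the true caption.
    logits = np.array([caption_score(regions, pos_words)]
                      + [caption_score(regions, w) for w in neg_word_sets])
    return float(-np.log(softmax(logits)[0]))
```

Minimizing this loss pushes attention toward regions that make the true caption's words more compatible with the image than the substituted negatives, which is how grounding emerges without region-level supervision.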
A special-purpose learning system assumes knowledge of admissible tasks at design time. Adapting such a system to unforeseen tasks requires architecture manipulation, such as adding an output head for each new task or dataset. In this work, we propose a task-agnostic vision-language system that accepts an image and a natural language task description and outputs bounding boxes, confidences, and text. The system supports a wide range of vision tasks such as classification, localization, question answering, captioning, and more. We evaluate the system's ability to learn multiple skills simultaneously, to perform tasks with novel skill-concept combinations, and to learn new skills efficiently and without forgetting.
Purpose: It would not be an exaggeration to say that healthcare is among the most critical sectors today. The healthcare sector works on several dimensions simultaneously, such as the safety, care, quality and cost of services. Still, the desired outcomes remain distant, and it is pertinent to address the issues associated with healthcare on a priority basis to sustain outcomes over the long term. The present study explores the healthcare sector and lists the directly associated enablers that contribute to increasing its viability. It also studies the interrelationships among the listed enablers, which helps in setting priorities for dealing with individual enablers based on their contribution to viability improvement.

Design/methodology/approach: The authors conducted an extensive review to list the enablers of efficient and effective performance in the healthcare sector. The enablers were then ranked using the modified Total Interpretive Structural Modelling (m-TISM) approach. Validation of the study reveals the importance of the enablers based on their position in the hierarchical structure. Further, MICMAC analysis is performed to categorize the identified enablers into clusters based on their driving power and dependence.

Findings: The research envisages the importance of the healthcare sector and its contribution to national development. The outcomes of the m-TISM model reveal the noteworthy contribution of organizational structure in managing healthcare facilities and represent it as a perspective for future growth.
A well-designed organizational structure in the healthcare industry helps establish better employee-employer cooperation, workforce coordination and inter-departmental cooperation.

Research limitations/implications: Every research work has limitations, and the present work is no exception: the input used to develop the models comes from very few experts, which may not reflect the opinion of the whole sector.

Practical implications: The healthcare sector is growing in the present-day scenario, and it is essential to keep the quality of treatment in check along with the quantity. The present study lays down practical foundations for improving the viability of the healthcare sector. It also emphasizes the accountability of healthcare officials to act on the enablers with strong driving power for effective utilization of all resources, which would further help in customer (patient) satisfaction.

Originality/value: Despite the worldwide increase in demand for good-quality healthcare facilities, the growth of this sector is bounded by economic, demographic, cultural and environmental concerns. The present study proposes a unique framework that provides a better understanding of the enablers and can play a key role in increasing the viability of the healthcare sector. The hierarchy developed with m-TISM and the MICMAC analysis will help readers recognize the important enablers based on their contribution to the viability improvement of the healthcare sector.
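The MICMAC step described above is mechanical once the final reachability matrix from the (m-)TISM procedure is available, and can be sketched in a few lines. This is a generic illustration of standard MICMAC clustering, not the study's actual data: the toy matrix and the function name are assumptions.

```python
import numpy as np

def micmac(reachability):
    # Driving power = row sums and dependence = column sums of the final
    # (binary, reflexive) reachability matrix; the midpoint of the scale
    # splits the four conventional MICMAC quadrants.
    R = np.asarray(reachability)
    n = R.shape[0]
    driving, dependence = R.sum(axis=1), R.sum(axis=0)
    mid = n / 2.0
    labels = []
    for p, d in zip(driving, dependence):
        if p > mid and d > mid:
            labels.append("linkage")
        elif p > mid:
            labels.append("independent")   # strong drivers, low dependence
        elif d > mid:
            labels.append("dependent")
        else:
            labels.append("autonomous")
    return list(driving), list(dependence), labels

# Hypothetical 3-enabler reachability matrix (1s on the diagonal).
drv, dep, lab = micmac([[1, 1, 1],
                        [0, 1, 1],
                        [0, 0, 1]])
```

Enablers in the "independent" quadrant are the strong drivers the study singles out for priority attention, since influencing them propagates down the hierarchy.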