2023
DOI: 10.48550/arxiv.2302.10035
Preprint

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Abstract: With the urgent demand for generalized deep models, many large pre-trained models have been proposed, such as BERT, ViT, and GPT. Inspired by the success of these models in single domains (like computer vision and natural language processing), multi-modal pre-trained big models have also drawn more and more attention in recent years. In this work, we give a comprehensive survey of these models and hope this paper provides new insights and helps fresh researchers track the most cutting-edge works. Specifica…

Cited by 5 publications (3 citation statements)
References 172 publications (247 reference statements)
“…The kinds of tasks to which GLLMMs can be applied include pre-training in content creation (such as text, audio tracks, images, or videos) that become part of a larger undertaking such as designing art, composing music, or telling a story. Another example would be language translation where GLLMMs can convert text from one language to another with greater accuracy because they have also been primed by gaining earlier access to cultural nuances and context [12]. An example where GLLMMs are applied in the field of medicine would be to analyze the results of medical imaging more accurately (such as X-ray, MRI scans, or ultrasound) by pretraining with access to additional databases that relate to the relevant histology and pathology implicated in the imaging results [13].…”
Section: A Brief Primer On the Concepts And Nomenclature Of AI (mentioning)
confidence: 99%
“…• Transferable debiasing techniques involve developing debiasing techniques that are designed to be transferable across different models, datasets, or domains. These techniques may incorporate generalization principles, domain-independent features, or model-agnostic approaches that enable their application to diverse settings [176], [177].…”
Section: Lack Of Transferability (mentioning)
confidence: 99%
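
The statement above mentions model-agnostic debiasing approaches without spelling one out. As a hedged illustration only (not drawn from the surveyed paper or from the cited works [176], [177]), here is a minimal Python sketch of one classic model-agnostic preprocessing technique, reweighing: it computes per-sample weights that decorrelate the label from a protected attribute, and those weights plug into any estimator that accepts sample_weight. The function name and the toy data are assumptions for illustration.

# Hypothetical sketch of model-agnostic debiasing via reweighing.
# The weights equalize the joint distribution of label and protected
# attribute, so the same preprocessing transfers across classifiers.
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(y, group):
    """Per-sample weights making label y independent of protected attribute group."""
    y, group = np.asarray(y), np.asarray(group)
    n = len(y)
    w = np.empty(n, dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            mask = (group == g) & (y == label)
            # count expected under independence, divided by observed count
            expected = (group == g).sum() * (y == label).sum() / n
            w[mask] = expected / max(mask.sum(), 1)
    return w

# Toy usage: any estimator accepting sample_weight can consume the weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
group = rng.integers(0, 2, size=200)          # protected attribute
y = (X[:, 0] + 0.8 * group > 0).astype(int)   # label correlated with group
clf = LogisticRegression().fit(X, y, sample_weight=reweighing_weights(y, group))

Because the correction lives entirely in the data weights rather than inside any particular model, the same function applies unchanged across classifiers and datasets, which is the transferability property the statement describes.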
“…Du et al [62] and Chen et al [63] reviewed VLM pre-training for visionlanguage tasks [57], [58], [60]. Xu et al [64] and Wang et al [65] shared recent progress of multi-modal learning on multi-modal tasks (e.g., language, vision and auditory modalities). Differently, we review VLMs for visual recognition tasks from three major aspects: 1) Recent progress of VLM pre-training for visual recognition tasks; 2) Two typical transfer approaches from VLMs to visual recognition tasks, i.e., transfer learning approach and knowledge distillation approach; 3) Benchmarking of state-of-the-art VLM pretraining methods on visual recognition tasks.…”
Section: Relevant Surveys (mentioning)
confidence: 99%