MsPBRsP: Multi-scale Protein Binding Residues Prediction Using Language Model

Li, Yuguang; Nan, Xiaofei; Zhang, Shoutao; Zhou, Qinglei

doi:10.1101/2023.02.26.528265

Cited by 2 publications (5 citation statements)

References 65 publications


“…All three annotators have sufficient medical knowledge. In implementations, we follow previous works (Li et al, 2023b; Zhang et al, 2023b) to randomly select 200 real patient-doctor conversations from Li et al (2023b). We require the LLMs to simulate a doctor and provide responses based on various patient inquiries.…”

Section: Results (mentioning; confidence: 99%)

“…is an open-ended complex task and requires the models to first understand the real-world patient-clinician conversations, in which the conversation describes the conditions and symptoms, and then recommend all possible drugs for the treatment of patients. We use Chat-Doctor (Li et al, 2023b) for evaluation.…”

Section: Benchmark (mentioning; confidence: 99%)

“…In detail, as shown in Table 3, ChatGLM-Med (Wang et al, 2023b) and DoctorGLM (Xiong et al, 2023) are fine-tuned on the ChatGLM-6B (Tsinghua KEG, 2023; Du et al, 2022; Zeng et al, 2022) using QA pairs and dialogues, respectively. Huatuo (Zhang et al, 2023a), ChatDoctor (Li et al, 2023b), Baize-Healthcare (Xu et al, 2023), and MedAlpaca-7B/13B (Han et al, 2023) are built upon the LLaMA-series models. During fine-tuning, both Huatuo and MedAlpaca employ the QA pairs collected from the medical knowledge graphs and medical texts, respectively.…”

Section: Large Language Models (mentioning; confidence: 99%)

“…Large language models (LLMs), such as ChatGPT (Brown et al, 2020; OpenAI, 2023b), LLaMA (Touvron et al, 2023a), and PaLM (Chowdhery et al, 2022), are increasingly being recognized for their potential in healthcare to aid clinical decision-making and provide innovative solutions for complex healthcare problems (Patel et al, 2023; Shen et al, 2023), e.g., discharge summary generation (Patel and Lam, 2023), health education (Safranek et al, 2023), and care planning (Fleming et al, 2023). Several recent efforts have been made to fine-tune publicly available general LLMs, e.g., LLaMA (Touvron et al, 2023b) and ChatGLM (Tsinghua KEG, 2023), to develop medical LLMs (Singhal et al, 2023a,c), resulting in ChatDoctor (Li et al, 2023b), MedAlpaca (Han et al, 2023), BenTsao (Wang et al, 2023a), and ClinicalCamel (Toma et al, 2023). Previous research shows that medical LLMs outperform human experts across a variety of medical tasks.…”

Section: Introduction (mentioning; confidence: 99%)

Large Language Models in the Clinic: A Comprehensive Benchmark

Liu, Zhou, Hua, et al. 2024

Preprint


The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly adopt the closed-ended question-answering task with answer options for evaluation. However, in real clinical settings, many clinical decisions, such as treatment recommendations, involve answering open-ended questions without pre-set options. Meanwhile, existing studies mainly use accuracy to assess model performance. In this paper, we comprehensively benchmark diverse LLMs in healthcare to clearly understand their strengths and weaknesses. Our benchmark contains seven tasks and thirteen datasets across medical language generation, understanding, and reasoning. We conduct a detailed evaluation of sixteen existing LLMs in healthcare under both zero-shot and few-shot (i.e., 1-, 3-, and 5-shot) learning settings. We report the results on five metrics (i.e., matching, faithfulness, comprehensiveness, generalizability, and robustness) that are critical in achieving trust from clinical users. We further invite medical experts to conduct human evaluation.


“…A notable advancement in this domain has been the widespread adoption of learning-based embedding models, which leverage high-dimensional vector representations to enable effective and efficient analysis and search of unstructured data [37,61]. High-dimensional Vector Similarity Search (HVSS) is a critical challenge in many domains, such as databases [25,68], information retrieval [28,32], recommendation systems [19,54], scientific computing [51,78], and large language models (LLMs) [7,12,44]. The computational complexity associated with exact query answering in HVSS has spurred recent research efforts toward developing approximate search methods [25,49,68].…”

Section: Introduction (mentioning; confidence: 99%)

Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment

Wang, Xu, et al. 2024

Proc. ACM Manag. Data


High-dimensional vector similarity search (HVSS) is gaining prominence as a powerful tool for various data science and AI applications. As vector data scales up, in-memory indexes pose a significant challenge due to the substantial increase in main memory requirements. A potential solution involves leveraging a disk-based implementation, which stores and searches vector data on high-performance devices like NVMe SSDs. However, implementing HVSS for data segments proves intricate in vector databases, where a single machine comprises multiple segments for system scalability. In this context, each segment operates with limited memory and disk space, necessitating a delicate balance between accuracy, efficiency, and space cost. Existing disk-based methods fall short because they do not holistically address all these requirements simultaneously. In this paper, we present Starling, an I/O-efficient disk-resident graph index framework that optimizes data layout and search strategy within the segment. It has two primary components: (1) a data layout incorporating an in-memory navigation graph and a reordered disk-based graph with enhanced locality, reducing the search path length and minimizing disk bandwidth wastage; and (2) a block search strategy designed to minimize costly disk I/O operations during vector query execution. Through extensive experiments, we validate the effectiveness, efficiency, and scalability of Starling. On a data segment with 2 GB of memory and 10 GB of disk capacity, Starling can accommodate up to 33 million 128-dimensional vectors, offering HVSS with over 0.9 average precision and top-10 recall rate, and latency under 1 millisecond. The results show Starling's superior performance: 43.9× higher throughput and 98% lower query latency compared to state-of-the-art methods while maintaining the same level of accuracy.
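The abstract reports accuracy as a top-10 recall rate. As a minimal sketch, assuming the common definition of recall@k (the fraction of the exact k nearest neighbours that an approximate search also returns), the snippet below compares brute-force exact search with a deliberately crude approximate search that scans only a random sample of the data. The helper names (`exact_top_k`, `subset_top_k`, `recall_at_k`) and the sampling stand-in are hypothetical illustrations; they are not Starling's graph index or its block search strategy.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Exact k-NN by brute force: rank every vector by distance to the query."""
    order = sorted(range(len(vectors)), key=lambda i: squared_l2(query, vectors[i]))
    return order[:k]


def subset_top_k(query, vectors, k, fraction, rng):
    """Hypothetical stand-in for an approximate index: scan only a random sample."""
    sample = rng.sample(range(len(vectors)), int(len(vectors) * fraction))
    sample.sort(key=lambda i: squared_l2(query, vectors[i]))
    return sample[:k]


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the approximate result contains."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n, k = 16, 2000, 10
    vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
    queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
    recalls = [
        recall_at_k(subset_top_k(q, vectors, k, 0.5, rng), exact_top_k(q, vectors, k), k)
        for q in queries
    ]
    # Scanning half the data should land near 0.5 recall on average.
    print(f"average recall@{k}: {sum(recalls) / len(recalls):.2f}")
```

Exact search costs one distance computation per stored vector, which is why the approximate methods cited above trade some recall for far fewer distance computations and, for disk-resident indexes, fewer I/O operations.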
