Thomas Huang scite author profile

Background: Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination Step 1 and Step 2 exams and to analyze its responses for user interpretability. Methods: We used 2 sets of multiple-choice questions to evaluate ChatGPT's performance, each containing questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and on exam performance relative to its user base. The second set was the National Board of Medical Examiners (NBME) Free 120 questions. ChatGPT's performance was compared with that of 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. Results: On the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model's performance decreased significantly as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. We found that logical justification for ChatGPT's answer selection was present in 100% of outputs on the NBME data sets. Information internal to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001) and NBME-Free-Step2 (P=.001) data sets, respectively. Conclusions: ChatGPT marks a significant improvement in natural language processing models on the task of medical question answering. By performing above the 60% threshold on the NBME-Free-Step1 data set, we show that the model achieves the equivalent of a passing score for a third-year medical student. Additionally, we highlight ChatGPT's capacity to provide logic and informational context across the majority of answers. Taken together, these findings make a compelling case for the potential applications of ChatGPT as an interactive medical education tool to support learning.

How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment

Gilson, Safranek, Huang, et al. 2022

Preprint

133

Background: ChatGPT is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective: To evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination (USMLE) Step 1 and Step 2 exams, and to analyze its responses for user interpretability. Methods: We used two novel sets of multiple-choice questions to evaluate ChatGPT's performance, each with questions pertaining to Step 1 and Step 2. The first was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and on exam performance relative to its user base. The second was the National Board of Medical Examiners (NBME) Free 120-question exams. After ChatGPT was prompted with each question, its selected answer was recorded and its text output was evaluated across three qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. Results: On the four datasets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44%, 42%, 64.4%, and 57.8%, respectively. The model's performance decreased significantly as question difficulty increased (P=.012) within the AMBOSS-Step1 dataset. We found that logical justification for ChatGPT's answer selection was present in 100% of outputs. Information internal to the question was present in >90% of all questions. The presence of information external to the question was 54.5% and 27% lower for incorrect relative to correct answers on the NBME-Free-Step1 and NBME-Free-Step2 datasets, respectively (P≤.001). Conclusion: ChatGPT marks a significant improvement in natural language processing models on the task of medical question answering. By performing above the 60% threshold on the NBME-Free-Step1 dataset, we show that the model is comparable to a third-year medical student. Additionally, the dialogic nature of its responses allows us to demonstrate ChatGPT's ability to provide reasoning and informational context across the majority of answers. Taken together, these findings make a compelling case for the potential applications of ChatGPT as a medical education tool.

Using next‐generation sequencing to redefine BRCAness in triple‐negative breast cancer

et al. 2020

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. Abstract: BRCAness is considered a predictive biomarker of response to platinum and poly(ADP-ribose) polymerase (PARP) inhibitors. However, recent trials showed that its predictive value was limited in triple-negative breast cancer (TNBC) treated with platinum. Moreover, tumors with mutations in DNA damage response (DDR) genes, such as homologous recombination (HR) genes, could be sensitive to platinum and PARP inhibitors. We therefore aimed to explore the relationship between the mutation status of DDR genes and BRCAness in TNBC. We sequenced 56 DDR genes in 120 TNBC cases and identified BRCAness by array comparative genomic hybridization. Sequencing showed that 13, 14, and 14 patients had BRCA, non-BRCA HR, and non-HR DDR gene mutations, respectively. Array comparative genomic hybridization revealed that BRCA-mutated and HR gene-mutated TNBC shared similar BRCAness features: both had more numerous and longer large-scale structural aberrations (LSA, >10 Mb) and similar altered chromosomal regions of LSA. These findings suggest that non-BRCA HR gene-mutated TNBC shares similar characteristics with BRCA-mutated TNBC, indicating that non-BRCA HR gene-mutated TNBC is sensitive to platinum and PARP inhibitors. Among tumors with non-HR DDR gene mutations, 3 with PTEN mutations and 1 with an MSH6 mutation also contained significant LSAs (BRCAness); however, their regions of genomic alteration differed from those of BRCA- and HR gene-mutated tumors, which might explain prior findings that PTEN- and MSH6-mutated cancer cells are not sensitive to PARP inhibitors. Therefore, we hypothesize that the heterogeneous genomic background of BRCAness indicates differing responsiveness to platinum and PARP inhibitors. Direct sequencing of DDR genes in TNBC should be applied to predict sensitivity to platinum and PARP inhibitors.

Circulating Tumor DNA as a Predictive Marker of Recurrence for Patients With Stage II-III Breast Cancer Treated With Neoadjuvant Therapy

Lin, Wang, et al. 2021

Front. Oncol.

Background: Patients with stage II to III breast cancer have a high recurrence rate, and the early detection of recurrent breast cancer remains a major unmet need. Circulating tumor DNA (ctDNA) has been proven to be a marker of disease progression in metastatic breast cancer. We aimed to evaluate the prognostic value of ctDNA in the setting of neoadjuvant therapy (NAT). Methods: Plasma was sampled at initial diagnosis (defined as before NAT) and after breast surgery and neoadjuvant therapy (defined as after NAT). We extracted ctDNA from the plasma and performed deep sequencing of a targeted gene panel. ctDNA positivity was defined by the detection of alterations such as mutations and copy number variations. Results: A total of 95 patients were enrolled in this study; 60 patients exhibited ctDNA positivity before NAT, and 31 exhibited ctDNA positivity after NAT. A pathologic complete response (pCR) was observed in 13 patients: one ER(+)/Her2(-) patient, six Her2(+) patients, and six triple-negative breast cancer (TNBC) patients. In the entire cohort, multivariate analysis showed that N3 classification and ctDNA positivity after NAT were independent risk factors for recurrence (N3: hazard ratio (HR) 3.34, 95% confidence interval (CI) 1.26–8.87, p = 0.016; ctDNA: HR 4.29, 95% CI 2.06–8.92, p < 0.0001). The presence of ctDNA before NAT did not affect recurrence-free survival. Among patients with Her2(+) breast cancer or TNBC, those who did not achieve pCR showed a trend toward higher recurrence (p = 0.105). Advanced nodal status and ctDNA positivity after NAT were significant risk factors for recurrence (N2–3: HR 3.753, 95% CI 1.146–12.297, p = 0.029; ctDNA: HR 3.123, 95% CI 1.139–8.564, p = 0.027). Two patients who achieved pCR had ctDNA positivity after NAT: one TNBC patient developed hepatic metastases six months after surgery, and one Her2(+) breast cancer patient developed brain metastasis 13 months after surgery. Conclusions: This study suggests that the presence of ctDNA after NAT is a robust marker for predicting relapse in patients with stage II to III breast cancer.

EML4–ALK rearrangement in squamous cell carcinoma shows significant response to anti-ALK inhibitor drugs crizotinib and alectinib

Huang, Engelmann, Morgan, et al. 2018

Cancer Chemother Pharmacol

EML4-ALK alterations are more common in adenocarcinomas and are rarely found in squamous cell histology. Among documented cases with squamous cell histology, the majority of EML4-ALK translocations occur in patients with no or light smoking history. We report an EML4-ALK4 translocation in a 50-year-old patient with squamous cell carcinoma and an 18 pack-year smoking history. The patient had a near-complete response in the CNS to alectinib treatment. Our observation suggests that EML4-ALK genomic testing may be clinically useful in patients with a heavy smoking history.

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Thomas Huang

How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment
