Objective
This paper aims to address the challenges of citation screening (also known as abstract screening) in systematic reviews (SRs) by leveraging the zero-shot capabilities of large language models, particularly ChatGPT.

Methods
We employ ChatGPT as a zero-shot ranker to prioritize candidate studies by aligning their abstracts with the selection criteria outlined in an SR protocol. Citation screening is cast as a novel question-answering (QA) framework that treats each selection criterion as a question to be answered by ChatGPT. The framework involves breaking the selection criteria down into multiple questions, prompting ChatGPT to answer each question, scoring and re-ranking each answer, and combining the responses to make nuanced inclusion or exclusion decisions.

Results
Large-scale validation was performed on the benchmark of CLEF eHealth 2019 Task 2: Technology Assisted Reviews in Empirical Medicine. Across 31 datasets spanning four categories of SRs, the proposed QA framework consistently outperformed other zero-shot ranking models. Compared with complex ranking approaches that use iterative relevance feedback, as well as fine-tuned deep-learning-based ranking models, our ChatGPT-based zero-shot citation screening approaches still achieved competitive, and sometimes better, results, underscoring their potential for automating systematic reviews.

Conclusion
Our investigation confirmed the value of leveraging selection criteria to improve automated citation screening. ChatGPT proved proficient at prioritizing candidate studies for citation screening within the proposed QA framework. Significant performance gains were obtained by re-ranking answers according to the semantic alignment between abstracts and selection criteria, further highlighting the pertinence of selection criteria to citation screening.
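The QA framework described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the per-criterion scoring function below is a simple keyword-overlap stand-in (an assumption for demonstration), whereas the paper prompts ChatGPT zero-shot with each criterion-derived question and applies a more nuanced scoring, re-ranking, and combination step.

```python
def answer_criterion(abstract: str, criterion: str) -> float:
    """Stand-in for asking an LLM one selection-criterion question.

    Returns a mock relevance score in [0, 1] based on keyword overlap.
    This is an assumption for illustration; the paper instead prompts
    ChatGPT zero-shot and scores its answer.
    """
    abstract_terms = set(abstract.lower().split())
    criterion_terms = set(criterion.lower().split())
    if not criterion_terms:
        return 0.0
    return len(abstract_terms & criterion_terms) / len(criterion_terms)


def screen(abstracts: list[str], criteria: list[str]) -> list[tuple[int, float]]:
    """Score each abstract against every criterion and rank candidates.

    Per-criterion answers are combined by simple averaging (again an
    assumption; the paper describes a richer re-ranking step based on
    semantic alignment between abstracts and criteria).
    """
    scored = []
    for idx, abstract in enumerate(abstracts):
        answers = [answer_criterion(abstract, c) for c in criteria]
        scored.append((idx, sum(answers) / len(answers)))
    # Rank candidate studies by combined score, highest first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, given abstracts `["randomized controlled trial of drug x in adults", "a qualitative survey of clinician opinions"]` and criteria `["randomized controlled trial", "adult patients"]`, `screen` ranks the first abstract above the second, mirroring how criterion-level answers are combined into a prioritized screening order.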