Non-randomised studies of the effects of interventions are critical to many areas of healthcare evaluation, but their results may be biased. It is therefore important to understand and appraise their strengths and weaknesses. We developed ROBINS-I (“Risk Of Bias In Non-randomised Studies - of Interventions”), a new tool for evaluating risk of bias in estimates of the comparative effectiveness (harm or benefit) of interventions from studies that did not use randomisation to allocate units (individuals or clusters of individuals) to comparison groups. The tool will be particularly useful to those undertaking systematic reviews that include non-randomised studies.
The number of published systematic reviews of studies of healthcare interventions has increased rapidly and these are used extensively for clinical and policy decisions. Systematic reviews are subject to a range of biases and increasingly include non-randomised studies of interventions. It is important that users can distinguish high quality reviews. Many instruments have been designed to evaluate different aspects of reviews, but there are few comprehensive critical appraisal instruments. AMSTAR was developed to evaluate systematic reviews of randomised trials. In this paper, we report on the updating of AMSTAR and its adaptation to enable more detailed assessment of systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. With moves to base more decisions on real world observational evidence we believe that AMSTAR 2 will assist decision makers in the identification of high quality systematic reviews, including those based on non-randomised studies of healthcare interventions.
Users of clinical practice guidelines and other recommendations need to know how much confidence they can place in the recommendations. Systematic and explicit methods of making judgments can reduce errors and improve communication. We have developed a system for grading the quality of evidence and the strength of recommendations that can be applied across a wide range of interventions and contexts. In this article we present a summary of our approach from the perspective of a guideline user. Judgments about the strength of a recommendation require consideration of the balance between benefits and harms, the quality of the evidence, translation of the evidence into specific circumstances, and the certainty of the baseline risk. It is also important to consider costs (resource utilisation) before making a recommendation. Inconsistencies among systems for grading the quality of evidence and the strength of recommendations reduce their potential to facilitate critical appraisal and improve communication of these judgments. Our system for guiding these complex judgments balances the need for simplicity with the need for full and transparent consideration of all important issues.
AMSTAR has good agreement, reliability, construct validity, and feasibility. These findings need confirmation by a broader range of assessors and a more diverse range of reviews.
BackgroundA number of approaches have been used to grade levels of evidence and the strength of recommendations. The use of many different approaches detracts from one of the main reasons for having explicit approaches: to concisely characterise and communicate this information so that it can easily be understood and thereby help people make well-informed decisions. Our objective was to critically appraise six prominent systems for grading levels of evidence and the strength of recommendations as a basis for agreeing on characteristics of a common, sensible approach to grading levels of evidence and the strength of recommendations.MethodsSix prominent systems for grading levels of evidence and strength of recommendations were selected and someone familiar with each system prepared a description of each of these. Twelve assessors independently evaluated each system based on twelve criteria to assess the sensibility of the different approaches. Systems used by 51 organisations were compared with these six approaches.ResultsThere was poor agreement about the sensibility of the six systems. Only one of the systems was suitable for all four types of questions we considered (effectiveness, harm, diagnosis and prognosis). None of the systems was considered usable for all of the target groups we considered (professionals, patients and policy makers). The raters found low reproducibility of judgements made using all six systems. Systems used by 51 organisations that sponsor clinical practice guidelines included a number of minor variations of the six systems that we critically appraised.ConclusionsAll of the currently used approaches to grading levels of evidence and the strength of recommendations have important shortcomings.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.