GRADE (Grading of Recommendations, Assessment, Development and Evaluations) is a transparent framework for developing and presenting summaries of evidence, and it provides a systematic approach for making clinical practice recommendations.[1-3] It is the most widely adopted tool for grading the quality of evidence and for making recommendations, with over 100 organizations worldwide officially endorsing it.
How does it work?
First, the authors decide what the clinical question is, including the population that the question applies to, the two or more alternatives, and the outcomes that matter most to those faced with the decision. A study – ideally a systematic review – provides the best estimate of the effect size for each outcome, in absolute terms (e.g. a risk difference).
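As a small illustration of what "absolute terms" means here, the sketch below computes a risk difference from raw event counts. The function name and the trial numbers are hypothetical and chosen only for illustration; in practice the estimate would normally come from a pooled analysis in a systematic review rather than a single two-arm calculation.

```python
def risk_difference(events_treated, n_treated, events_control, n_control):
    """Absolute risk difference (treated minus control) from raw counts."""
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("group sizes must be positive")
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    return risk_treated - risk_control


# Hypothetical trial: 30/200 events with treatment, 50/200 with control.
rd = risk_difference(30, 200, 50, 200)
print(f"Risk difference: {rd * 1000:.0f} per 1000")
```

A negative value means fewer events with treatment; here that corresponds to roughly 100 fewer events per 1000 people treated.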
The authors then rate the quality of evidence, which is best applied to each outcome, because the quality of evidence often varies between outcomes. An overall GRADE quality rating can be applied to a body of evidence across outcomes, usually by taking the lowest quality of evidence from all of the outcomes that are critical to decision making.
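The "lowest rating among critical outcomes" rule can be sketched in a few lines. This is only bookkeeping under stated assumptions: the function name, the outcome names, and the ratings are hypothetical, and deciding which outcomes are critical remains a judgment made by the authors.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]


def overall_certainty(outcome_ratings, critical_outcomes):
    """Return the lowest certainty rating among the critical outcomes."""
    critical = [outcome_ratings[name] for name in critical_outcomes]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    # min() with LEVELS.index as key picks the rating lowest on the scale.
    return min(critical, key=LEVELS.index)


# Hypothetical per-outcome ratings for one clinical question.
ratings = {"mortality": "moderate", "stroke": "high", "nausea": "very low"}
print(overall_certainty(ratings, ["mortality", "stroke"]))
```

In this example the non-critical outcome (nausea) does not pull the overall rating down, so the result is "moderate".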
GRADE has four levels of evidence – also known as certainty in evidence or quality of evidence: very low, low, moderate, and high (Table 1). Evidence from randomized controlled trials starts at high quality and, because of residual confounding, evidence that includes observational data starts at low quality. The certainty in the evidence is increased or decreased for several reasons, described in more detail below.
|Certainty||What it means|
|Very low||The true effect is probably markedly different from the estimated effect|
|Low||The true effect might be markedly different from the estimated effect|
|Moderate||The authors believe that the true effect is probably close to the estimated effect|
|High||The authors have a lot of confidence that the true effect is similar to the estimated effect|
GRADE is subjective
GRADE cannot be implemented mechanically – there is by necessity a considerable amount of subjectivity in each decision. Two persons evaluating the same body of evidence might reasonably come to different conclusions about its certainty. What GRADE does provide is a reproducible and transparent framework for grading certainty in evidence.
What makes evidence less certain?
For each of risk of bias, imprecision, inconsistency, indirectness, and publication bias, authors have the option of decreasing their level of certainty one or two levels (e.g., from high to moderate).
The GRADE Domains for rating down
1. Risk of bias
Bias occurs when the results of a study do not represent the truth because of inherent limitations in the design or conduct of a study. In practice, it is difficult to know to what degree potential biases influence the results and therefore certainty is lower in the estimated effect if the studies informing the estimated effect could be biased.
There are several tools available to rate the risk of bias in individual randomized trials and observational studies.[10, 11]
GRADE is used to rate the body of evidence at the outcome level rather than the study level. Authors must, therefore, make a judgment about whether the risk of bias in the individual studies is sufficiently large that their confidence in the estimated treatment effect is lower. Key considerations for risk of bias, and the process for moving from risk of bias at the study level to risk of bias for a body of evidence, are described in detail in the GRADE guidelines series #4: Rating the quality of evidence – study limitations (risk of bias).
2. Imprecision
The GRADE approach to rating imprecision focuses on the 95% confidence interval around the best estimate of the absolute effect. Certainty is lower if the clinical decision is likely to be different if the true effect were at the upper versus the lower end of the confidence interval. Authors may also choose to rate down for imprecision if the effect estimate comes from only one or two small studies or if there were few events. A detailed description of imprecision is available in the GRADE guidelines series #6: Rating the quality of evidence – imprecision.
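One way to picture the confidence-interval check is to ask whether the interval spans the effect size at which the decision would change. The sketch below assumes such a decision threshold has already been agreed on; the function name and the numbers are hypothetical, and choosing the threshold is itself a judgment that the code does not make.

```python
def decision_differs_across_ci(ci_lower, ci_upper, threshold):
    """True if the interval spans the threshold at which the decision changes."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return ci_lower < threshold < ci_upper


# Hypothetical threshold: treat only if at least 20 fewer events per 1000,
# i.e. a risk difference of -0.02 or lower.
threshold = -0.02

# Interval from 80 fewer to 10 fewer per 1000: the two ends imply different
# decisions, so rating down for imprecision would be considered.
print(decision_differs_across_ci(-0.08, -0.01, threshold))

# Interval from 80 fewer to 40 fewer per 1000: both ends imply the same decision.
print(decision_differs_across_ci(-0.08, -0.04, threshold))
```

The first call returns True and the second False. The check does not cover the other reasons for rating down mentioned above, such as very few events.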
3. Inconsistency
Certainty in a body of evidence is highest when there are several studies that show consistent effects. When considering whether or not certainty should be rated down for inconsistency, authors should inspect the similarity of point estimates and the overlap of their confidence intervals, as well as statistical criteria for heterogeneity (e.g., the I² statistic and the chi-squared test). A full discussion of inconsistency is available in the GRADE guidelines series #7: rating the quality of evidence – inconsistency.
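For readers unfamiliar with the statistical criteria, the sketch below computes Cochran's Q (the statistic underlying the chi-squared test for heterogeneity) and I² from study-level effect estimates and standard errors, using inverse-variance weights. The function name and the input values are hypothetical. The p-value for Q is omitted because it needs a chi-squared distribution function that the Python standard library does not provide.

```python
def heterogeneity(effects, standard_errors):
    """Return (Q, degrees of freedom, I-squared in percent)."""
    if len(effects) != len(standard_errors) or len(effects) < 2:
        raise ValueError("need matching inputs from at least two studies")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond what chance would explain.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical effect estimates from three studies with equal precision.
q, df, i2 = heterogeneity([0.10, 0.30, 0.50], [0.1, 0.1, 0.1])
print(f"Q = {q:.1f} on {df} df, I2 = {i2:.0f}%")
```

With these inputs Q is about 8 on 2 degrees of freedom and I² is about 75%. The statistics inform the judgment alongside visual inspection of point estimates and confidence intervals; they do not replace it.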
4. Indirectness
Evidence is most certain when studies directly compare the interventions of interest in the population of interest and report the outcome(s) critical for decision-making. Certainty can be rated down if the patients studied are different from those for whom the recommendation applies. Indirectness can also occur when the interventions studied differ from those that would be delivered in practice (for example, a study of a new surgical procedure in a highly specialized center only indirectly applies to centers with less experience). Indirectness also occurs when the outcome studied is a surrogate for a different outcome – typically one that is more important to patients. A full discussion of indirectness is available in the GRADE guidelines series #8: rating the quality of evidence – indirectness.
5. Publication bias
Publication bias is perhaps the most vexing of the GRADE domains because it requires making inferences about missing evidence. Several statistical and visual methods are helpful in detecting publication bias, despite having serious limitations. Publication bias is more common with observational data and when most of the published studies are funded by industry. A full discussion of publication bias is available in the GRADE guidelines series #5: rating the quality of evidence – publication bias.
What increases confidence in the evidence?
In rare circumstances, certainty in the evidence can be rated up (see Table 2). First, when there is a very large magnitude of effect, we might be more certain that there is at least a small effect. Second, certainty can be rated up when there is a clear dose-response gradient. Third, it can be rated up when residual confounding is likely to decrease rather than increase the magnitude of effect (in situations with an effect). A more complete discussion of reasons to rate up for confidence is available in the GRADE guidelines series #9: Rating up the quality of evidence.
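|Certainty can be rated down for:||Certainty can be rated up for:|
|Risk of bias||Large magnitude of effect|
|Imprecision||Dose-response gradient|
|Inconsistency||Residual confounding likely to decrease rather than increase the magnitude of effect (in situations with an effect)|
|Indirectness|| |
|Publication bias|| |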
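The arithmetic of starting levels and of rating down or up can be sketched as follows. This is a bookkeeping aid only, since GRADE cannot be applied mechanically: every number passed in is a judgment made by the authors. The function name and domain labels are hypothetical, and allowing up to two levels when rating up is an assumption of this sketch rather than something stated in this article.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]
DOWN_DOMAINS = {"risk of bias", "imprecision", "inconsistency",
                "indirectness", "publication bias"}
UP_DOMAINS = {"large effect", "dose-response", "residual confounding"}


def rate_certainty(randomized, rate_down=None, rate_up=None):
    """Combine a starting level with judged adjustments per domain."""
    rate_down = rate_down or {}
    rate_up = rate_up or {}
    for name, allowed in ((rate_down, DOWN_DOMAINS), (rate_up, UP_DOMAINS)):
        for domain, levels in name.items():
            if domain not in allowed:
                raise ValueError(f"unknown domain: {domain}")
            # Assumption: at most two levels per domain in either direction.
            if levels not in (0, 1, 2):
                raise ValueError("levels must be 0, 1 or 2")
    # Randomized trials start at high; observational data start at low.
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    score = start - sum(rate_down.values()) + sum(rate_up.values())
    # The scale has four levels, so clamp the result to its ends.
    score = max(0, min(score, len(LEVELS) - 1))
    return LEVELS[score]


# Randomized trials rated down one level for imprecision.
print(rate_certainty(True, rate_down={"imprecision": 1}))
# Observational data rated up one level for a large effect.
print(rate_certainty(False, rate_up={"large effect": 1}))
```

Both example calls return "moderate", which shows how evidence from different designs can end at the same level by different routes.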
Moving from the quality of evidence to recommendations
In GRADE, recommendations can be strong or weak, and in favor of or against an intervention. Strong recommendations suggest that all or almost all persons would choose that intervention. Weak recommendations imply that there is likely to be important variation in the decisions that informed persons make. The strength of a recommendation is actionable: a weak recommendation indicates that engaging in a shared decision-making process is essential, while a strong recommendation suggests that it is not usually necessary to present both options.
Recommendations are more likely to be weak rather than strong when the certainty in evidence is low, when there is a close balance between desirable and undesirable consequences, when there is substantial variation or uncertainty in patient values and preferences, and when interventions require considerable resources. A full discussion is available in the BMJ series on the GRADE Evidence to Decision framework[18, 19] and in the original series[2, 20].
Authors: Reed Siemieniuk and Gordon Guyatt
- Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schunemann HJ. What is “quality of evidence” and why is it important to clinicians? BMJ (Clinical research ed). 2008;336(7651):995-8.
- Guyatt GH, Oxman AD, Vist GE, Kunz R, Falck-Ytter Y, Alonso-Coello P, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ (Clinical research ed). 2008;336(7650):924-6.
- Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. Journal of clinical epidemiology. 2011;64(4):383-94.
- Guyatt GH, Oxman AD, Kunz R, Atkins D, Brozek J, Vist G, et al. GRADE guidelines: 2. Framing the question and deciding on important outcomes. Journal of clinical epidemiology. 2011;64(4):395-400.
- Balshem H, Helfand M, Schunemann HJ, Oxman AD, Kunz R, Brozek J, et al. GRADE guidelines: 3. Rating the quality of evidence. Journal of clinical epidemiology. 2011;64(4):401-6.
- Guyatt G, Oxman AD, Sultan S, Brozek J, Glasziou P, Alonso-Coello P, et al. GRADE guidelines: 11. Making an overall rating of confidence in effect estimates for a single outcome and for all outcomes. Journal of clinical epidemiology. 2013;66(2):151-7.
- Mustafa RA, Santesso N, Brozek J, Akl EA, Walter SD, Norman G, et al. The GRADE approach is reproducible in assessing the quality of evidence of quantitative evidence syntheses. Journal of clinical epidemiology. 2013;66(7):736-42; quiz 42.e1-5.
- Guyatt GH, Oxman AD, Vist G, Kunz R, Brozek J, Alonso-Coello P, et al. GRADE guidelines: 4. Rating the quality of evidence–study limitations (risk of bias). Journal of clinical epidemiology. 2011;64(4):407-15.
- Higgins JP, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomized trials. BMJ (Clinical research ed). 2011;343:d5928.
- Wells G, Shea B, O’Connell D, Peterson J, Welch V, Losos M, et al. The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomized studies in meta-analyses. Ottawa: Ottawa Hospital Research Institute; 2011.
- Sterne JA, Hernan MA, Reeves BC, Savovic J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ (Clinical research ed). 2016;355:i4919.
- Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, et al. GRADE guidelines 6. Rating the quality of evidence–imprecision. Journal of clinical epidemiology. 2011;64(12):1283-93.
- Walsh M, Srinathan SK, McAuley DF, Mrkobrada M, Levine O, Ribic C, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. Journal of clinical epidemiology. 2014;67(6):622-8.
- Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 7. Rating the quality of evidence–inconsistency. Journal of clinical epidemiology. 2011;64(12):1294-302.
- Guyatt GH, Oxman AD, Kunz R, Woodcock J, Brozek J, Helfand M, et al. GRADE guidelines: 8. Rating the quality of evidence–indirectness. Journal of clinical epidemiology. 2011;64(12):1303-10.
- Guyatt GH, Oxman AD, Montori V, Vist G, Kunz R, Brozek J, et al. GRADE guidelines: 5. Rating the quality of evidence–publication bias. Journal of clinical epidemiology. 2011;64(12):1277-82.
- Guyatt GH, Oxman AD, Sultan S, Glasziou P, Akl EA, Alonso-Coello P, et al. GRADE guidelines: 9. Rating up the quality of evidence. Journal of clinical epidemiology. 2011;64(12):1311-6.
- Alonso-Coello P, Schunemann HJ, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, et al. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 1: Introduction. BMJ (Clinical research ed). 2016;353:i2016.
- Alonso-Coello P, Oxman AD, Moberg J, Brignardello-Petersen R, Akl EA, Davoli M, et al. GRADE Evidence to Decision (EtD) frameworks: a systematic and transparent approach to making well informed healthcare choices. 2: Clinical practice guidelines. BMJ (Clinical research ed). 2016;353:i2089.
- Guyatt GH, Oxman AD, Kunz R, Falck-Ytter Y, Vist GE, Liberati A, et al. Going from evidence to recommendations. BMJ (Clinical research ed). 2008;336(7652):1049-51.