Essentially, multiple-armed RCTs can be appraised using the checklist for the standard two-armed trial. However, some additional issues should be considered:

Does the study present an analysis of the differences between each pair of arms, or does it present an overall analysis of the difference between all the groups (for example an ANOVA test)?

An "among-group" statistical assessment can be difficult to interpret, especially if only one comparison is of interest: a significant overall result shows that the groups differ somewhere, but it cannot by itself be attributed to the group of interest.

The way data is combined can also affect the results, so be watchful for selective data combining.

Why is the RCT looking at more than two arms?

Are the different arms examining related clinical question(s)?

Example 1: Two different doses of treatment A versus control B, where the related questions are: is treatment A effective, and at which dose?

Example 2: Treatment A versus previous gold-standard treatment B versus inactive control C, where the question is: should treatment A be used as the new first-line treatment?

Alternatively, are the different arms looking at separate questions and examined in one trial for efficiency or logistical reasons? For example, new treatment A versus new treatment B versus standard treatment C, where the separate questions are: is new treatment A better than standard treatment C, and is new treatment B better than standard treatment C?

Does the RCT apply any multiplicity correction factor?

Increasing the number of analyses on a particular data set can increase the chance of a type I error (i.e., identifying a result as significant when it is in fact due to chance). For example, if a particular outcome is of interest in a two-armed trial of treatments A versus B, then there is only one comparison of means (A v B); however, in a three-armed trial of treatments A, B and C, there are three different two-way comparisons of means (A v B; A v C; B v C) on the data set. As the number of arms increases so does the number of comparisons: in a four-armed trial of A, B, C and D, there are six different two-way comparisons of means (A v B; A v C; A v D; B v C; B v D; C v D).

To compensate for this, some studies employ a Bonferroni or similar correction factor. However, there has been some debate about whether an adjustment is required, depending on the study design, and if so what it should be. There is concern, for example, that applying a Bonferroni or similar correction can increase the likelihood of a type II error (i.e., failing to detect a real effect where one exists).
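
The arithmetic behind the number of comparisons and the Bonferroni correction can be sketched in a few lines of Python. This is an illustration only, not a method taken from any particular trial: the function names are hypothetical, the significance level of 0.05 is an assumed convention, and the familywise error rate formula assumes the comparisons are statistically independent. Pairwise comparisons that share an arm are not independent, so the figures are a rough guide rather than exact values.

```python
from itertools import combinations


def pairwise_comparisons(arms):
    """List every two-way comparison between trial arms, e.g. (A, B), (A, C)."""
    return list(combinations(arms, 2))


def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one type I error across all comparisons.

    Assumes the tests are independent, which is a simplification for
    comparisons that share an arm.
    """
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_threshold(n_comparisons, alpha=0.05):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


if __name__ == "__main__":
    for arms in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
        pairs = pairwise_comparisons(arms)
        m = len(pairs)
        print(
            f"{len(arms)} arms: {m} comparison(s), "
            f"uncorrected familywise error rate ~{familywise_error_rate(m):.3f}, "
            f"Bonferroni threshold {bonferroni_threshold(m):.4f}"
        )
```

Under these assumptions, three comparisons each tested at 0.05 give roughly a 14% chance of at least one false-positive result, and six comparisons roughly 26%. The stricter Bonferroni threshold (0.05 divided by the number of comparisons) controls this, at the cost of making each individual comparison harder to declare significant, which is the source of the type II error concern.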

Whatever approach the study takes, it should clearly describe what comparisons and statistical tests it examined and the basis for these. It should also comment on the possible interpretations of the result. Critical appraisal is a structured approach to assessing the validity of this analysis and whether the interpretation of the results is reliable and useful.
