**Editorials**

# Confidence intervals in research evaluation

*ACP J Club*. 1992 Mar-April;116:A28. doi:10.7326/ACPJC-1992-116-2-A28


No one who reads a medical journal can be unaware of the widespread use of statistics
in research papers. In particular, virtually all papers with statistical analyses
contain one or more *P* values. Most of us think we understand these, but studies have shown that *P* values are widely misinterpreted. The *P* value relates to the null hypothesis of no effect (for example, that 2 treatments
are equally effective). It is the probability of obtaining the observed data, or data more extreme, when the null hypothesis is true. In other words, the *P* value measures the compatibility of the data with the null hypothesis. The smaller
the *P* value, the less plausible is the null hypothesis and the more likely we are to reject
it and be convinced, for example, that 2 drugs differ in effectiveness. The *P* value does not indicate the magnitude of the effect of interest, or even its direction,
nor does it indicate how much uncertainty is associated with the results.
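The definition of the *P* value as the probability of the observed data, or data more extreme, under the null hypothesis can be made concrete with a small calculation. The following sketch (an illustration, not taken from the editorial; the coin-tossing numbers are invented) computes an exact two-sided binomial *P* value:

```python
from math import comb

def binomial_two_sided_p(successes, n, p_null=0.5):
    """Two-sided exact binomial P value: the total probability, under the
    null hypothesis, of every outcome no more likely than the one observed."""
    probs = [comb(n, k) * p_null**k * (1 - p_null)**(n - k) for k in range(n + 1)]
    observed = probs[successes]
    # Sum the probability of the observed outcome and of every outcome
    # at least as "unlikely" under the null hypothesis.
    return min(1.0, sum(p for p in probs if p <= observed + 1e-12))

# Hypothetical example: 15 heads in 20 tosses of a supposedly fair coin
p = binomial_two_sided_p(15, 20)
print(f"P = {p:.3f}")
```

Note that this number says nothing about how large the departure from fairness might be; that is exactly the gap a confidence interval fills.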

By contrast, the use of confidence intervals is founded on the idea that what we most wish to know is the magnitude of the effect of interest, together with some measure of uncertainty. The principle is to use the data from the sample studied to obtain a best estimate of the true effect in the whole relevant population (such as the difference in the effectiveness of 2 drugs for patients with a certain disease) and to give a range of uncertainty around that estimate. Confidence intervals are not a new concept, nor is the suggestion to use them in medical research (1, 2), but only recently have they begun to be used widely.

Because confidence intervals indicate the strength of evidence, they are of particular
relevance to *ACP Journal Club*. In small studies or in large studies where the outcome of interest is rare, confidence
intervals are wide, indicating imprecise estimation of the effect of interest. For
example, in a study to evaluate the ability of diabetologists to screen diabetic patients
for retinopathy (3), the serious error rate was given as 1 in 20 (5%). The 95% confidence interval for
the true error rate is 0.1% to 25%. When a comparative study has not found a statistically
significant effect (i.e., *P* > 0.05), a confidence interval is especially valuable. It will often indicate
that the interpretation of “not significant” as “no difference” cannot be supported
by the data because the results are compatible with large real effects. By comparison, *P* values alone allow a much more restricted interpretation.
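The interval quoted for the 1-in-20 error rate can be reproduced with an exact (Clopper-Pearson) binomial interval. This is a sketch of that standard calculation, solved by bisection with only the standard library; the editorial does not say which method was used:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p with f(p) == target, for f decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion."""
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(1, 20)
print(f"1 error in 20: 95% CI {lo:.1%} to {hi:.1%}")
```

The computed limits round to the 0.1% to 25% quoted above, and their sheer width makes the point: a single observed error in 20 patients pins down the true error rate hardly at all.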

The contrast between *P* values and confidence intervals is well illustrated by 2 consecutive sentences in
a recent report in *ACP Journal Club* (4): “Overall hay fever symptom scores were lower for the Alutard SQ group (at peak
season, 2.2 vs 5.5; CI -4.8 to -0.5; *P* = 0.02). Postseasonal assessment by both patients and the study coordinator showed
improvement in favor of Alutard SQ (*P* < 0.001).” In the second sentence, the *P* value looks impressive, but neither the size of the difference in improvement nor
the uncertainty associated with the estimate of improvement is given.

The 2 values (limits) that define a confidence interval indicate the range of values of the true effect that is consistent with the data. A 95% confidence interval means that the data are not significantly different (at the 5% level) from any true effect between the limits of the interval. If many studies of the same problem are done, 95% of the 95% confidence intervals from all these studies will include the true value. Thus, a more common (although not absolutely correct) interpretation is that we can be 95% confident that the true value lies within the stated range of values.
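The repeated-studies interpretation can be demonstrated by simulation. This sketch (illustrative only; the population mean, standard deviation, and sample size are invented) draws many samples and counts how often the known-variance 95% interval for the mean covers the true value:

```python
import random
from math import sqrt

random.seed(1)
MU, SIGMA, N, REPS = 50.0, 10.0, 30, 2000

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    half = 1.96 * SIGMA / sqrt(N)   # known-sigma 95% interval for the mean
    if xbar - half <= MU <= xbar + half:
        covered += 1

print(f"coverage: {covered / REPS:.1%}")
```

Close to 95% of the simulated intervals contain the true mean, which is precisely the sense in which a 95% confidence interval "works" 95% of the time.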

The convention of using the value of 95% is arbitrary, just as is that of taking *P* < 0.05 as being significant, and authors sometimes use 90% or 99% confidence intervals.
There is a close relation between confidence intervals and *P* values. If the 95% confidence interval excludes the null value (usually 0, but 1 if
the estimate is an odds ratio or relative risk), then *P* < 0.05. (This relationship is not exact in some cases.) In general, it is recommended
that both confidence intervals and *P* values be presented, the latter as exact values (e.g., *P* = 0.13 or *P* = 0.005 rather than *P* < 0.05 or *P* < 0.01). However, the estimate and confidence interval are often sufficient. Confidence
intervals can be obtained in most circumstances, even for some nonparametric analyses
(5). A computer program is available to carry out all the common types of calculations
(6). When the authors have not provided confidence intervals, the intervals can often
be constructed using the results in the paper.
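The correspondence between a 95% interval and *P* < 0.05 is easiest to see through the normal approximation, where both are built from the same standardized statistic. A minimal sketch, with a hypothetical estimate and standard error:

```python
from math import erf, sqrt

def z_summary(estimate, se):
    """Normal-approximation 95% CI and two-sided P value for an estimate."""
    z = estimate / se
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # twice the upper-tail area
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return ci, p

# Hypothetical treatment difference of 3.0 units with standard error 1.2
(lo, hi), p = z_summary(3.0, 1.2)
print(f"95% CI {lo:.2f} to {hi:.2f}; P = {p:.3f}")
```

Here the interval excludes 0 and, correspondingly, *P* < 0.05; shrink the estimate toward 0 and the two change together, since both depend only on the ratio of estimate to standard error.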

Confidence intervals are commonly used in meta-analyses. Results from many clinical trials or observational studies that appear contradictory are often shown to be compatible with some consistent true value when confidence intervals are constructed for each study. In addition, confidence intervals are routinely given in conjunction with the pooled estimate of effect.
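A common form for the pooled estimate is a fixed-effect inverse-variance weighted average of the study estimates; the editorial does not specify a method, so this is a generic sketch with made-up study results:

```python
from math import sqrt

# Hypothetical study estimates (e.g., log relative risks) and standard errors
estimates = [0.40, 0.10, 0.25]
ses = [0.20, 0.15, 0.30]

weights = [1 / se**2 for se in ses]          # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
se_pooled = 1 / sqrt(sum(weights))

print(f"pooled estimate {pooled:.3f} "
      f"(95% CI {pooled - 1.96 * se_pooled:.3f} to {pooled + 1.96 * se_pooled:.3f})")
```

Because the pooled standard error shrinks as studies accumulate, the pooled interval is narrower than any single study's, which is the statistical rationale for meta-analysis.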

For many comparative studies (including clinical trials and meta-analyses), the effect
of interest is the difference between two groups, so the confidence interval should
be for this difference (7). It is, however, common to see only within-group confidence intervals given. Indeed,
within-group confidence intervals were presented in several abstracts of clinical
trials reported in early issues of *ACP Journal Club*. For example, in a controlled trial in insulin-dependent diabetes (8), serious episodes of hypoglycemia occurred in 25 of 44 patients receiving intensified
conventional treatment (57%, 95% CI 44% to 73%) and in 12 of 53 patients receiving
regular treatment (23%, CI 11% to 34%, *P* < 0.001). The 95% confidence interval for the difference in proportions (of 34%)
can be calculated as 16% to 53%. Alternatively, the data could be used to calculate
the relative risk (RR) of serious episodes of hypoglycemia, which is RR 2.51 (95%
CI 1.4 to 4.4). It is often, but not always, possible to construct the required between-group
confidence interval using information given in the paper.
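The between-group calculations just described can be reproduced with the usual normal approximations (a sketch of standard formulas, not necessarily the exact method used in the trial report):

```python
from math import sqrt, log, exp

# Serious hypoglycemia: 25/44 intensified vs 12/53 regular treatment (8)
a, n1 = 25, 44
b, n2 = 12, 53
p1, p2 = a / n1, b / n2

# 95% CI for the difference in proportions (Wald approximation)
diff = p1 - p2
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"difference {diff:.0%} "
      f"(95% CI {diff - 1.96 * se_diff:.0%} to {diff + 1.96 * se_diff:.0%})")

# 95% CI for the relative risk, constructed on the log scale
rr = p1 / p2
se_log_rr = sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
lo = exp(log(rr) - 1.96 * se_log_rr)
hi = exp(log(rr) + 1.96 * se_log_rr)
print(f"relative risk {rr:.2f} (95% CI {lo:.1f} to {hi:.1f})")
```

These formulas reproduce the figures quoted above: a difference of 34% (95% CI 16% to 53%) and a relative risk of 2.51 (95% CI 1.4 to 4.4).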

In observational studies, confidence intervals are useful for giving a range of uncertainty around estimates of prevalence, risk, and so forth. For example, the risk for HIV-1 infection after percutaneous exposure to HIV-infected body fluids was estimated as 0.56% on the basis of a single occurrence after 179 exposures (9). The 95% confidence interval was naturally very wide (CI 0.01% to 3.06%), despite a sample size of over 2000.

Confidence intervals are valuable in assessing published papers. A statement such
as “there was an increased risk of breast cancer among cases (odds ratio, 3.1; 95%
CI 1.8 to 4.8),” is far more informative than “the risk of breast cancer was significantly
higher among cases than controls (*P* < 0.01).” Several journals now encourage or even require the use of confidence
intervals. Whenever possible, entries in *ACP Journal Club* will include them for at least the main outcomes.

**Douglas G. Altman**

## References

1. **Wulff HR.** Confidence limits in evaluating controlled trials. Lancet. 1973;2:969-70.

2. **Rothman K.** A show of confidence. N Engl J Med. 1978;299:1362-3.

3. Diabetologists made appropriate referrals for diabetic retinopathy. ACP J Club. 1991;114:86.

4. Immunotherapy with Alutard SQ reduced symptoms and need for medication in severe summer hay fever. ACP J Club. 1991;114:81.

5. **Gardner MJ, Altman DG.** Statistics with Confidence. London: British Medical Journal; 1989.

6. **Gardner MJ, Gardner SB, Winter PD.** Confidence Interval Analysis (CIA) Microcomputer Program Manual. London: British Medical
Journal; 1989.

7. **Haynes RB, Mulrow CD, Huth EJ, Altman DG, Gardner MJ.** More informative abstracts revisited. Ann Intern Med. 1990;113:69-76.

8. Intensified insulin increased the risk for serious hypoglycemia and neuroglycopenia in IDDM. ACP J Club. 1991;114:84.

9. The risk for HIV-1 infection after percutaneous exposure to HIV-infected body fluids was 0.56% per exposure. ACP J Club. 1991;114:57.