Radiologists differed in their interpretations of mammograms

ACP J Club. 1995 May-June;122:74. doi:10.7326/ACPJC-1995-122-3-074

Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretations of mammograms. N Engl J Med. 1994 Dec 1; 331:1493-9.



Objective

To determine the extent of variability in radiologists' interpretations of mammograms.


Design

Blinded comparison of 10 radiologists' assessments of 150 mammograms, with agreement analyzed between pairs of radiologists.


Setting

Yale-New Haven Hospital.


Patients

Random sampling of mammograms done in 1987, weighted to include more patients with mammographic abnormalities and breast cancer. Radiographs were excluded (121 of 271) if the women previously had breast cancer, if the 1987 mammogram was not definitively interpreted, or for technical reasons, leaving 150 mammograms: 27 from women with breast cancer and 123 from women without breast cancer. The 10 board-certified radiologists who participated (7 in private practice and 3 university-based) had a median of 7 years' experience (range 1.5 to 20 y) reading mammograms.

Description of test and diagnostic standard

Each radiologist independently read the 150 mammograms on 2 occasions, 5 months apart. Fifty films, shown with only the patient's age at both readings, were used to assess intraobserver variability. At the first reading, half of the remaining films were shown with detailed clinical histories; at the second reading, these films were shown with only the patient's age. This sequence was reversed for the other half. Breast cancer was confirmed histopathologically. The absence of breast cancer required an absence of clinical and mammographic evidence after 3 years of follow-up.

Main outcome measures

Agreement among radiologists for film observations, diagnostic interpretation (4 categories from "normal" to "suggestive of cancer"), and recommendations for management (5 categories from "routine follow-up" to "biopsy").

Main results

The median weighted agreement for interobserver variability was 79% for diagnostic interpretations (range 71% to 82%) and 85% (range 65% to 91%) for the recommendation of a biopsy. The corresponding median kappa values were 0.47 (range 0.31 to 0.55) for diagnostic interpretation and 0.49 (range 0.20 to 0.69) for biopsy recommendations. The diagnostic interpretations of the 10 radiologists had a median sensitivity of 70% and a median specificity of 94%. The frequency of recommendations for an immediate workup ranged from 74% to 96% for mammograms from women with breast cancer and from 11% to 65% for films from the women without breast cancer. Substantial disagreements occurred in 2% of the radiologist pairs for diagnostic interpretation (normal vs suggestive of cancer) and in 3% of the radiologist pairs and 9% of the per-patient comparisons for recommendation of biopsy (routine follow-up vs biopsy).
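The gap between raw agreement and kappa in these results can be illustrated with a short calculation. The following is a minimal sketch only: it computes unweighted Cohen's kappa for a single pair of readers and one binary decision, which may differ from the weighting and categories used in the study. The function name `cohen_kappa` and the ratings are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers rating the same films."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of films given the same category by both readers.
    p_obs = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the two readers' marginal rates, summed over categories.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical ratings for 100 films (illustrative only, not study data).
reader_1 = ["biopsy"] * 20 + ["routine"] * 80
reader_2 = ["biopsy"] * 15 + ["routine"] * 5 + ["biopsy"] * 10 + ["routine"] * 70

kappa = cohen_kappa(reader_1, reader_2)
print(f"raw agreement 85%, kappa = {kappa:.2f}")
```

In this invented example the two readers agree on 85 of 100 films, but 65% agreement would be expected by chance from their marginal rates alone, so kappa is about 0.57. This is why a raw agreement in the 80% range can coexist with a kappa near 0.5.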


Conclusion

Radiologists differed, sometimes substantially, in their interpretations of mammograms and in their recommendations for management.

Sources of funding: American Cancer Society and Robert Wood Johnson Clinical Scholars Program.

For article reprint: Dr. J.G. Elmore, University of Washington School of Medicine, 1959 North East Pacific St, Room BB527E, Seattle, WA 98195-6429, USA. FAX 206-616-5365.


Commentary

This complex study of variability in mammographic interpretation brings to light, once again, the degree of uncertainty inherent in the results of many screening and diagnostic tests. The lack of a clinical history for two thirds of the study patients does not mirror clinical practice, nor does the relatively high proportion of cancers and benign abnormalities. This variability, however, has been documented for mammography (1, 2) and clearly persists despite technical improvements. In the study by Elmore and colleagues, radiologists who most often recommended biopsy for women with breast cancer also most often recommended it for women without cancer. Thus, sensitivity is increased at the cost of decreased specificity and higher rates of unnecessary biopsies.
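The trade-off described here can be made concrete with a small calculation. This is a sketch under stated assumptions: the group sizes (27 women with cancer, 123 without) come from the abstract, but the per-reader counts, the reader labels, and the function name `sensitivity_specificity` are hypothetical, chosen only to fall near the reported ranges of workup recommendations.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of cancers flagged for workup
    specificity = tn / (tn + fp)  # share of non-cancers not flagged
    return sensitivity, specificity


# Hypothetical readers (illustrative counts, not taken from the study).
# Each reader sees 27 films from women with cancer and 123 from women without.
conservative = sensitivity_specificity(tp=20, fn=7, tn=110, fp=13)
aggressive = sensitivity_specificity(tp=26, fn=1, tn=43, fp=80)

for name, (sens, spec) in [("conservative", conservative), ("aggressive", aggressive)]:
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these invented counts, the reader who flags more films detects 26 of 27 cancers rather than 20, but also recommends workup for 80 of the 123 women without cancer rather than 13.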

Many primary care physicians may be shaken by the finding that radiologists in this study showed only about 50% agreement beyond chance. Physicians do both themselves and their patients a disservice by not making the limitations of screening and diagnostic testing clear. Misguided belief in the infallibility of medical testing only leads to painfully shattered illusions.

Joy Melnikow, MD, MPH
University of California, Davis
Sacramento, California, USA


