Current issues of ACP Journal Club are published in Annals of Internal Medicine


Editorials

Evidence-based screening: What kind of evidence is needed?

ACP J Club. 1998 May-June;128:A12. doi:10.7326/ACPJC-1998-128-3-A12



Over the past 18 months, expert groups have been busy issuing screening recommendations. Among the major screening recommendations recently published in North America are those of the United States Preventive Services Task Force (USPSTF) (1), a National Institutes of Health (NIH) Consensus Development Conference on Breast Cancer Screening for Women Ages 40 to 49 (2), and the National Cancer Institute (NCI) on the same topic (3), along with guidelines for colorectal cancer screening (4) and an updated guideline for prostate cancer screening (5). These recommendations, which sometimes reach opposite conclusions for the same condition, challenge those interested in rational clinical policies for screening to reexamine what the appropriate evidence base for screening should be. The need for such reexamination is particularly urgent with the rapidly emerging ability to screen for genetic defects and the increasing public interest in screening policies.

Historically, Frame and Carlson (6), 2 family practice residents, promoted evidence-based screening in a practice setting when, in 1975, they systematically considered several medical conditions that might be included in the routine physical examination. Their approach was adapted, expanded, and legitimized by the Canadian Task Force on the Periodic Health Examination when it issued its first report in 1979 (7). That approach, in turn, was adopted and amended by the USPSTF in its Guide to Clinical Preventive Services (1). These publications and the independent spread of clinical epidemiologic methods throughout clinical medicine have influenced an increasing number of specialty groups that make screening recommendations to become more explicit about rules of evidence.

Groups that follow such rules have emphasized the strength of the evidence on effectiveness of treatment after screening. Basically, they ask, “How good is the evidence that screening does more good than harm?” and more specifically, “How strong is the evidence that the outcome will improve if treatment is given after screening rather than at the time that the patient presents with symptoms?” The Canadian and U.S. task forces thought this question was so important that they developed elaborate grading systems to answer it. (The original Canadian Task Force report found evidence from randomized trials for only 21% of the approximately 90 conditions it considered. For most, the only “evidence” of effectiveness was expert opinion.) In the 1970s, many clinicians questioned whether rigorous evidence, such as that gleaned from randomized trials, was even necessary when considering what should be done during a periodic health examination. More recently, the conundrum clinicians and patients face with prostate-specific antigen screening for prostate cancer reminds us that strong evidence of a positive treatment effect remains important before screening tests are incorporated into medical practice (8, 9).

Early screening methodologists included 2 other questions to be considered before making screening recommendations: “How important is the health condition to be sought in terms of its frequency and mortality and morbidity?” and “How good is the screening test in terms of accuracy, safety, simplicity, acceptability (to patients and providers), labeling effects, and financial costs?” No grading system or systematic approach has been developed to answer these 2 questions.

At present, most recommendations presented by evidence-based groups have been driven by the quality of the evidence that compares treatment after screening with treatment without screening. Usually, the strength of evidence is determined by the research design of published studies, and the effectiveness of early treatment is determined by the risk for cause-specific mortality in the screened group compared with that of a control group.

Certain expert groups have begun to do more. It is time for everyone to do more. When making screening recommendations, we must systematically describe effects of screening programs in terms that are meaningful to patients, clinicians, the public, and policymakers. Specifically, we need to emphasize effect in absolute, not relative, terms. Effects of treatment beyond those on mortality should be described. Systematic approaches for assessing qualities of screening tests should be developed, and we need to pay more attention to downstream effects of false-positive results. Finally, we must incorporate cost-effectiveness analysis into all deliberations on screening. We must heed Kerr White's advice to determine the effectiveness of health care interventions in relation to their hazards and cost.

Reporting results of screening programs as relative risk reduction (RRR) becomes problematic as more and more screening recommendations are made for an increasing number of medical conditions. How does the 33% RRR of colorectal mortality after annual hemoccult testing compare with the 41% RRR of stroke after lowering diastolic blood pressure 5 to 6 mm Hg or with the 33% RRR associated with breast cancer screening? A reasonable layperson (whether patient or policymaker) may assume that the relative effect can be compared in all these cases and may be unaware of the large differences in absolute mortality reduction represented by the numbers. Emphasizing effects as absolute risk reduction (ARR) and incorporating the number needed to screen (NNS) alleviates confusion and automatically takes the frequency of the disease or condition into account. Perhaps such an emphasis would also help patients and policymakers better understand the real size of screening effects. For example, ACP Journal Club recently reviewed a randomized controlled trial of biennial fecal occult blood testing in which the RRR of colorectal deaths was 16% but the ARR was only 0.08% and the NNS was 1225 (95% CI 649 to 11 200) (10). Patients are often surprised at how small the effects are when expressed in absolute terms. Using absolute numbers might also help to promote much-needed efforts to identify high-risk groups in whom a screening test could be concentrated and very low risk groups for whom it is not necessary. Yet most research reports, and guidelines, continue to emphasize relative, not absolute, risk reduction.
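The arithmetic behind these measures can be made concrete with a minimal sketch. The function name and the baseline risks below are illustrative assumptions, not data from any trial cited here; the risks are chosen only so that the RRR is 16% and the ARR is 0.08%, matching the figures quoted above, and the second call keeps the same RRR at a tenfold higher baseline risk.

```python
def screening_effect(control_risk, screened_risk):
    """Return (RRR, ARR, NNS) for an outcome such as cause-specific death.

    Both arguments are proportions (0 to 1) of people who have the outcome
    over the same follow-up period.
    """
    if not 0 < control_risk <= 1 or not 0 <= screened_risk <= 1:
        raise ValueError("risks must be proportions, with a nonzero control risk")
    arr = control_risk - screened_risk             # absolute risk reduction
    rrr = arr / control_risk                       # relative risk reduction
    nns = float("inf") if arr <= 0 else 1 / arr    # number needed to screen
    return rrr, arr, nns


# Hypothetical baseline risks (assumptions for illustration only).
low_risk = screening_effect(control_risk=0.005, screened_risk=0.0042)
high_risk = screening_effect(control_risk=0.05, screened_risk=0.042)

for label, (rrr, arr, nns) in (("low risk", low_risk), ("high risk", high_risk)):
    print(f"{label}: RRR {rrr:.0%}, ARR {arr:.2%}, NNS {nns:.0f}")
```

Both groups show the same 16% RRR, yet the NNS differs tenfold (about 1250 versus about 125), which is the point of reporting absolute effects. The sketch gives 1250 rather than the published 1225, presumably because the published ARR of 0.08% is rounded.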

For effectiveness, expert groups concentrate on mortality, but patients often rate other outcomes as being as important. For example, women often ask me if breast cancer screening leads to less disfiguring surgery; they are surprised to learn that not 1 of the 8 randomized trials on breast cancer has addressed this question. For prostate cancer, impotence and incontinence after treatment may be as important to men as death; the evidence regarding these outcomes should be incorporated into guidelines on prostate cancer screening and shared with patients (9).

Currently, most groups that evaluate screening tests search for evidence on accuracy of a particular test by gathering reports about sensitivity and specificity. In clinical settings, the emphasis should shift to the percentage of false-negative and false-positive test results in various populations (e.g., according to age, race, or other risk factors). Because virtually all recommendations for screening in adults call for repeated testing, reporting false-positive and false-negative rates for a single screening test tells little about what to expect over a prolonged period of repeated testing (11). The cumulative rate of false-positive results over at least a decade is needed but almost never reported. Evidence should also be sought to determine the downstream consequences of false-positive results in terms of additional procedures done on the patient, other health care utilization (with financial costs), and emotional consequences to the patient (with possible hidden costs of health care utilization). The few studies of emotional reactions of patients told “your screening test result was not quite normal” are worrisome, especially when the results of these studies are translated to the population at large (12, 13).
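A rough sense of why single-test rates understate the problem can be had from a minimal sketch. It assumes, as a simplification not made in the text, that each round of screening has the same false-positive rate and that rounds are independent; the function name and the 5% per-test rate are hypothetical. Real cumulative rates would have to come from follow-up data.

```python
def cumulative_false_positive_risk(per_test_rate, n_tests):
    """Probability of at least one false-positive result in n_tests rounds.

    Assumes a constant per-test false-positive rate and independence
    between rounds (both simplifying assumptions).
    """
    if not 0 <= per_test_rate <= 1:
        raise ValueError("per_test_rate must be a proportion between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be nonnegative")
    return 1 - (1 - per_test_rate) ** n_tests


# Hypothetical example: a 5% false-positive rate per test, repeated
# annually for a decade.
risk_after_decade = cumulative_false_positive_risk(0.05, 10)
print(f"{risk_after_decade:.1%}")  # roughly 40%
```

Under these assumptions, a per-test rate that looks small compounds to a substantial chance of at least one false alarm over 10 rounds.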

Another methodologic problem is the introduction of new screening strategies for old diseases. The rules of evidence were written for the first screening test. How should groups considering screening handle a new test when good evidence exists for another (usually older) test for a condition? For example, do we need a 15-year randomized trial of sigmoidoscopy or colonoscopy now that 3 studies have found hemoccult testing to be effective? For that matter, do we need new trials for each “improved” hemoccult test developed? It is time to develop methodologically sound approaches to this problem (4, 14).

Cost-effectiveness is another important issue. Recently, the Panel on Cost-Effectiveness in Health and Medicine recommended standardization of the conduct and reporting of cost-effectiveness analyses (15). Ironically, most screening guidelines do not incorporate any type of cost-effectiveness analysis, let alone meet the standards of the Panel. (An exception is the guidelines for colorectal cancer screening [4].) Some groups, such as the NCI, specifically exclude issues of cost from their deliberations before issuing recommendations about screening.

The USPSTF gives summary statements (grades) for the level of evidence and the effectiveness of screening tests. In the future, I suggest expansion of the summary statement to include the ARR (with confidence intervals) achieved by screening, the amount of time it takes for the reduction to occur, the probability of false-positive screening test results during that period and the resulting adverse effects, and the cost-effectiveness of the program. For each statement, the strength of the evidence should be summarized, perhaps by expanding the grading systems developed by the Canadian and U.S. task forces. If such summaries were produced by expert groups that report on screening tests, interested parties could compare the effects of different tests across different conditions far more easily than they can today.

Expert groups almost always end their deliberations with a specific recommendation. But is this really an appropriate task for a scientific group? Perhaps it is time to leave the decision to screen or not to screen with those who must live with the consequences: patients and payers (16). Scientific groups should make the consequences as clear as possible and define the scientific principles and methods used to approach the evidence (17). Policymakers (whether governments or health plans) and individual patients can then consider these reports and apply their own values to decide whether they want to pay a given amount of money for a given effect with the associated adverse consequences of treatment and false-positive results. If the final decision is left to patients and policymakers, it is reasonable to expect that different groups with different values and levels of resources will come to different conclusions when given the same set of scientific evidence about a screening procedure.

In contrast to the quiet deliberations of the Canadian Task Force in the 1970s, today's screening recommendations are all too likely to be governed by political, legal, and economic groups. These groups have discovered the public's growing interest in screening, along with the political power and large sums of money such interest engenders (16, 18, 19). As new possibilities for screening continue to arise but resources to pay for health care remain limited, those interested in evidence-based medicine must make evidence about screening more understandable to patients and policymakers. If they do not, I predict that politics and money, rather than evidence, values, and resources, will increasingly determine health policy for screening.

Suzanne W. Fletcher, MD
Harvard Medical School
Harvard Pilgrim Health Care
Boston, Massachusetts, USA


References

1. U.S. Preventive Services Task Force. Guide to Clinical Preventive Services. 2d ed. Baltimore: Williams & Wilkins; 1996.

2. National Institutes of Health Consensus Conference on Breast Cancer Screening for Women Ages 40-49. Proceedings of a conference held at the National Institutes of Health, Bethesda, Maryland, January 21-23, 1997. J Natl Cancer Inst Monogr. 1997;22:1-156.

3. PDQ screening and prevention information. Screening for breast cancer. (Available from the National Cancer Institute Cancer Information Service, 301-496-5583; or see http://cis.nci.nih.gov).

4. Winawer SJ, Fletcher RH, Miller L, et al. Colorectal cancer screening: clinical guidelines and rationale. Gastroenterology. 1997;112:594-642.

5. von Eschenbach A, Ho R, Murphy GP, Cunningham M, Lins N. American Cancer Society guideline for the early detection of prostate cancer: Update 1997. CA Cancer J Clin. 1997;47:261-4.

6. Frame PS, Carlson SJ. A critical review of periodic health screening using specific screening criteria. J Fam Pract. 1975;2:283-9.

7. Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J. 1979;121:1194-254.

8. Barry MJ. Is there an easier way to determine whether early detection of prostate cancer reduces mortality? J Gen Intern Med. 1997;12:657-8.

9. Woolf SH. Screening for prostate cancer with prostate-specific antigen: an examination of the evidence. N Engl J Med. 1995;333:1401-5.

10. Biennial fecal occult-blood screening reduced colorectal cancer mortality [Abstract]. ACP J Club. 1997;126:63.

11. Russell LB. Educated Guesses. Making Policy About Medical Screening Tests. Berkeley: University of California Press; 1994.

12. Haynes RB, Sackett DL, Taylor DW, Gibson ES, Johnson AL. Increased absenteeism from work after detection and labelling of hypertensive patients. N Engl J Med. 1978;299:741-4.

13. Lerman C, Trock B, Rimer BK, et al. Psychological and behavioral implications of abnormal mammograms. Ann Intern Med. 1991;114:657-61.

14. Battista RN, Fletcher SW. Making recommendations on preventive practices: methodological issues. Am J Prev Med. 1988;4:53-66.

15. Siegel J, Weinstein MC, Russell LB, Gold MR. Recommendations for reporting cost-effectiveness analyses. JAMA. 1996;276:1339-41.

16. Fletcher SW. Whither scientific deliberation in health policy recommendations? Alice in the Wonderland of breast cancer screening. N Engl J Med. 1997;336:1180-3.

17. Eddy DM. Breast cancer screening in women younger than 50 years of age: what's next? Ann Intern Med. 1997;127:1035-6.

18. Ernster VL. Mammography screening for women aged 40-49: a guidelines saga and a clarion call for informed decision making. Am J Public Health. 1997;87:1103-6.

19. Ransohoff DF, Harris RP. Lessons from the mammography screening controversy: can we improve the debate? Ann Intern Med. 1997;127:1029-34.