“I can give you a six-word formula for success:
Think things through – then follow through.”
- Edward Vernon (Eddie) Rickenbacker

Choose the Evaluation Type and Design

As with your tobacco control program, one size evaluation does not fit all. When you set the goals for your program, you made them specific for different audiences in order to increase your likelihood of success. Evaluation requires the same kind of specificity. Based on what you want to get out of the evaluation, you will select different types of evaluations, and different evaluation designs.

The evaluation “type” defines what you would like to evaluate about the program, such as the formation of one of the activities, or an outcome. The “design” is the set of methods you will use to gather the information that will answer your questions about things like the program’s formation and outcomes.

A choice of evaluation type and design is made based on these three criteria:

  • Your program goals
  • The purpose of your evaluation
  • The questions you want answered

Let’s take a closer look at evaluation types and designs.

Evaluation Types

There are four basic types of evaluation: formative evaluation, process evaluation, short-term outcome evaluation, and long-term outcome evaluation. Here’s a brief overview of each:

  • Formative evaluation is the process of testing program plans, messages, materials, strategies, or modifications for weaknesses and strengths before they are put into effect. Formative evaluation is also used when an unanticipated problem occurs after the program is in effect.
  • Process evaluation is the mechanism for testing the delivery of the program, rather than its effect. For example, process evaluation would be used to determine whether the program’s procedures for reaching the target population are working as planned.
  • Short-term outcome evaluation is used to measure the immediate effect the program has on the intended audience soon after implementation.
  • Long-term outcome evaluation is used to measure the long-term effect of the program. It is the process of measuring whether your program met its ultimate goal of reducing tobacco-related disease and death. The results of long-term outcome evaluation often impact decisions to continue program funding.

Different types of evaluation may be more prominent at different points in the program. With a new program, for example, you might want to conduct a formative evaluation to help you design your activities so that they work. With an existing program, you might want to conduct an outcome evaluation to assess your program's effectiveness and to demonstrate that it is making productive use of resources. The table below gives you more details about the purpose and features of each type, when to use each, what each shows, and why each is most useful.

Type: Formative Evaluation

What it is: Formative evaluation ensures that program materials, strategies, and activities are of the highest possible quality. When materials, strategies, and activities are being developed, the purpose of evaluation is to ensure that the program aspect being evaluated is feasible, appropriate, meaningful, acceptable, and culturally appropriate for the tobacco control program and the program’s target population.

When to use: Begin formative evaluation as soon as the idea for a program is conceived.
 
•  During the development of a new program.
 
• When an existing program 1) is being modified,
2) has problems with no obvious solutions, or
3) is being used in a new setting, with a new population, or to target a new problem or behavior.
 
• Whenever an existing program is being adapted for use with a different target population or in a new location or setting
Formative evaluation shows:
 
•  Whether proposed messages are likely to reach, to be understood by, and to be accepted by the people you are trying to serve (e.g., shows strengths and weaknesses of proposed written materials).
 
•  How people in the target population get information (e.g., which newspapers they read or radio stations they listen to).
 
•  Whom the target population respects as a spokesperson (e.g., a sports celebrity or the local preacher).
 
•  Details that program developers may have overlooked about materials, strategies, or mechanisms for distributing information (e.g., that the target population has difficulty reaching the location where training classes are held).
Formative evaluation is useful because it:
 
•  Allows programs to make revisions before the full effort begins.
 
•  Maximizes the likelihood that the program will succeed

Type: Process Evaluation

What it is: The purpose of process evaluation is to learn whether the program is being delivered as planned, e.g., whether it is serving the target population, and whether the number of people being served is more or less than expected. Process evaluation is linked to the activities and outputs in your logic model (see Setting the Stage). It is concerned with quality control in the delivery of activities, and counts of outputs. For example, it involves counting all contacts with the people you are trying to reach and counting all the events that resulted from those contacts.

When to use: Begin process evaluation as soon as the program goes into operation. At this stage, you are not looking for results. You are merely trying to learn:
 
•  whether you are delivering your program according to the plan or protocol,
 
•  whether you are connecting with the people in your target population, and
 
•  whether the people in the target population are connecting with you.
Process evaluation shows how well a program is being delivered. For example, are volunteer callers providing the same information? How many people are participating in the program? Are certain people or areas not participating?

Process evaluation is useful because it:
 
•  Ensures program consistency across various staff members or shifts, or over time.
 
•  Identifies early any problems that occur in reaching the target population.

•  Allows programs to evaluate how well their plans, procedures, activities, and materials are actually working and to make adjustments before logistical or administrative weaknesses become entrenched.

Type: Short-Term Outcome Evaluation

What it is: The purpose of short-term outcome evaluation is to measure whatever changes the program creates in the target population’s knowledge, attitudes, beliefs, or behaviors. It is linked with the short-term outcomes in your logic model (see Setting the Stage).

When to use: Use short-term outcome evaluation after the program has made contact with at least one person or one group of people in the target population.
 
Collect baseline information for short-term outcome evaluation immediately before, or just as, the program goes into operation.
 
Gather information on changes brought about by the program as soon as program personnel have completed their first encounter with an individual or group from the target population.
Short-term outcome evaluation shows:
 
• The degree to which a program is meeting its intermediate goals. For example, how awareness of the hazards of environmental smoke has changed among participants.
 
• Changes in the target population’s knowledge, attitudes, and beliefs about environmental smoke.
Short-term outcome evaluation is useful because it:
 
• Provides an indication of program effectiveness before sufficient time has elapsed to meet the ultimate goals.
 
• Allows management to modify materials or move resources from a nonproductive to a productive area of the program.
 
• Tells programs whether they are moving in the direction of achieving their goals.

Type: Long-Term Outcome Evaluation

What it is: The purpose of long-term outcome evaluation is to learn how well the program succeeded in achieving its ultimate goal. This type of evaluation is linked with the Impact(s) in your logic model (see Setting the Stage). One example of such a goal might be to decrease smoking-related illness and death. As with many goals related to tobacco control, a goal like this is often difficult to measure, since illness and death due to smoking have a long latency period, often years.

When to use: Collect baseline information for long-term outcome (i.e., impact) evaluation immediately before, or just as, the program goes into operation. For ongoing programs, like a series of smoking and health classes taught each year to all third graders in your area, conduct the follow-up portion of long-term outcome evaluation at specified intervals (e.g., every year, every 3 years, or every 5 years). For one-time programs, like a coalition to lobby for passage of a local ordinance against smoking in public places, conduct the follow-up portion of long-term outcome evaluation after the program is finished.

Long-term outcome evaluation shows the degree to which the program has met its ultimate goals, like how much illness and death related to smoking were reduced by early smoking and health education.

Long-term outcome evaluation is useful because it:

•  Allows programs to learn from their successes and failures so that they, and others, can incorporate what they have learned into new projects.
 
•  Provides evidence of success for use in future requests for funding.


Evaluation Designs

Tip: If you use an Experimental or Quasi-experimental design for your program, short- and long-term outcome evaluation will be a breeze because, in effect, you will be operating and evaluating the program at the same time.¹

An evaluation design is the structure you set up that allows you to demonstrate your program’s performance. An evaluation design provides the framework for drawing conclusions. There are many designs used by evaluators, and they range from simple to complex, and subjective to objective. Evaluation designs generally fall into two main categories: Qualitative and Quantitative.

Qualitative Methods

Qualitative methods are used to probe the feelings, beliefs, and impressions of the people participating in the program. They allow the evaluator to identify issues that he or she never considered, and to judge the intensity of people’s preferences (e.g., for one brochure or another) without prejudicing participants with the evaluator’s own opinions. They do this by gathering information in a natural format.

Rather than relying upon structured questions, qualitative methods use open-ended questions so that respondents can answer according to their own beliefs. For example, rather than asking “Does the location of the program influence your attendance?,” a qualitative study would ask, “What are some things that will influence whether or not you attend the program?” Because qualitative methods are open-ended, they are especially valuable at the formative stage of evaluation, when programs are developing and pilot-testing proposed procedures, activities, and materials.

Qualitative methods are particularly useful for testing program elements before they are included in the program, or if a problem arises after they are in use. For example, suppose you decided to convene a meeting of representatives of organizations that are interested in smoke-free environment legislation to form a coalition. If only 3 people showed up out of 15 organizations, you might want to know why. To find out, you could use qualitative methods. You might call the organizations and interview a specified individual (e.g., the executive director) to determine why their representatives did not attend. Perhaps they did not receive the invitation. Perhaps the meeting location was inconvenient. Perhaps there was another important meeting related to smoke-free environments taking place at the same time.

Qualitative methods will allow you to identify the nature of the problem. Using these methods, evaluators can usually determine the cause of any problem because they ask those most directly involved. Once armed with knowledge about the cause, program staff can usually correct problems before major damage is done, so the results can be more effective the next time.

Qualitative methods can also be used for process evaluation, e.g., to ask participants to describe the information presented to them, to ensure that a standard protocol was followed. In addition, qualitative methods can be used to describe short-term outcomes such as participant satisfaction with the program or its activities. Presentations of this type of information in the direct words of the participants can be particularly appealing.

Some common ways to conduct qualitative evaluation include observations, interviews, and focus groups. For more about ways to conduct qualitative evaluations, visit The Power of Proof: Data Collection.

Quantitative Methods

Quantitative methods are ways of gathering objective data that can be expressed in numbers (e.g., a count of the people with whom a program had contact or the percentage reduction in a particular unhealthy behavior by the target population). Unlike the results produced by qualitative methods, under the right conditions results produced by quantitative methods can be used to draw conclusions about the entire target population. For example, suppose we conducted a school-based program designed to increase knowledge of the unhealthy effects of tobacco and skills for resisting peer pressure to use tobacco among minority youth. Suppose, too, that our evaluation found a 50% increase in their knowledge regarding the adverse effects of tobacco use. If the sample of minority youth included in the evaluation was representative of the target population of minority youth, then if the entire target population participated in the program we could assume their knowledge would increase by a value similar to 50% (within a “confidence interval”).
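To make the idea of a confidence interval concrete, here is a minimal Python sketch. The scores are invented illustration data, and the function name and the normal approximation (z = 1.96 for a 95% interval) are assumptions made for this example rather than anything prescribed by this guide. With small samples, an evaluator would more likely use a t-distribution or ask a statistician for help.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    z=1.96 corresponds to an approximate 95% confidence interval.
    """
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    std_err = statistics.stdev(values) / math.sqrt(n)
    margin = z * std_err
    return mean, mean - margin, mean + margin


# Hypothetical percentage increases in knowledge score for sampled youth
knowledge_gains = [40, 55, 60, 45, 50, 52, 48, 50]

mean_gain, lower, upper = mean_confidence_interval(knowledge_gains)
print(f"Mean gain: {mean_gain:.1f}% (95% CI {lower:.1f}% to {upper:.1f}%)")
```

The interval only says something about the whole target population if the sample is representative of it, which is the condition described above.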

Quantitative methods are used during process and short- and long-term outcome evaluation. Occasionally, these methods are also used during formative evaluation to measure, for example, the level of participant satisfaction with the program.

There are four types of quantitative designs: descriptive, pre-experimental, experimental, and quasi-experimental.²

Descriptive Designs

Descriptive designs are typically used for formative or process evaluations. These designs include only people from the target population who are eligible to participate in some part of the program. As in a case study, the purpose is to describe some attribute of the target population members, e.g., their attitudes toward each of three public service announcements, their attendance at program sessions, or their willingness to meet with program personnel. Descriptive designs might also include some analytic comparisons within the target population. Such comparisons might be made among specific subsets of participants, for example, Do female participants prefer a different announcement than male participants? Do older participants attend more frequently than younger participants?


Pre-experimental Designs

Pre-experimental designs can document whether changes are present in the group that receives the program, but they do not have the power of proof. In order to prove a program’s effectiveness, the evaluation must show evidence that the program is the only possible explanation for the results that are achieved.

Let’s consider a program that was designed to educate state legislators and their staff about the benefits of clean indoor air. Program staff mailed literature to the legislators and their staff, and then visited them in person to answer questions. There are two pre-experimental designs that might commonly be used to assess the effect of such a program.

One pre-experimental design would be to measure the legislators’ and staff’s knowledge about the issue before program staff performed their mailings and visits, and then to measure it again after the visits were completed. This is sometimes called a “pretest-posttest design”. The problem with this design is that, even if there is a dramatic increase in knowledge, it may not be the result of the program. While the program was going on, another program may have contacted the legislators, as well. Likewise, they may have read a newspaper article, found a website, or seen a television special about clean indoor air.

Another pre-experimental design would be to wait until after the mailings and visits took place, and then measure knowledge in legislators and staff who participated in the program and in legislators and staff who did not. Once again, using this design, even if those who participated in the program have much higher levels of knowledge than those who did not, it may not be the result of the program. For example, the legislators who read the materials and agreed to be visited might be those with an interest in clean indoor air. As a result, even before contact with the program, they may have known more about this subject than the other legislators.

In general, the strongest designs have the elements of both of these pre-experimental designs, that is, they have both a pre-program assessment and a comparison group.
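As a rough illustration of why combining a pre-program assessment with a comparison group is stronger, the sketch below subtracts the change seen in a comparison group from the change seen in the program group. The function names and all scores are hypothetical, invented for this example. Without random assignment, the result is still only suggestive, because the two groups may have differed in ways the scores do not capture.

```python
def average(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def change_beyond_comparison(program_pre, program_post,
                             comparison_pre, comparison_post):
    """Change in the program group minus change in the comparison group."""
    program_change = average(program_post) - average(program_pre)
    comparison_change = average(comparison_post) - average(comparison_pre)
    return program_change - comparison_change


# Hypothetical knowledge scores (0-100) for legislators and staff
program_pre = [50, 60, 55, 55]
program_post = [80, 85, 75, 80]
comparison_pre = [52, 58, 56, 54]
comparison_post = [60, 65, 62, 53]

net_change = change_beyond_comparison(
    program_pre, program_post, comparison_pre, comparison_post
)
print(f"Change beyond the comparison group: {net_change:.1f} points")
```

Here the comparison group's change stands in for outside influences, such as a newspaper article or television special, that would have reached both groups.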


Experimental Designs

Experimental designs randomly assign evaluation participants to one of two or more groups. The effects of the program are measured by comparing the changes in the various groups’ knowledge, attitudes, beliefs, behaviors, or disease rates. Randomization ensures that the various groups are as similar as possible, thus allowing evaluators of the program’s short- and long-term outcomes to eliminate factors outside the program as reasons for changes in program participants’ knowledge, attitudes, beliefs, tobacco-related behavior, or disease rates (e.g., lung cancer).

Evaluation with an experimental design produces the strongest evidence that a program contributed to a change in the knowledge, attitudes, beliefs, behaviors, or disease of the target population.
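Random assignment itself is simple to carry out. The following is a minimal sketch, assuming a list of participant identifiers and two groups; the function name, group labels, and student IDs are hypothetical and chosen only for illustration.

```python
import random


def randomly_assign(participants,
                    group_names=("new curriculum", "current curriculum"),
                    seed=None):
    """Shuffle participants and deal them into groups of near-equal size."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {name: [] for name in group_names}
    # Deal shuffled participants to the groups in turn, like dealing cards
    for index, participant in enumerate(shuffled):
        groups[group_names[index % len(group_names)]].append(participant)
    return groups


# Hypothetical student IDs
students = [f"student_{number}" for number in range(1, 11)]
assignment = randomly_assign(students, seed=2024)
for group_name, members in assignment.items():
    print(group_name, len(members), members)
```

Recording the seed (or the resulting lists) lets the evaluator document exactly how participants were assigned.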

Although experimental designs are ideal for program evaluation, they are often difficult—sometimes impossible—to set up. The difficulty may be due to logistical problems, budgetary limitations, or political circumstances. To demonstrate the difficulties, let us consider the example we described above of a school-based program designed to increase knowledge of the unhealthy effects of tobacco and skills for resisting peer pressure to use tobacco among minority youth. Let’s assume we are targeting African American youth, and have identified one or more middle schools with almost exclusively African American student bodies.

  • Logistical Problems: Suppose we decided to randomize students into two groups: one to receive our program’s curriculum in their health class, and the other to receive the current tobacco curriculum in their health class. One logistical problem would be that students assigned to receive different curricula might be in the same health class. Clearly, it would be impossible for the teacher to teach two different curricula at once. Even if we randomized by health class within the school, students receiving one curriculum might talk about it with their friends who were receiving the other. This would contaminate the results of the evaluation.
     
  • Budgetary Problems: We might solve the logistical problems just described by randomizing schools into our two groups instead of randomizing students or health classes. But this would increase our costs, since we would have to train more teachers. Statistically, randomizing by school would also mean that we needed to include more students in the evaluation than if we randomized the students themselves. That would mean more materials to hand out and more tests to score, all of which translates into increased costs.
     
  • Political Problems: If we randomized by school, we might also run into political problems. For example, parents might feel that their children are being treated unfairly if they do not receive the new curriculum. Likewise, principals or teachers might refuse to participate in the program’s evaluation unless they receive the new curriculum.
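The statistical point under Budgetary Problems, that randomizing by school requires more students, can be sketched with the standard “design effect” approximation from survey statistics, 1 + (m − 1) × ICC, where m is the number of students per school and ICC is the intraclass correlation (how alike students within the same school are). This formula is not part of this guide, and the numbers below are hypothetical; treat it as a rough planning aid rather than a substitute for a statistician's sample size calculation.

```python
import math


def design_effect(students_per_school, icc):
    """Approximate inflation factor when whole schools are randomized."""
    return 1 + (students_per_school - 1) * icc


def students_needed(individual_sample_size, students_per_school, icc):
    """Students required under school-level randomization (rounded up)."""
    inflated = individual_sample_size * design_effect(students_per_school, icc)
    return math.ceil(inflated)


# Hypothetical planning numbers
needed_if_students_randomized = 200
students_per_school = 51
icc = 0.02  # assumed similarity of students within the same school

factor = design_effect(students_per_school, icc)
total = students_needed(needed_if_students_randomized, students_per_school, icc)
print(f"Design effect: {factor:.2f}; students needed: {total}")
```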

It is problems like these that have led to the use of other designs in “real world” evaluation.


Quasi-Experimental Designs

Because of the difficulties with experimental designs, programs sometimes use quasi-experimental designs. Almost all of the quasi-experimental designs compare outcomes from program participants to outcomes for comparison groups that do not receive program services.

Quasi-experimental designs do not require random assignment of participants to one or another group. Instead, the evaluator selects a whole group (e.g., another community similar to the one in which the program is being conducted) as the comparison or control group. Because group assignment is not random in quasi-experimental designs with comparison groups, evaluators must take extra care to ensure that the intervention group is similar to the comparison group. In addition, the evaluator must gather the necessary information about both groups to be able to describe the ways in which the groups are not similar.

There is one quasi-experimental design that can be used if a suitable comparison group cannot be found. In this design, known as time-series, the evaluator takes multiple measurements of the intervention group before providing the program, and multiple measurements after the program, as well. Using this design, it is possible to study the pattern of change in the target population (e.g., are they smoking more each year?) and to document unexpected changes in the pattern that coincide with the introduction of the program.
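One simple way to look for a change in pattern is to fit a straight-line trend to the measurements taken before the program, project that trend forward, and compare it with what was actually observed afterward. The sketch below does this with an ordinary least-squares line; the function names and the yearly smoking rates are hypothetical, and a real time-series analysis would usually need more measurements and more careful statistical methods.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = mean_y - slope * mean_x
    return slope, intercept


def departures_from_trend(pre_values, post_values):
    """Observed post-program values minus the projected pre-program trend."""
    pre_times = list(range(len(pre_values)))
    slope, intercept = fit_line(pre_times, pre_values)
    first_post_time = len(pre_values)
    departures = []
    for offset, observed in enumerate(post_values):
        expected = intercept + slope * (first_post_time + offset)
        departures.append(observed - expected)
    return departures


# Hypothetical yearly smoking rates (%) before and after the program began
rates_before = [20.0, 21.0, 22.0, 23.0]
rates_after = [22.0, 21.0, 20.0]

gaps = departures_from_trend(rates_before, rates_after)
print("Observed minus projected rate by year:", gaps)
```

Negative values mean the observed rate fell below what the pre-program trend would have predicted, which is the kind of unexpected change in pattern described above.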


Choose the Evaluation Best Suited For You

The type of evaluation you choose (formative, process, or outcome) should be determined by the purpose of your evaluation. The design you choose (quantitative or qualitative) should be determined by both the type of evaluation being conducted and the resources available, including money, staff, and suitable comparison groups.

Following are examples of different types of evaluation using a variety of qualitative and quantitative designs.

Formative Evaluation: To determine which of several flyers is most effective for getting people to participate in your program.

  • Qualitative (Focus Groups): Present each flyer, in turn, to focus group participants from the target population, and ask them to discuss what they do and do not like about it. Then present all flyers together, and ask participants to discuss which is the best choice.
  • Quantitative (Descriptive Design): Ask a sample of the target population to rank each flyer. Compare the average rankings by gender, age, and neighborhood.

Process Evaluation: To determine whether recruitment is proceeding as planned.

  • Qualitative (Personal Interviews): Interview each program recruiter. Ask how recruitment is progressing. Probe to determine whether or not they have encountered problems in recruiting. Attempt to interview persons who have chosen not to participate. Ask them to describe their experience of contact with the program.
  • Quantitative (Descriptive Design): Determine the number of persons contacted, and the percentage of those contacted who have agreed to participate. Compare the percent participating by gender, age, and neighborhood.

Short-Term Outcome Evaluation: To determine the effectiveness of the program in changing attitudes.

  • Qualitative: Quantitative designs are the most appropriate here, but quotes from interviews might be used to supplement quantitative data by giving examples of attitude changes. This puts a “face” on the data.
  • Quantitative (Experimental Design): Administer an instrument to measure attitudes to persons who were assigned to the program, and persons who were assigned to receive the usual “treatment”. Compare the average attitudes for these two groups.
  • Quantitative (Quasi-experimental Design): Administer an instrument to measure attitudes to persons who participated in the program, and persons in the comparison group. Compare the average attitudes for these two groups.

Long-Term Outcome Evaluation: To determine the effectiveness of the program in reducing smoking rates.

  • Qualitative: Quantitative designs are the most appropriate here, but quotes from interviews might be used to supplement quantitative data by giving a picture of how the program changed several individuals’ smoking behavior. Again, this puts a “face” on the data.
  • Quantitative (Experimental Design): Determine the number of people who are in the population that was assigned to receive the program (denominator) and how many of them smoke (numerator). Divide the numerator by the denominator and multiply by 100 to get a percentage. Collect the same information for persons who were assigned to receive the usual “treatment”, and calculate their percentage. Compare the percentages for these two groups.
  • Quantitative (Quasi-experimental Design): Determine the number of people who are in the population that is receiving the program (denominator) and how many of them smoke (numerator). Divide the numerator by the denominator and multiply by 100 to get a percentage. Collect the same information for people in the comparison (control) group, and calculate their percentage. Compare the percentages for these two groups.
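The percentage calculation described for long-term outcome evaluation can be sketched in a few lines of Python. The function name and the counts are hypothetical, chosen only to illustrate the arithmetic of dividing the numerator (people who smoke) by the denominator (people in the group) and multiplying by 100.

```python
def smoking_rate_percent(number_who_smoke, number_in_group):
    """Smokers (numerator) divided by group size (denominator), times 100."""
    if number_in_group <= 0:
        raise ValueError("The group must contain at least one person.")
    return 100 * number_who_smoke / number_in_group


# Hypothetical counts for the two groups
program_rate = smoking_rate_percent(number_who_smoke=90, number_in_group=600)
comparison_rate = smoking_rate_percent(number_who_smoke=132, number_in_group=600)

difference = comparison_rate - program_rate
print(f"Program group: {program_rate:.1f}%")
print(f"Comparison group: {comparison_rate:.1f}%")
print(f"Difference: {difference:.1f} percentage points")
```

A difference between the two percentages is easier to attribute to the program when the groups were randomly assigned than when a comparison group was simply chosen.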

---------------
1. Source: Thompson, N.J. & McClintock, H.O. (1998). Demonstrating your program's worth: A primer on evaluation for programs to prevent unintentional injury. Atlanta, GA: National Center for Injury Prevention and Control, Centers for Disease Control and Prevention. http://www.cdc.gov/ncipc/pub-res/demonstr.htm
 
2. Source: Campbell, D.T. & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research. Chicago: R. McNally.

 

 