“I can give you a six-word formula for success: Think things through – then follow through.”
- Edward Vernon (Eddie) Rickenbacker
Choose the Evaluation Type and Design
As with your tobacco control program, when it comes to evaluation, one size does not fit all. When you set the goals for your program, you made them specific to different audiences to increase your likelihood of success. Evaluation requires the same kind of specificity. Depending on what you want to get out of the evaluation, you will select different types of evaluation and different evaluation designs.
The evaluation “type” defines what you would like to evaluate about the program, such as the development of one of its activities, or an outcome. The “design” is the set of methods you will use to gather the information that answers your questions about things like the program’s development and outcomes.
Choose your evaluation type and design based on three criteria:
- Your program goals
- The purpose of your evaluation
- The questions you want answered
Let’s take a closer look at evaluation types and designs.
Evaluation Types
There are four basic types of evaluation: formative evaluation,
process evaluation, short-term outcome evaluation, and long-term
outcome evaluation. Here’s a brief overview of each:
- Formative evaluation is the process of testing program
plans, messages, materials, strategies, or modifications
for weaknesses and strengths before they are put into effect.
Formative evaluation is also used when an unanticipated
problem occurs after the program is in effect.
- Process evaluation is the mechanism for testing the delivery
of the program, rather than its effect. For example, process
evaluation would be used to determine whether the program’s
procedures for reaching the target population are working
as planned.
- Short-term outcome evaluation is used to measure the immediate
effect the program has on the intended audience soon after
implementation.
- Long-term outcome evaluation is used to measure the long-term
effect of the program. It is the process of measuring whether
your program met its ultimate goal of reducing tobacco-related
disease and death. The results of long-term outcome evaluation
often impact decisions to continue program funding.
Different types of evaluation may be more prominent at different
points in the program. With a new program, for example, you
might want to conduct a formative evaluation to help you design
your activities so that they work. With an existing program,
you might want to conduct an outcome evaluation to assess
your program's effectiveness and to demonstrate that it is
making productive use of resources. The table below gives
you more details about the purpose and features of each type,
when to use each, what each shows, and why each is most useful.
Formative Evaluation
What it is: Formative evaluation ensures that program materials, strategies, and activities are of the highest possible quality. When materials, strategies, and activities are being developed, the purpose of evaluation is to ensure that the program aspect being evaluated is feasible, appropriate, meaningful, acceptable, and culturally appropriate for the tobacco control program and the program’s target population.
When to use:
- As soon as the idea for a program is conceived.
- During the development of a new program.
- When an existing program (1) is being modified, (2) has problems with no obvious solutions, or (3) is being used in a new setting, with a new population, or to target a new problem or behavior.
- Whenever an existing program is being adapted for use with a different target population or in a new location or setting.
What it shows:
- Whether proposed messages are likely to reach, be understood by, and be accepted by the people you are trying to serve (e.g., the strengths and weaknesses of proposed written materials).
- How people in the target population get information (e.g., which newspapers they read or which radio stations they listen to).
- Whom the target population respects as a spokesperson (e.g., a sports celebrity or the local preacher).
- Details that program developers may have overlooked about materials, strategies, or mechanisms for distributing information (e.g., that the target population has difficulty reaching the location where training classes are held).
Why it is useful:
- It allows programs to make revisions before the full effort begins.
- It maximizes the likelihood that the program will succeed.
Process Evaluation
What it is: The purpose of process evaluation is to learn whether the program is being delivered as planned, e.g., whether it is serving the target population and whether the number of people being served is more or less than expected. Process evaluation is linked to the activities and outputs in your logic model (see Setting the Stage). It is concerned with quality control in the delivery of activities and with counts of outputs. For example, it involves counting all contacts with the people you are trying to reach and counting all the events that resulted from those contacts.
When to use: Begin process evaluation as soon as the program goes into operation. At this stage, you are not looking for results. You are merely trying to learn:
- whether you are delivering your program according to the plan or protocol,
- whether you are connecting with the people in your target population, and
- whether the people in the target population are connecting with you.
What it shows: How well a program is being delivered. For example, are volunteer callers providing the same information? How many people are participating in the program? Are certain people or areas not participating?
Why it is useful:
- It ensures program consistency across staff members or shifts, and over time.
- It identifies problems in reaching the target population early.
- It allows programs to evaluate how well their plans, procedures, activities, and materials are actually working, and to make adjustments before logistical or administrative weaknesses become entrenched.
Short-Term Outcome Evaluation
What it is: The purpose of short-term outcome evaluation is to measure whatever changes the program creates in the target population’s knowledge, attitudes, beliefs, or behaviors. It is linked with the short-term outcomes in your logic model (see Setting the Stage).
When to use:
- Use short-term outcome evaluation after the program has made contact with at least one person or one group of people in the target population.
- Collect baseline information for short-term outcome evaluation immediately before, or just as, the program goes into operation.
- Gather information on changes brought about by the program as soon as program personnel have completed their first encounter with an individual or group from the target population.
What it shows:
- The degree to which a program is meeting its intermediate goals, for example, how awareness of the hazards of environmental tobacco smoke has changed among participants.
- Changes in the target population’s knowledge, attitudes, and beliefs about environmental tobacco smoke.
Why it is useful:
- It provides an indication of program effectiveness before enough time has elapsed to meet the ultimate goals.
- It allows management to modify materials or move resources from a nonproductive to a productive area of the program.
- It tells programs whether they are moving in the direction of achieving their goals.
Long-Term Outcome Evaluation
What it is: The purpose of long-term outcome evaluation is to learn how well the program succeeded in achieving its ultimate goal. This type of evaluation is linked with the Impact(s) in your logic model (see Setting the Stage). One example of such a goal might be to decrease smoking-related illness and death. As with many goals related to tobacco control, a goal like this is often difficult to measure, since illness and death due to smoking have a long latency period, often years.
When to use:
- Collect baseline information for long-term outcome (i.e., impact) evaluation immediately before, or just as, the program goes into operation.
- For ongoing programs, like a series of smoking and health classes taught each year to all third graders in your area, conduct the follow-up portion of long-term outcome evaluation at specified intervals (e.g., every year, every 3 years, or every 5 years).
- For one-time programs, like a coalition to lobby for passage of a local ordinance against smoking in public places, conduct the follow-up portion of long-term outcome evaluation after the program is finished.
What it shows: The degree to which the program has met its ultimate goals, such as how much smoking-related illness and death was reduced by early smoking and health education.
Why it is useful:
- It allows programs to learn from their successes and failures so that they, and others, can incorporate what they have learned into new projects.
- It provides evidence of success for use in future requests for funding.
Evaluation Designs
Tip: If you use an experimental or quasi-experimental design for your program, short- and long-term outcome evaluation will be a breeze because, in effect, you will be operating and evaluating the program at the same time.[1]
An evaluation design is the structure you set up to demonstrate your program’s performance; it provides the framework for drawing conclusions. Evaluators use many designs, ranging from simple to complex and from subjective to objective. Evaluation designs generally fall into two main categories: qualitative and quantitative.
Qualitative Methods
Qualitative methods are used to probe the feelings, beliefs, and impressions of the people participating in the program. They allow the evaluator to identify issues that he or she had not considered, and to judge the intensity of people’s preferences (e.g., for one brochure or another) without prejudicing participants with the evaluator’s own opinions. They do this by gathering information in a natural format.
Rather than relying on structured questions, qualitative methods use open-ended questions so that respondents can answer according to their own beliefs. For example, rather than asking “Does the location of the program influence your attendance?” a qualitative study would ask, “What are some things that will influence whether or not you attend the program?”
Because qualitative methods are open-ended, they are especially
valuable at the formative stage of evaluation, when programs
are developing and pilot-testing proposed procedures, activities,
and materials.
Qualitative methods are particularly useful for testing program
elements before they are included in the program, or if a
problem arises after they are in use. For example, suppose
you decided to convene a meeting of representatives of organizations
that are interested in smoke-free environment legislation
to form a coalition. If representatives of only 3 of the 15 organizations showed up, you might want to know why. To find out, you
could use qualitative methods. You might call the organizations
and interview a specified individual (e.g., the executive
director) to determine why their representatives did not attend.
Perhaps they did not receive the invitation. Perhaps the meeting
location was inconvenient. Perhaps there was another important
meeting related to smoke-free environments taking place at
the same time.
Qualitative methods will allow you to identify the nature of the problem. Using these methods, evaluators can often determine the cause of a problem because they ask the people most directly involved. Once armed with knowledge about the cause, program staff can usually correct problems before major damage is done, so the next attempt is more likely to succeed.
Qualitative methods can also be used for process evaluation, e.g., to ask participants to describe the information presented to them, to check that a standard protocol was followed. They can also be used to describe short-term outcomes such as participant satisfaction with the program or its activities. Presenting this type of information in the participants’ own words can be particularly appealing.
Some common ways to conduct qualitative evaluation include
observations, interviews, and focus groups. For more about
ways to conduct qualitative evaluations, visit The
Power of Proof: Data Collection.
Quantitative Methods
Quantitative methods are ways of gathering objective data
that can be expressed in numbers (e.g., a count of the people
with whom a program had contact or the percentage reduction
in a particular unhealthy behavior by the target population).
Unlike the results produced by qualitative methods, under
the right conditions results produced by quantitative methods
can be used to draw conclusions about the entire target population.
For example, suppose we conducted a school-based program designed
to increase knowledge of the unhealthy effects of tobacco
and skills for resisting peer pressure to use tobacco among
minority youth. Suppose, too, that our evaluation found a
50% increase in their knowledge regarding the adverse effects
of tobacco use. If the sample of minority youth included in the evaluation was representative of the target population of minority youth, then we could expect that, if the entire target population participated in the program, their knowledge would also increase by about 50%, within a range described by a “confidence interval.”
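To make the idea of a confidence interval concrete, here is a minimal Python sketch. It assumes that each sampled student's knowledge gain has already been expressed as a percent increase. The function name `mean_with_ci` and the data are hypothetical. The interval uses the normal approximation (z = 1.96). For small samples like this one, a t-distribution critical value would normally be used instead, and it gives a somewhat wider interval.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% confidence interval.

    Uses the normal approximation (z = 1.96); for small samples a
    t-distribution critical value would give a somewhat wider interval.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical percent increases in knowledge score for sampled students.
increases = [42.0, 55.0, 48.0, 61.0, 39.0, 52.0, 50.0, 53.0]
mean, low, high = mean_with_ci(increases)
print(f"Mean increase: {mean:.1f}% (95% CI {low:.1f}% to {high:.1f}%)")
```

The interval only supports conclusions about the whole target population if the sample was representative, as the paragraph above notes.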
Quantitative methods are used during process and short- and
long-term outcome evaluation. Occasionally, these methods
are also used during formative evaluation to measure, for
example, the level of participant satisfaction with the program.
There are four types of quantitative designs: descriptive, pre-experimental, experimental, and quasi-experimental.[2]
Descriptive Designs
Descriptive designs are typically used for formative or process
evaluations. These designs include only people from the target
population who are eligible to participate in some part of
the program. As in a case study, the purpose is to describe
some attribute of the target population members, e.g., their
attitudes toward each of three public service announcements,
their attendance at program sessions, or their willingness
to meet with program personnel. Descriptive designs might
also include some analytic comparisons within the target population.
Such comparisons might be made among specific subsets of participants. For example: Do female participants prefer a different announcement than male participants do? Do older participants attend more frequently than younger participants?
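The within-population comparisons described above amount to averaging a measure within each subgroup. The minimal Python sketch below shows this. It uses hypothetical attendance records and a hypothetical helper, `mean_by_group`. Because a descriptive design has no comparison group outside the target population, results like these describe participants; they do not show what the program caused.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Average a numeric field within each subgroup (e.g., by gender or age band)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical attendance data: sessions attended out of 6.
participants = [
    {"age_band": "older", "sessions": 5},
    {"age_band": "older", "sessions": 6},
    {"age_band": "younger", "sessions": 3},
    {"age_band": "younger", "sessions": 4},
    {"age_band": "younger", "sessions": 2},
]
print(mean_by_group(participants, "age_band", "sessions"))
```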
Pre-experimental Designs
Pre-experimental designs can document whether changes are
present in the group that receives the program, but they do
not have the power of proof. In order to prove a
program’s effectiveness, the evaluation must show evidence
that the program is the only possible explanation for the
results that are achieved.
Let’s consider a program that was designed to educate
state legislators and their staff about the benefits of clean
indoor air. Program staff mailed literature to the legislators
and their staff, and then visited them in person to answer
questions. There are two pre-experimental designs that might
commonly be used to assess the effect of such a program.
One pre-experimental design would be to measure the legislators’ and staff’s knowledge about the issue before program staff performed their mailings and visits, and then to measure it again after the visits were completed. This is sometimes called a “pretest-posttest design.” The problem with
this design is that, even if there is a dramatic increase
in knowledge, it may not be the result of the program. While
the program was going on, another program may have contacted
the legislators, as well. Likewise, they may have read a newspaper
article, found a website, or seen a television special about
clean indoor air.
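In a pretest-posttest design, the basic calculation is the average change within each person. The Python sketch below computes that change; the function name and scores are hypothetical. As the paragraph above explains, a positive result shows that knowledge changed, not that the program caused the change.

```python
from statistics import mean


def pre_post_change(pre_scores, post_scores):
    """Mean within-person change for a pretest-posttest design.

    pre_scores[i] and post_scores[i] must belong to the same person.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each person needs both a pretest and a posttest")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


# Hypothetical knowledge scores (0-10) for legislators and staff.
pre = [3, 4, 2, 5, 4]
post = [6, 7, 5, 7, 6]
print(pre_post_change(pre, post))
```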
Another pre-experimental design would be to wait until after
the mailings and visits took place, and then measure knowledge
in legislators and staff who participated in the program and
in legislators and staff who did not. Once again, using this
design, even if those who participated in the program have
much higher levels of knowledge than those who did not, it
may not be the result of the program. For example, the legislators
who read the materials and agreed to be visited might be those
with an interest in clean indoor air. As a result, even before
contact with the program, they may have known more about this
subject than the other legislators.
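The selection problem can be shown with a small, deliberately rigged example. In this hypothetical Python sketch, the program is assumed to have no effect at all. Legislators who were already interested in the issue are both more likely to take part and more knowledgeable to begin with, so a posttest-only comparison still shows participants ahead. All names and numbers are illustrative assumptions.

```python
from statistics import mean

PROGRAM_EFFECT = 0       # assume the program changes nothing
BASELINE_KNOWLEDGE = 4   # hypothetical score for an uninterested legislator
INTEREST_BONUS = 3       # hypothetical head start from prior interest

# Hypothetical roster: (already interested?, took part in the program?)
roster = [(True, True), (True, True), (True, False),
          (False, True), (False, False), (False, False), (False, False)]


def knowledge_after(interested, participated):
    """Posttest score: prior interest raises knowledge; the program adds PROGRAM_EFFECT."""
    score = BASELINE_KNOWLEDGE + (INTEREST_BONUS if interested else 0)
    return score + (PROGRAM_EFFECT if participated else 0)


participants = [knowledge_after(i, p) for i, p in roster if p]
others = [knowledge_after(i, p) for i, p in roster if not p]
gap = mean(participants) - mean(others)
print(f"Posttest-only gap: {gap:.2f} points, although the program effect is zero")
```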
In general, the strongest designs have the elements of both
of these pre-experimental designs, that is, they have both
a pre-program assessment and a comparison
group.
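One common way to combine a pre-program assessment with a comparison group is a difference-in-differences calculation: the change in the program group minus the change in the comparison group. This section does not prescribe that particular formula. The Python sketch below is an illustrative assumption with hypothetical scores. It also relies on the assumption that, without the program, both groups would have changed by about the same amount.

```python
from statistics import mean


def difference_in_differences(program_pre, program_post, comparison_pre, comparison_post):
    """Change in the program group minus change in the comparison group.

    Subtracting the comparison group's change removes influences that
    affected both groups (e.g., a newspaper article everyone read), assuming
    both groups would otherwise have changed by the same amount.
    """
    program_change = mean(program_post) - mean(program_pre)
    comparison_change = mean(comparison_post) - mean(comparison_pre)
    return program_change - comparison_change


# Hypothetical group knowledge scores (0-10).
estimate = difference_in_differences(
    program_pre=[3, 4, 2, 5],
    program_post=[7, 7, 6, 8],
    comparison_pre=[3, 5, 3, 4],
    comparison_post=[4, 6, 4, 5],
)
print(f"Estimated program effect: {estimate:.2f} points")
```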
Experimental Designs
Experimental designs randomly assign evaluation participants
to one of two or more groups. The effects of the program are
measured by comparing the changes in the various groups’
knowledge, attitudes, beliefs, behaviors, or disease rates.
Randomization tends to make the groups similar, on average, in both the characteristics you measure and those you do not. This allows evaluators of the program’s short- and long-term outcomes to rule out factors outside the program as reasons for changes in program participants’ knowledge, attitudes, beliefs, tobacco-related behavior, or disease rates (e.g., lung cancer).
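Random assignment itself is simple to carry out. The Python sketch below uses a hypothetical helper, `randomly_assign`, and made-up participant IDs. It shuffles participants with Python's standard `random` module and deals them into groups in turn, so group sizes differ by at most one. Recording the seed lets someone else reproduce and audit the assignment.

```python
import random


def randomly_assign(participant_ids, groups=("program", "usual curriculum"), seed=None):
    """Shuffle participants, then deal them into groups in turn so sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for position, pid in enumerate(shuffled):
        assignment[pid] = groups[position % len(groups)]
    return assignment


# Hypothetical participant IDs P001..P020.
assignment = randomly_assign([f"P{n:03d}" for n in range(1, 21)], seed=2024)
print(assignment)
```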
Evaluation with an experimental design produces the strongest
evidence that a program contributed to a change in the knowledge,
attitudes, beliefs, behaviors, or disease of the target population.
Although experimental designs are ideal for program evaluation,
they are often difficult—sometimes impossible—to
set up. The difficulty may be due to logistical problems,
budgetary limitations, or political circumstances. To demonstrate
the difficulties, let us consider the example we described
above of a school-based program designed to increase knowledge
of the unhealthy effects of tobacco and skills for resisting
peer pressure to use tobacco among minority youth. Let’s
assume we are targeting African American youth, and have identified
one or more middle schools with almost exclusively African
American student bodies.
- Logistical Problems: Suppose we decided to randomize students
into two groups: one to receive our program’s curriculum
in their health class, and the other to receive the current
tobacco curriculum in their health class. One logistical
problem would be that students assigned to receive different
curricula might be in the same health class. Clearly, it
would be impossible for the teacher to teach two different
curricula at once. Even if we randomized by health class
within the school, students receiving one curriculum might
talk about it with their friends who were receiving the
other. This would contaminate the results of the evaluation.
- Budgetary Problems: We might solve the logistical problems
just described by randomizing schools into our two groups
instead of randomizing students or health classes. But,
this would increase our costs since we would have to train
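more teachers. Statistically, randomizing by school would also mean that we needed to include more students in the evaluation than if we randomized the students themselves. This is because students in the same school tend to resemble one another, so each additional student from a school adds less new information.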
That would mean more materials to hand out, and more tests
to score, all of which translates into increased costs.
- Political Problems: If we randomized by school, we might
also run into political problems. For example, parents might
feel that their children are being treated unfairly if they
do not receive the new curriculum. Likewise, principals
or teachers might refuse to participate in the program’s
evaluation unless they receive the new curriculum.
It is problems like these that have led to the use of other
designs in “real world” evaluation.
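The statistical cost of randomizing schools instead of students is often estimated with the design effect, 1 + (m − 1) × ICC. Here m is the number of students evaluated per school, and ICC (the intraclass correlation) describes how alike students in the same school are. This formula is standard in planning cluster-randomized studies, but it is not stated in this section. The Python sketch below assumes equal school sizes. Its ICC and sample-size figures are hypothetical planning values, not recommendations.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing whole clusters (e.g., schools).

    cluster_size: students evaluated per school (assumed equal across schools).
    icc: intraclass correlation, i.e., how alike students in the same school are.
    """
    return 1 + (cluster_size - 1) * icc


def students_needed(n_if_individual, cluster_size, icc):
    """Inflate an individually randomized sample size, rounded up to whole schools."""
    inflated = n_if_individual * design_effect(cluster_size, icc)
    schools = math.ceil(inflated / cluster_size)
    return schools * cluster_size


# Hypothetical planning numbers: 400 students would suffice if students were
# randomized individually; 50 students per school; ICC of 0.05.
print(design_effect(50, 0.05))
print(students_needed(400, 50, 0.05))
```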
Quasi-Experimental Designs
Because of the difficulties with experimental designs, programs sometimes use quasi-experimental designs. Almost all quasi-experimental designs compare outcomes for program participants with outcomes for comparison groups that do not receive program services.
Quasi-experimental designs do not require random assignment
of participants to one or another group. Instead, the evaluator
selects a whole group (e.g., another community similar to
the one in which the program is being conducted) as the comparison
or control group. Because group assignment is not random, evaluators using quasi-experimental designs with comparison groups must take extra care to ensure that the intervention group is similar to the comparison group. In addition, the evaluator must gather the necessary information about both groups to be able to describe the ways in which the groups are not similar.
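One way to check and describe how similar the intervention and comparison groups are is to compute a standardized mean difference for each baseline characteristic. This is a common balance check in the evaluation literature, not a method this section specifies. The Python sketch, its function name, the 0.1 cutoff (a frequently used rule of thumb), and the community data are all illustrative assumptions.

```python
import math
from statistics import mean, variance


def standardized_difference(program_values, comparison_values):
    """Standardized mean difference for one baseline characteristic.

    Difference in means divided by the pooled standard deviation (square root
    of the average of the two sample variances), so characteristics measured
    on different scales can be compared.
    """
    pooled_sd = math.sqrt((variance(program_values) + variance(comparison_values)) / 2)
    return (mean(program_values) - mean(comparison_values)) / pooled_sd


# Hypothetical baseline data, one value per neighborhood, for two communities.
baseline = {
    "adult smoking rate (%)": ([21, 24, 19, 22], [20, 23, 21, 22]),
    "median age": ([34, 38, 41, 36], [45, 47, 44, 48]),
}
for characteristic, (program, comparison) in baseline.items():
    smd = standardized_difference(program, comparison)
    flag = "check" if abs(smd) > 0.1 else "similar"  # 0.1: common rule of thumb
    print(f"{characteristic}: {smd:+.2f} ({flag})")
```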
There is one quasi-experimental design that can be used if
a suitable comparison group cannot be found. In this design,
known as time-series, the evaluator takes
multiple measurements of the intervention group before
providing the program, and multiple measures after the program,
as well. Using this design, it is possible to study the pattern
of change in the target population (e.g., are they smoking
more each year?) and to document unexpected changes in the
pattern that coincide with the introduction of the program.
Choose the Evaluation Best Suited for You
The type of evaluation you choose (formative,
process, or outcome) should be determined by the purpose of
your evaluation. The design you choose (quantitative or qualitative)
should be determined by both the type of evaluation being
conducted and the resources available, including money, staff,
and suitable comparison groups.
Following are examples of different types of
evaluation using a variety of qualitative and quantitative
designs.
Formative evaluation: To determine which of several flyers is most effective for getting people to participate in your program.
- Qualitative (focus groups): Present each flyer, in turn, to focus group participants from the target population, and ask them to discuss what they do and do not like about it. Then present all the flyers together, and ask participants to discuss which is the best choice.
- Quantitative (descriptive design): Ask a sample of the target population to rank each flyer. Compare the average rankings by gender, age, and neighborhood.

Process evaluation: To determine whether recruitment is proceeding as planned.
- Qualitative (personal interviews): Interview each program recruiter. Ask how recruitment is progressing, and probe to determine whether they have encountered problems in recruiting. Attempt to interview people who have chosen not to participate, and ask them to describe their experience of contact with the program.
- Quantitative (descriptive design): Determine the number of people contacted and the percentage of those contacted who have agreed to participate. Compare the percentage participating by gender, age, and neighborhood.

Short-term outcome evaluation: To determine the effectiveness of the program in changing attitudes.
- Qualitative: Quantitative designs are the most appropriate here, but quotes from interviews might be used to supplement quantitative data by giving examples of attitude changes. This puts a “face” on the data.
- Quantitative (experimental design): Administer an instrument that measures attitudes to people who were assigned to the program and to people who were assigned to receive the usual “treatment.” Compare the average attitudes for these two groups.
- Quantitative (quasi-experimental design): Administer an instrument that measures attitudes to people who participated in the program and to people in the comparison group. Compare the average attitudes for these two groups.

Long-term outcome evaluation: To determine the effectiveness of the program in reducing smoking rates.
- Qualitative: Quantitative designs are the most appropriate here, but quotes from interviews might be used to supplement quantitative data by giving a picture of how the program changed several individuals’ smoking behavior. Again, this puts a “face” on the data.
- Quantitative (experimental design): Determine the number of people in the population that was assigned to receive the program (denominator) and how many of them smoke (numerator). Divide the numerator by the denominator and multiply by 100 to get a percentage. Collect the same information for people who were assigned to receive the usual “treatment,” and calculate their percentage. Compare the percentages for these two groups.
- Quantitative (quasi-experimental design): Determine the number of people in the population that is receiving the program (denominator) and how many of them smoke (numerator). Divide the numerator by the denominator and multiply by 100 to get a percentage. Collect the same information for people in the comparison (control) group, and calculate their percentage. Compare the percentages for these two groups.
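The percentage comparisons described in these examples can be sketched in a few lines of Python. The counts below are hypothetical, and `percent` is an illustrative helper. The same calculation applies to the percentage of contacted people who agreed to participate in the process evaluation example. Reporting the gap in percentage points avoids confusing it with a relative (percent) change.

```python
def percent(numerator, denominator):
    """Numerator divided by denominator, multiplied by 100."""
    if denominator <= 0:
        raise ValueError("denominator must be greater than zero")
    return numerator / denominator * 100


# Hypothetical counts: people in each population (denominator)
# and how many of them smoke (numerator).
program_rate = percent(numerator=180, denominator=1200)
comparison_rate = percent(numerator=240, denominator=1250)
print(f"Program group: {program_rate:.1f}% smoke")
print(f"Comparison group: {comparison_rate:.1f}% smoke")
print(f"Difference: {comparison_rate - program_rate:.1f} percentage points")
```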
---------------
1. Source: Thompson, N.J. & McClintock,
H.O. (1998). Demonstrating your program's worth: A primer
on evaluation for programs to prevent unintentional injury.
Atlanta, GA: National Center for Injury Prevention and Control,
Centers for Disease Control and Prevention. http://www.cdc.gov/ncipc/pub-res/demonstr.htm
2. Source: Campbell, D.T. & Stanley, J.C. (1963). Experimental
and quasi-experimental designs for research. Chicago:
R. McNally.