Evaluation Criteria For Outcome Measures

A set of criteria is needed to guide the selection of outcome measures. Reliability, validity and responsiveness are widely used and are considered essential to the evaluation of outcome measures (Duncan et al. 2002; Law 2002; Roberts & Counsell 1998; van der Putten et al. 1999). Finch et al. (2002) provide a good tutorial on issues in outcome measure selection.

The Health Technology Assessment programme (Fitzpatrick et al. 1998) examined 413 articles focusing on methodological aspects of the use and development of patient-based outcome measures. The resulting report recommends eight evaluation criteria. These criteria, together with some additional considerations described below, were applied to each of the outcome measures reviewed in this chapter.

Table: Evaluation Criteria and Standards

Each measure reviewed was also assessed for the thoroughness with which its reliability, validity and responsiveness have been reported in the literature. Standards for evaluation of rigor were adapted from McDowell & Newell (1996) and Andresen (2000).

Table: Evaluation Standards – Rigor  

Thoroughness or Rigor of testing

Excellent: most major forms of evaluation reported. 
Adequate: several studies and/or several types of testing reported.
Poor: minimal information and/or few studies (other than the author’s) reported.
N/A: no information available.


Assessments of rigor using the above standards are given along with evaluation ratings for reliability, validity and responsiveness for each measure.


Table: Evaluation Summary

NOTE: +++=Excellent; ++=Adequate; +=Poor; N/A=insufficient information; TR=Test re-test; IC=Internal Consistency; IO=Interobserver; Varied=mixed results (re. floor/ceiling effects).


Ratings of +++ (excellent), ++ (adequate) and + (poor) are assigned based on the criteria and evidence presented in the standards column of Table 17.2. For example, a validity rating of “+++” (excellent) means that the evidence demonstrates excellent construct validity, judged against the standards provided, in various forms including convergent, discriminant and predictive validity.

In addition to the criteria outlined above, the following issues were considered:

  • Has the measure been used in an ABI/TBI population?
  • Has the measure been tested for use with proxy assessment?