Original Article
A parametric analysis of ordinal quality-of-life data can lead to erroneous results

https://doi.org/10.1016/j.jclinepi.2007.05.019

Abstract

Objective

Measurements from health-related quality-of-life (HRQoL) studies, although usually of an ordered categorical nature, are typically treated as continuous variables, allowing the calculation of mean values and the application of parametric statistics, such as t-tests. We investigated whether parametric and nonparametric analyses of ordered categorical data lead to different conclusions.

Study Design and Setting

HRQoL data were obtained from patients with a diagnosis of asthma (n = 192) and chronic obstructive pulmonary disease (COPD; n = 88) at two time points. The impact of the group factor (asthma vs. COPD) and the time factor (t1 vs. t2) on HRQoL was analyzed with a metric approach (repeated measures ANOVA) and two ordinal approaches (each with a nonparametric repeated measures ANOVA).

Results

Using the metric approach, a significant effect of “group” (P = 0.0061) and “time” (P = 0.0049) on HRQoL was found. The first ordinal approach (ranked total score) still showed a significant effect for “group” (P = 0.0033) with a worse HRQoL for patients suffering from COPD. In the second approach (ranks for each HRQoL item and summed ranks), there were no significant effects.

Conclusion

Applying simple parametric methods to ordered categorical HRQoL scores led to different results from those obtained with nonparametric methods. In these cases, an ordinal approach will prevent inappropriate conclusions.

Introduction

Health-related quality of life (HRQoL) is a strongly recommended and widely used measure. It is used to assess the health status of patients as the personal burden of illness cannot be described adequately by measures of disease status such as tumor load or forced expiratory volume [1]. The need to incorporate patients' opinions, values, and preferences is what distinguishes HRQoL from all other measures of health [2]. In recent years, it has become increasingly clear that questionnaires can provide accurate evidence of outcomes from the patient's perspective [3].

Skepticism and confusion remain as to how HRQoL should be measured and analyzed. Most HRQoL questionnaires (“instruments”) consist of items in a Likert-scaled format (e.g., “Do you have any trouble taking a long walk?”) with four or more response categories, such as “not at all,” “a little,” “quite a bit,” and “very much” [4]. Several items are often pooled to generate a score such as a physical functioning score, a mental health score, or a social functioning score [5], [6]. Sometimes the items are weighted before pooling. For ease of analysis, the measurements are typically treated as continuous variables and analyzed with standard linear models and the corresponding statistics (e.g., simple sums or mean values, t-tests, or ANOVAs).
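As a minimal sketch of the pooling step described above, the following Python fragment codes the four response categories as 1 to 4, sums the items of each patient (optionally weighted), and then treats the result as a continuous score by taking a mean. The numeric codes, the three-item scale, and the patient answers are invented for illustration and are not taken from any particular instrument.

```python
from statistics import mean

# Assumed numeric coding of the four response categories (illustrative only).
CODES = {"not at all": 1, "a little": 2, "quite a bit": 3, "very much": 4}

def pooled_score(responses, weights=None):
    """Sum the coded item responses of one patient, optionally weighted."""
    values = [CODES[r] for r in responses]
    if weights is None:
        weights = [1] * len(values)
    return sum(w * v for w, v in zip(weights, values))

# Invented answers of three patients to a hypothetical three-item scale.
patients = [
    ["not at all", "a little", "a little"],
    ["quite a bit", "very much", "a little"],
    ["a little", "a little", "quite a bit"],
]

# Metric treatment: the pooled scores are handled like continuous numbers.
scores = [pooled_score(p) for p in patients]
mean_score = mean(scores)
```

The sum and the mean are only meaningful if the distance between adjacent categories is taken to be equal, which is the assumption questioned in the next paragraph.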

Although this approach is frequently used, it is criticized because methods for analyzing continuous data are applied to ordered categorical (“discrete”) variables, and measurement scales with an ordinal structure are treated as metric variables [7], [8]. The ordinal structure of the data does not allow the interpretation of differences and means. Instead, HRQoL data may require different techniques of analysis that take into account the ordinal character of the data, such as the methods described by Akritas and coworkers [9], [10], Brunner and Langer [11], or Agresti [12]. Otherwise, the statistical validity of the results may be doubtful [13]. Munzel and Bandelow [14] discussed this problem for psychiatric studies, and Singer et al. [7] illustrated it with an example of scores used in dentistry.
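The point that ordinal data do not support the interpretation of means can be illustrated with a small sketch. Two invented groups are compared under two numeric codings that preserve the order of the categories. The sign of the mean difference changes with the coding, whereas a purely order-based quantity, P(X < Y) + 0.5 P(X = Y), does not. Using this relative effect as the ordinal counterpart is our assumption for the illustration; the data and the recoding are hypothetical.

```python
from statistics import mean

def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from all pairs of observations."""
    less = sum(1 for a in x for b in y if a < b)
    ties = sum(1 for a in x for b in y if a == b)
    return (less + 0.5 * ties) / (len(x) * len(y))

# Invented responses on a four-category item, coded 1..4.
group_a = [3, 3, 3, 3]
group_b = [1, 1, 4, 4]

# A second coding that keeps the category order but stretches the top category.
recode = {1: 1, 2: 2, 3: 3, 4: 10}
group_a_recoded = [recode[v] for v in group_a]
group_b_recoded = [recode[v] for v in group_b]

# The mean difference depends on the arbitrary coding ...
diff_original = mean(group_a) - mean(group_b)
diff_recoded = mean(group_a_recoded) - mean(group_b_recoded)

# ... while the order-based relative effect is unchanged by the recoding.
effect_original = relative_effect(group_a, group_b)
effect_recoded = relative_effect(group_a_recoded, group_b_recoded)
```

With the first coding group A has the higher mean, with the second coding group B does, although no response has changed its category.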

In recent years, there has been increasing interest in the family of Rasch models and, more generally, in item response theory (IRT), which offers some advantages over traditional approaches. These models provide the means for constructing interval measures from raw data, even if these are nominal or ordinal [15]. Moreover, IRT models yield estimates that do not vary with the characteristics of the population with respect to the underlying trait [16]. That is, individual persons, according to their responses, and individual items, according to their difficulty, are conceptualized to lie on the same interval scale, so that parametric statistical methods are considered appropriate for analysis. In IRT models, the measured proficiency or attitude of a person does not depend on who else takes part in the measurement or on the “difficulty” level of the items (parameter separation) [15]. One major disadvantage of IRT, however, is that it requires software designed for specialists, which is often inconvenient for clinicians and even for other researchers [16].
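Parameter separation can be made concrete with the simplest member of the Rasch family, the dichotomous model, in which the probability of endorsing an item depends only on the difference between a person parameter and an item difficulty. The sketch below is a simplification (HRQoL items are usually polytomous), and the person and item values are hypothetical. It shows that the log-odds gap between two persons is the same for every item.

```python
import math

def rasch_probability(theta, b):
    """Endorsement probability in the dichotomous Rasch model.

    theta: person parameter, b: item difficulty (both on the same scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p):
    """Logit of a probability."""
    return math.log(p / (1.0 - p))

# Hypothetical person parameters and item difficulties.
theta_1, theta_2 = 1.5, 0.5
difficulties = [-1.0, 0.0, 2.0]

# The comparison of the two persons does not depend on the item used.
gaps = [
    log_odds(rasch_probability(theta_1, b)) - log_odds(rasch_probability(theta_2, b))
    for b in difficulties
]
```

Each entry of `gaps` equals `theta_1 - theta_2` up to floating-point error, regardless of the item difficulty.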

In this paper, we applied both a standard parametric analysis and two different nonparametric approaches to compare the quality-of-life assessments made by two groups of patients with airway obstruction at a baseline survey and 6 months later. Using these different methods to analyze the HRQoL data, we studied whether the patients differed in their HRQoL scores according to their group and/or the time point of the survey. We were especially interested in whether a metric approach, compared to ordinal approaches, may lead to different results and thus to potentially inappropriate conclusions.
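The two ordinal approaches are characterized in the abstract as a ranked total score and as ranks for each HRQoL item followed by summed ranks. The following sketch shows only this data preparation step on invented responses, using midranks for ties. It is an illustration under our own assumptions, not the authors' code, and the nonparametric repeated measures ANOVA that would follow is not reproduced here.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

# Invented data: rows are patients, columns are items coded 1..4.
items = [
    [1, 1, 4],
    [1, 2, 2],
    [1, 4, 1],
    [4, 4, 2],
]

# Approach 1: sum the item codes per patient, then rank the totals.
totals = [sum(row) for row in items]
ranked_totals = midranks(totals)

# Approach 2: rank each item across patients, then sum the ranks per patient.
item_ranks = [midranks(list(column)) for column in zip(*items)]
summed_ranks = [sum(ranks) for ranks in zip(*item_ranks)]
```

In this toy example the first approach ties patients 1 and 3 above patient 2, whereas the second ties patients 2 and 3 below patient 1, so the two preparations need not order patients identically.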

We hypothesized that the two groups would not differ in their HRQoL and that neither group would have a better or worse HRQoL after 6 months. The time span seemed much too short for such an effect, especially as no intervention to improve the patients' quality of life had taken place. The sample was drawn from general practices, so we did not expect an overrepresentation of severe cases in either group. This was also the case in a large Spanish representative sample of the general population [17].

Sample and procedure

Data were taken from the MedViP project (“Medizinische Versorgung in der Praxis” [Medical Care in General Practice]; www.medvip.uni-goettingen.de). The study protocol was approved by the Research Ethics Committee of the University of Göttingen. Design and recruitment have been described in detail elsewhere [18].

In brief, general practitioners were invited to provide routinely collected electronic medical data. Electronic patient records (EPRs) were extracted via a standardized interface. We

Characteristics of the sample

A total of 327 patients with either asthma or COPD from 39 practices gave informed consent to take part in the study. Complete data concerning quality of life were obtained from 300 patients, slightly more than half of them female (54.6%). The mean age of the patients was 57.9 (range: 19.5–81.2) years. According to spirometry, 192 patients (64.0%) had a diagnosis of asthma and showed absence or reversibility of bronchial obstruction. Eighty-eight patients (29.3%) showed nonreversible

Discussion

Like other diagnostic procedures, quality-of-life measures should be valid, reliable, and sensitive over time [1]. We demonstrated that applying simple parametric methods to ordered categorical HRQoL scores generated results that differed from those we obtained with nonparametric methods. One reason for the differences could be that applying parametric analysis to ordered categorical HRQoL data is not appropriate, because in doing so one implies that differences between ordered categorical data

Acknowledgment

The MedViP project is supported by a research grant from the German Ministry of Research and Education (01 GK 0201).

References (34)

  • J.M. Singer et al. Parametric and nonparametric analyses of repeated ordinal categorical data. Biom J (2004)
  • U. Munzel et al. A global view on parametric and nonparametric approaches to the analysis of ordered categorical data. Biom J (2004)
  • M.G. Akritas et al. Nonparametric hypotheses and rank statistics for unbalanced factorial designs. J Am Stat Assoc (1997)
  • E. Brunner et al. Nonparametric analysis of ordered categorical data in designs with longitudinal observations and small sample sizes. Biom J (2000)
  • A. Agresti. Categorical data analysis (2002)
  • C.M. Norris et al. Systematic review of statistical methods used to analyze Seattle Angina Questionnaire scores. Can J Cardiol (2004)
  • U. Munzel et al. The use of parametric versus nonparametric tests in the statistical evaluation of rating scales. Pharmacopsychiatry (1998)