Original Article
Classifying developmental trajectories over time should be done with great caution: a comparison between methods

https://doi.org/10.1016/j.jclinepi.2012.04.010

Abstract

Objective

In the analysis of data from longitudinal cohort studies, there is growing interest in the developmental trajectories of subpopulations of the cohort under study. Several advanced statistical methods are available to analyze these trajectories, but most of them are never used in the epidemiologic literature. The purpose of the present study is to compare five statistical methods to detect developmental trajectories in a longitudinal epidemiological data set.

Study Design and Setting

All five statistical methods (K-means clustering, a “two-step” approach with mixed modeling and K-means clustering, latent class analysis [LCA], latent class growth analysis [LCGA], and latent class growth mixture modeling [LCGMM]) were performed on a real-life data set and two manipulated data sets. The first manipulated data set contained four different linear developments over time, whereas the second contained two linear and two quadratic developments.

Results

For the real-life data set, all five classification methods revealed comparable trajectories. Regarding the manipulated data sets, LCGA performed best in detecting linear trajectories, whereas none of the methods performed well in detecting a combination of linear and quadratic trajectories. Furthermore, the optimal solutions for LCA and LCGA contained more classes than the optimal solution for LCGMM.

Conclusion

Although LCGA and LCGMM seem to be preferable to the simpler methods, all classification methods should be applied with great caution.

Introduction

What is new?

Key findings

  1. Although latent class growth analysis (LCGA) and latent class growth mixture modeling (LCGMM) seem to be preferable to the simpler methods, all classification methods should be applied with great caution.

What this adds to what was known?
  1. To our knowledge, this is the first study to compare five statistical methods for classifying developmental trajectories over time in a practical way, without complicated mathematics.

  2. When there were only linear developments over time, LCGA performed best.

  3. When there were both linear and quadratic developments over time, none of the classification methods performed well.

  4. The number of classes in the optimal solution derived from LCA and LCGA is (much) larger than the number derived from LCGMM.

What is the implication and what should change now?
  1. This article can be used by applied researchers to choose a suitable method to classify developmental trajectories over time.

  2. This article shows that all classification methods should be applied with great caution.

Within medical and epidemiological research, prospective cohort studies are becoming increasingly important. One of the main reasons for this increasing popularity is the possibility to study individual development over time. In addition, researchers are often interested in dividing the cohort under study into groups of subjects with comparable developmental trajectories. Such a division serves first as a tool to describe the population under study and second as a first step toward studying either the determinants or the consequences of different trajectories. Although the division into subgroups can be done in many different ways, it is surprising that sophisticated methods are hardly used in medical and epidemiological research. This could be because most of them are based on structural equation modeling (SEM) [1], a fairly complex statistical technique that is particularly popular in psychology and social science, but not so much in medical science and epidemiology. Reviewing the available literature, the techniques used to define subgroups of developmental trajectories can be divided into (1) cross-sectional (naive) techniques (i.e., techniques that ignore the longitudinal structure of the data) and (2) longitudinal techniques that define the subgroups according to the parameters of the individual growth curves.

The few examples of the classification of developmental trajectories found in the medical and epidemiological literature deal mainly with the development of substance abuse [2], [3], [4], [5]; the development of functional limitations in the elderly [6], [7], [8]; pediatrics [9], [10], [11], [12]; and some specific topics, such as low back pain [13], nighttime bladder control [14], anxiety and depressive disorders [15], and body fatness [16]. It is striking that these medical and epidemiological studies show no consistency in the statistical approach used to classify developmental trajectories. The methodology ranges from relatively simple cross-sectional methods to complicated SEM techniques.

The purpose of the present study is to compare several methods that classify individuals according to their developmental trajectories. This is done first on two data sets into which particular developments were manipulated and second on a real-life data set.

Manipulated data sets

Two data sets were created; the first data set consists of four linear developmental trajectories over time, whereas the second data set consists of a combination of two linear and two quadratic developmental trajectories.

The following procedure was used to create the data sets, starting from an epidemiological data set with six repeated measurements on 588 subjects: (1) The data at each time point were standardized to set the average development over time to zero. Then 2.5
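As an illustration of the first step only, the sketch below standardizes toy data at each time point and then imposes four linear developments. It is a minimal sketch, not the authors' procedure: the remaining steps are not reproduced here, so the slope values (HYPOTHETICAL_SLOPES), the equal-sized block assignment of subjects to classes, and the randomly generated starting data are all assumptions made for illustration.

```python
import random
import statistics


def standardize_by_time_point(data):
    """Z-score each time point (column) so the mean at every wave is 0."""
    n_waves = len(data[0])
    columns = [[row[t] for row in data] for t in range(n_waves)]
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [[(row[t] - means[t]) / sds[t] for t in range(n_waves)] for row in data]


def add_linear_trajectories(data, slopes):
    """Assign subjects to classes in equal blocks and add slope * time to each wave."""
    n, k = len(data), len(slopes)
    labels = [min(i * k // n, k - 1) for i in range(n)]
    shifted = [
        [y + slopes[labels[i]] * t for t, y in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return shifted, labels


# Toy stand-in for the starting data set: 588 subjects, six repeated measurements.
rng = random.Random(0)
raw = [[rng.gauss(5.0 + 0.3 * t, 1.0) for t in range(6)] for _ in range(588)]

z = standardize_by_time_point(raw)

# Hypothetical slopes per class; the values used in the article are not shown here.
HYPOTHETICAL_SLOPES = [-1.0, -0.5, 0.5, 1.0]
manipulated, true_class = add_linear_trajectories(z, HYPOTHETICAL_SLOPES)
```

After standardization every wave has mean zero, so any development over time in the manipulated data comes entirely from the imposed slopes. The known class labels (true_class) can then serve as the reference against which a classification method is judged.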

Manipulated data sets

Figure 1 shows the results of the different analyses regarding classification of the linear developmental trajectories, and Table 1 shows the cross-tabulation between the numbers in the original classes and the classes detected by the different methods. First of all, LCGA seems to perform best by detecting all four trajectories almost perfectly (91% of the subjects were classified in the same trajectory as in the original classes). The most relevant solution in LCGMM revealed a three-class
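The cross-tabulation and the percentage of subjects classified in the same trajectory can be computed along the following lines. This is a hedged sketch on invented toy labels, not the article's data. Because a classification method numbers its classes arbitrarily, the sketch assumes that the detected classes are matched to the original ones by trying every relabeling and keeping the best one. It also assumes the same number of classes on both sides. The function names are hypothetical.

```python
from itertools import permutations


def cross_tabulate(original, detected, n_classes):
    """Rows are original classes, columns are detected classes."""
    table = [[0] * n_classes for _ in range(n_classes)]
    for o, d in zip(original, detected):
        table[o][d] += 1
    return table


def best_agreement(table):
    """Fraction of subjects on the diagonal after the best relabeling of detected classes."""
    k = len(table)
    total = sum(sum(row) for row in table)
    best = max(
        sum(table[i][perm[i]] for i in range(k))
        for perm in permutations(range(k))
    )
    return best / total


# Invented toy labels for 12 subjects in four classes.
original = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
detected = [2, 2, 2, 0, 0, 1, 1, 1, 1, 3, 3, 3]

table = cross_tabulate(original, detected, 4)
agreement = best_agreement(table)  # 11 of the 12 toy subjects agree
```

Trying all relabelings is only feasible for a small number of classes, which is the situation here. When a method returns fewer or more classes than the original data contain, the table is no longer square and the agreement has to be defined differently.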

Discussion

In the present article, five statistical methods (K-means clustering, a “two-step” approach involving a mixed model analysis and K-means clustering, LCA, LCGA, and LCGMM) were compared in their ability to create subgroups of individuals with comparable developmental trajectories. This was done in two manipulated data sets and in a real-life data set.
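To make the idea of the “two-step” approach concrete, the sketch below first summarizes each subject by an intercept and a slope and then applies K-means clustering to those two numbers. It is a simplified illustration under stated assumptions, not the authors' implementation: per-subject least-squares lines stand in for the individual estimates a mixed model would provide, the K-means routine uses a deterministic farthest-first start, and the data are invented and noise-free.

```python
def ols_line(y):
    """Least-squares intercept and slope of y against time 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iter=50):
    """Lloyd's algorithm with a farthest-first start (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move every center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Invented toy data: 10 increasing and 10 decreasing subjects, six waves each.
group_a = [[0.1 * (i % 3) + 1.0 * t for t in range(6)] for i in range(10)]
group_b = [[0.1 * (i % 3) - 1.0 * t for t in range(6)] for i in range(10)]
subjects = group_a + group_b

features = [ols_line(y) for y in subjects]   # step 1: intercept and slope per subject
labels, centers = k_means(features, k=2)     # step 2: cluster those summaries
```

Clustering the raw repeated measurements directly, as in plain K-means, would pass `[tuple(y) for y in subjects]` to `k_means` instead of `features`. The two-step variant differs only in that it clusters on growth-curve parameters rather than on the observed values at each wave.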

In the real-life data set, the results regarding the classification of subjects in different developmental trajectories were comparable for the

Conclusion

In conclusion, based on both the real-life data set and the manipulated data sets, it is not clear which method for classifying developmental trajectories in prospective epidemiological and medical studies is best, although LCGA and LCGMM seem to be preferable to the simpler methods. However, all methods should be applied with great caution.

References (43)

  • B. Reboussin et al.

    Modeling adolescent drug-use patterns in cluster-unit trials with multiple sources of correlation using robust latent class regressions

    Ann Epidemiol

    (2006)
  • R. Ferdinand et al.

    Latent class analysis of anxiety and depressive symptoms in referred adolescents

    J Affect Disord

    (2005)
  • C. Conklin et al.

    The return to smoking: 1-year relapse trajectories among female smokers

    Nicotine Tob Res

    (1999)
  • S. Casswell et al.

    Trajectories of drinking from 18 to 26 years: identification and prediction

    Addiction

    (2002)
  • J. Schulenberg et al.

    Trajectories of marijuana use during the transition to adulthood: the big picture based on national panel data

    J Drug Issues

    (2005)
  • D. Deeg

    Longitudinal characterization of course types of functional limitations

    Disabil Rehabil

    (2005)
  • J. Liang et al.

    How does self-assessed health change with age? A study of older adults in Japan

    J Gerontol B Psychol Sci Soc Sci

    (2005)
  • J. Liang et al.

    Changes in functional status among older adults in Japan: successful and usual aging

    Psychol Aging

    (2003)
  • U. Alexy et al.

    Pattern of long-term fat intake and BMI during childhood and adolescence—results of the DONALD study

    Int J Obes Relat Metab Disord

    (2004)
  • A. Barrett et al.

    Trajectories of gender role orientations in adolescence and early adulthood: a prospective study of the mental health effects of masculinity and femininity

    J Health Soc Behav

    (2002)
  • C. Li et al.

    Developmental trajectories of overweight during childhood: role of early life factors

    Obesity

    (2007)
  • A. Ventura et al.

    Risk profiles for metabolic syndrome in a nonclinical sample of adolescent girls

    Pediatrics

    (2006)
  • K. Dunn et al.

    Characterizing the course of low back pain: a latent class analysis

    Am J Epidemiol

    (2006)
  • T. Croudace et al.

    Development typology of trajectories to nighttime bladder control: epidemiologic application of longitudinal latent class analysis

    Am J Epidemiol

    (2003)
  • T. Hoekstra et al.

    Developmental trajectories of body mass index throughout the life course: an application of latent class growth (mixture) modelling

    Longit Life Course Stud

    (2011)
  • S.J. Te Velde et al.

    Birth weight, adult body composition, and subcutaneous fat distribution

    Obes Res

    (2003)
  • I. Ferreira et al.

    The metabolic syndrome, cardiopulmonary fitness, and subcutaneous trunk fat as independent determinants of arterial stiffness: the Amsterdam Growth and Health Longitudinal Study

    Arch Intern Med

    (2005)
  • H. Goldstein

    Multilevel statistical models

    (2003)
  • B. Muthén

    Latent variable analysis: growth mixture modeling and related techniques for longitudinal data

Conflict of interest: None.
