Latent class models for classification
Introduction
Let y denote a discrete dependent, outcome, target, or output variable, and z a vector of independent, input, predictor, or attribute variables. Classification involves predicting the discrete outcome variable y as accurately as possible using the information on the z variables. Recently, latent class (LC), or finite mixture (FM), models have been proposed as classification tools in the field of neural networks (Jacobs et al., 1991; Bishop, 1995, pp. 212–220), as well as in the field of Bayesian (or belief) networks (Kontkanen et al., 1996; Monti and Cooper, 1999; Meilã and Jordan, 2000). This paper gives an overview of these developments and presents several extensions of the proposed models.
Classification using a statistical model involves specifying either a model for $P(y \mid \mathbf{z})$, as in regression analysis, or a model for $P(\mathbf{z} \mid y)$, as in discriminant analysis. In the next two sections, we present two basic types of LC models for classification: they involve specifying a model for $P(y \mid \mathbf{z})$ and for $P(\mathbf{z} \mid y)$, respectively. Subsequently, we illustrate the most important special cases of these two basic types with an empirical example. The paper ends with a short discussion.
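A model for $P(y \mid \mathbf{z})$ gives class probabilities directly, whereas a model for $P(\mathbf{z} \mid y)$ has to be combined with the class proportions $P(y)$ through Bayes' rule, $P(y \mid \mathbf{z}) \propto P(y)\,P(\mathbf{z} \mid y)$, before a case can be classified. The minimal sketch below shows that step for one observed z; the function names and the probabilities are hypothetical, and assigning each case to its most probable class is an assumed (though common) decision rule.

```python
def posterior_from_class_conditionals(prior, likelihood):
    """Turn P(y) and P(z | y), evaluated at one observed z, into P(y | z) via Bayes' rule.

    prior:      dict mapping each class y to P(y)
    likelihood: dict mapping each class y to P(z | y) at the observed z
    """
    joint = {y: prior[y] * likelihood[y] for y in prior}  # P(y, z)
    evidence = sum(joint.values())                         # P(z)
    return {y: p / evidence for y, p in joint.items()}


def classify(posterior):
    """Modal assignment: pick the class with the largest posterior probability."""
    return max(posterior, key=posterior.get)


# Hypothetical numbers for a binary outcome; these are not estimates from the paper.
prior = {"satisfied": 0.8, "not satisfied": 0.2}
likelihood = {"satisfied": 0.05, "not satisfied": 0.10}
post = posterior_from_class_conditionals(prior, likelihood)
label = classify(post)
```

With these made-up values the posterior for "satisfied" is roughly 2/3, so the case is assigned to "satisfied" even though z is twice as likely under "not satisfied".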
Supervised classification structures
The first basic type of LC model for classification involves specifying a model for the conditional distribution of y given z, where a discrete hidden variable x serves as an intervening variable. More precisely, the assumed probability structure for $P(y \mid \mathbf{z})$ is

$$P(y \mid \mathbf{z}) = \sum_{x} P(x \mid \mathbf{z}) \, P(y \mid x, \mathbf{z}),$$

where z is treated as fixed. Besides the above probability structure, regression-type constraints are imposed on the model probabilities. Since both the latent variable and the outcome variable are
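The supervised structure can be made concrete with a small sketch. It assumes, purely for illustration, that the regression-type constraints take the form of a multinomial logit for $P(x \mid \mathbf{z})$ and a binary logit for $P(y = 1 \mid x, \mathbf{z})$ within each latent class, which resembles the mixture-of-experts setup of Jacobs et al. (1991). The function names and all coefficient values are hypothetical; in practice the coefficients would be estimated by maximum likelihood.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of linear predictors."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, z):
    """Linear predictor w'z."""
    return sum(wi * zi for wi, zi in zip(w, z))


def supervised_lc_prob(z, gate_coefs, expert_coefs):
    """P(y = 1 | z) = sum_x P(x | z) * P(y = 1 | x, z) for a binary outcome y.

    z:            predictor vector with a leading 1 for the intercept
    gate_coefs:   one coefficient vector per latent class x (assumed multinomial logit for P(x | z))
    expert_coefs: one coefficient vector per latent class x (assumed binary logit for P(y = 1 | x, z))
    """
    p_x = softmax([dot(g, z) for g in gate_coefs])
    p_y_given_x = [1.0 / (1.0 + math.exp(-dot(b, z))) for b in expert_coefs]
    return sum(px * py for px, py in zip(p_x, p_y_given_x))


# Hypothetical two-class model with an intercept and one dummy predictor.
gate = [[0.0, 0.0], [0.5, -1.0]]     # first latent class is the reference category
experts = [[1.5, 0.2], [-0.5, 0.8]]
p = supervised_lc_prob([1.0, 1.0], gate, experts)
```

Because the result is a weighted average of the class-specific probabilities, with weights $P(x \mid \mathbf{z})$ that themselves depend on z, it always lies between the smallest and largest $P(y = 1 \mid x, \mathbf{z})$.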
Unsupervised classification structures
In the second basic type of LC model for classification, one models the conditional distribution of the z variables given y, $P(\mathbf{z} \mid y)$. The decomposition of $P(\mathbf{z} \mid y)$ is now

$$P(\mathbf{z} \mid y) = \sum_{x} P(x \mid y) \, P(\mathbf{z} \mid x, y).$$

Since the likelihood function used in the estimation is based on $P(\mathbf{z} \mid y)$ or $P(y, \mathbf{z})$, rather than on $P(y \mid \mathbf{z})$, there is no direct relationship between model fit and classification performance. These methods therefore belong to the family of unsupervised classification or unsupervised learning methods. The predictive
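A minimal sketch of the unsupervised route, assuming that the z variables are categorical and locally independent within each combination of latent class x and outcome class y, so that $P(\mathbf{z} \mid x, y) = \prod_j P(z_j \mid x, y)$. Each class-conditional mixture is evaluated and then turned into $P(y \mid \mathbf{z})$ with Bayes' rule. The function names, data layout, and parameter values are hypothetical; real parameters would typically be estimated, for example by EM.

```python
def lc_class_conditional(z, p_x_given_y, p_zj_given_xy):
    """P(z | y) = sum_x P(x | y) * prod_j P(z_j | x, y) for one outcome class y.

    z:             tuple of observed categorical predictor values
    p_x_given_y:   list with P(x | y) for each latent class x
    p_zj_given_xy: p_zj_given_xy[x][j] maps each value v to P(z_j = v | x, y)
    """
    total = 0.0
    for x, px in enumerate(p_x_given_y):
        term = px
        for j, value in enumerate(z):
            term *= p_zj_given_xy[x][j][value]  # assumed local independence given x and y
        total += term
    return total


def lc_posterior(z, prior_y, models):
    """P(y | z) by Bayes' rule, where models[y] = (p_x_given_y, p_zj_given_xy)."""
    joint = {y: prior_y[y] * lc_class_conditional(z, *models[y]) for y in prior_y}
    norm = sum(joint.values())
    return {y: v / norm for y, v in joint.items()}


# Hypothetical parameters: two outcome classes, two latent classes,
# two dichotomous predictors coded 0/1.
models = {
    "satisfied": (
        [0.6, 0.4],
        [[{0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2}],
         [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}]],
    ),
    "not satisfied": (
        [0.5, 0.5],
        [[{0: 0.4, 1: 0.6}, {0: 0.3, 1: 0.7}],
         [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}]],
    ),
}
prior_y = {"satisfied": 0.7, "not satisfied": 0.3}
post = lc_posterior((0, 1), prior_y, models)
```

Nothing in this computation is tuned to classify well: each class-conditional model is fitted to describe z, which is the sense in which model fit and classification performance are only indirectly related. With these made-up values the posterior for "not satisfied" is about 0.53, despite its smaller prior.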
An application
We applied the various LC models for classification to data on 9949 employees of a large national (American) corporation who were asked about their job satisfaction (see Table 5.10 in Agresti, 1990). The outcome variable (job satisfaction) has two levels: satisfied and not satisfied. The predictors are race, gender, age (three age groups), and regional location (seven regions). The data set was randomly split into a training and a validation sample, consisting of 5007 and 4942 cases,
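The training/validation design can be sketched as follows. The sample sizes come from the text; the seed, the function names, and the use of the misclassification rate on the validation sample as the performance measure are assumptions made for illustration, since this excerpt does not state which measure was reported.

```python
import random


def random_split(n_cases, n_train, seed=0):
    """Randomly assign case indices 0..n_cases-1 to a training and a validation sample."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    idx = list(range(n_cases))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def misclassification_rate(y_true, y_pred):
    """Share of cases whose predicted class differs from the observed class."""
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


train, valid = random_split(9949, 5007)
```

Models would be estimated on the training indices only and then compared on the validation indices, so that the comparison reflects out-of-sample classification performance.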
Discussion
We described two basic types of LC models for classification. Advantages of the unsupervised methods are that their estimation is much faster, that they are less prone to local maxima, and that they can easily deal with missing data in the predictor variables. The most important advantage of the supervised methods is their better classification performance.
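One way to see why missing predictor values are easy to handle in the unsupervised approach, assuming the locally independent structure $P(\mathbf{z} \mid x, y) = \prod_j P(z_j \mid x, y)$: summing a factor $P(z_j \mid x, y)$ over all values of $z_j$ gives 1, so a missing $z_j$ is integrated out simply by dropping its factor. The supervised structure conditions on z and contains no model for it, so it offers no comparable shortcut. The sketch below is hypothetical in its names and parameter values.

```python
def class_conditional_with_missing(z, p_x_given_y, p_zj_given_xy):
    """P(observed part of z | y) for a locally independent LC model; None marks a missing z_j.

    Because sum_v P(z_j = v | x, y) = 1, integrating out a missing z_j amounts to
    leaving its factor out of the product; no imputation step is needed.
    """
    total = 0.0
    for x, px in enumerate(p_x_given_y):
        term = px
        for j, value in enumerate(z):
            if value is None:
                continue  # missing predictor: its summed-out factor equals 1
            term *= p_zj_given_xy[x][j][value]
        total += term
    return total


# Hypothetical parameters for one outcome class y (two latent classes, two 0/1 predictors).
p_x = [0.6, 0.4]
p_z = [[{0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2}],
       [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}]]
p_partial = class_conditional_with_missing((0, None), p_x, p_z)
```

Here `p_partial` equals the sum of the probabilities of the two complete patterns (0, 0) and (0, 1), which is exactly what marginalizing over the missing second predictor requires.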
Among the unsupervised methods, the standard LC model (including its factor variant) yields the results that are easiest to interpret.
References
Agresti, A., 1990. Categorical Data Analysis.
Bishop, C.M., 1995. Neural Networks for Pattern Recognition.
Clogg, C.C., Goodman, L.A., 1984. Latent structure analysis of a set of multidimensional contingency tables. J. Amer. Statist. Assoc.
Dayton, C.M., Macready, G.B., 1988. Concomitant-variable latent-class models. J. Amer. Statist. Assoc.
Hagenaars, J.A., 1990. Categorical Longitudinal Data: Loglinear Analysis of Panel, Trend and Cohort Data.
Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E., 1991. Adaptive mixtures of local experts. Neural Comput.