Tran, Kien

Fuzzy Clustering of Nominal Data using Finite Mixtures

School of Mathematics and Statistics, Victoria University of Wellington

Clustering techniques are often used to reduce the dimension of very large datasets, for which direct analysis with techniques such as regression can be computationally infeasible.

The clustering of non-independent nominal variables poses particular difficulties because such data lack both a distance metric and a measure of correlation. This prevents the use of many techniques, such as k-means (Lloyd, 1993), hierarchical clustering (see, e.g., Hastie et al., 2009), copula-based methods (Nelsen, 1999) or some variations of Principal Component Analysis (see, e.g., Chavent et al., 2011).

This paper proposes a clustering approach for this data type based on finite mixtures (McLachlan and Peel, 2000), pairwise composite likelihood (see the review by Varin et al., 2011) and transformation of nominal levels. If applicable, this would provide a parsimonious, likelihood-based fuzzy clustering model suitable for statistical inference, with the potential for extension to more general mixed-type data.
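
To illustrate what a likelihood-based fuzzy clustering of nominal data looks like in practice, the following is a minimal sketch in Python with NumPy. It is not the method proposed here: it assumes the variables are independent within each cluster (a standard latent class simplification) and fits the full likelihood by EM, so it uses neither the pairwise composite likelihood nor the transformation of nominal levels. The function name `fit_categorical_mixture`, its parameters and the simulated data are all hypothetical.

```python
import numpy as np

def fit_categorical_mixture(X, n_levels, n_clusters, n_iter=200, seed=0):
    """EM for a finite mixture of categorical variables.

    ASSUMPTION: variables are independent within each cluster
    (latent class model); this is a simplification for illustration.
    X is an (n, p) integer array with X[i, j] in {0, ..., n_levels[j]-1}.
    Returns mixing proportions, per-variable level probabilities,
    and the (n, n_clusters) matrix of fuzzy memberships.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # One-hot indicators per variable, each of shape (n, L_j).
    onehot = [np.eye(L)[X[:, j]] for j, L in enumerate(n_levels)]
    pi = np.full(n_clusters, 1.0 / n_clusters)
    # theta[j][g, l] = P(variable j takes level l | cluster g).
    theta = [rng.dirichlet(np.ones(L), size=n_clusters) for L in n_levels]
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed in logs.
        log_r = np.log(pi + 1e-12) + sum(
            np.log(theta[j][:, X[:, j]]).T for j in range(p)
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, lightly smoothed to avoid log(0).
        pi = resp.mean(axis=0)
        for j in range(p):
            counts = resp.T @ onehot[j] + 1e-6
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta, resp

# Simulated example: two clusters, four nominal variables with three levels.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=300)
probs = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
X = np.stack([[rng.choice(3, p=probs[g]) for g in z] for _ in range(4)], axis=1)
pi, theta, resp = fit_categorical_mixture(X, [3, 3, 3, 3], n_clusters=2)
hard = resp.argmax(axis=1)  # hard labels, defined only up to label switching
```

Each row of `resp` gives one observation's membership probabilities across clusters, which is the sense in which a finite mixture yields a fuzzy rather than a hard clustering.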

This presentation is eligible for the NZSA Student Prize.