Hedderley, Duncan

A comparison of partial least squares and random forests for metabolomics analysis

Duncan Hedderley1, Tony McGhie1, Sarah Cordiner1, Mary Ann Lila2, and Carol L Cheatham3

1. New Zealand Institute of Plant and Food Research, Palmerston North

2. Plants for Human Health Institute, North Carolina State University, North Carolina

3. Nutrition Research Institute, University of North Carolina at Chapel Hill, North Carolina

Metabolomics is the untargeted measurement of metabolites in samples in order to discover unanticipated relationships between treatment groups and their metabolites. A standard approach has been to use partial least squares (PLS) to discriminate between treatment groups. The R mixOmics package includes ‘sparse’ procedures that identify the variables contributing most to the linear combinations that form the PLS dimensions. Alternatively, random forests (Breiman 2001) use a classification tree approach to find the variables that best discriminate between groups. We compare the information these approaches provide using data from a study on the effect of supplementing older people’s diets with blueberries.
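
To make the contrast concrete, the following is a minimal sketch in plain Python of the two kinds of variable ranking described above. It is not the analysis from this study and does not use mixOmics or an R random forest implementation. The data are invented toy values in which only variable 2 differs between two groups, and all function names are hypothetical. The tree side is simplified to bootstrapped one-split trees (stumps) scored by Gini impurity decrease, whereas Breiman's random forests grow full trees. The PLS side computes only the first weight vector, without the variable scaling or sparsity penalty that a real sparse PLS analysis would normally apply.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 group labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split_gain(values, labels):
    """Largest impurity decrease from one threshold split on a single variable."""
    n = len(labels)
    parent = gini(labels)
    pairs = sorted(zip(values, labels))
    best = 0.0
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between tied values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def stump_forest_importance(X, y, n_trees=200, mtry=2, seed=0):
    """Mean impurity decrease per variable over bootstrapped one-split trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    total = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        candidates = rng.sample(range(p), mtry)      # random subset of variables
        yb = [y[r] for r in rows]
        gains = {j: best_split_gain([X[r][j] for r in rows], yb)
                 for j in candidates}
        winner = max(gains, key=gains.get)           # variable chosen for the split
        total[winner] += gains[winner]
    return [t / n_trees for t in total]

def pls_first_weights(X, y):
    """First PLS weight vector: covariance of each centred variable with the
    centred group indicator, rescaled to unit length."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    w = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        w.append(sum((c - m) * (yi - y_mean) for c, yi in zip(col, y)))
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w]

# Invented toy data: 10 samples per group, 4 variables, and only
# variable 2 is shifted (by 3 units) in the second group.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(0, 1) + (3.0 * g if j == 2 else 0.0) for j in range(4)]
     for g in y]

importance = stump_forest_importance(X, y)
weights = pls_first_weights(X, y)
print("tree-based importance:", [round(v, 3) for v in importance])
print("first PLS weights:    ", [round(v, 3) for v in weights])
```

In this toy setting both rankings should single out variable 2. The PLS weights rank variables by their linear covariance with group membership, while the tree-based score rewards any variable that offers a good threshold split. That difference is one reason the two approaches can give different information on real metabolomics data.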

References

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/A:1010933404324.