Hierarchical testing of variable importance
Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, U.K. meinshausen{at}stats.ox.ac.uk
Received for publication 1 August 2006.
Revision received 1 October 2007.
Abstract
A frequently encountered challenge in high-dimensional regression is the detection of relevant variables. Variable selection suffers from instability, and the power to detect relevant variables is typically low if predictor variables are highly correlated. When the multiplicity of the testing problem is taken into account, the power diminishes even further. To gain power and insight, it can be advantageous to look for influence not at the level of individual variables but at the level of clusters of highly correlated variables. We propose a hierarchical approach. Variable importance is first tested at the coarsest level, corresponding to the global null hypothesis. The method then tries to attribute any effect to smaller subclusters or even to individual variables. The smallest clusters that still exhibit a significant influence on the response variable are retained. It is shown that the proposed testing procedure controls the familywise error rate at a prespecified level, simultaneously over all resolution levels. The method has power comparable to the Bonferroni–Holm procedure at the level of individual variables and dramatically larger power at coarser resolution levels. The best resolution level is selected adaptively.
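The top-down idea in the abstract (test the global null first, then try to attribute a significant effect to ever smaller clusters, and keep the smallest clusters that remain significant) can be sketched as a recursive descent over a cluster tree. The abstract does not specify the per-cluster test or the multiplicity adjustment, so this is only a sketch under stated assumptions: the caller supplies a raw p-value for each cluster (for example from some group test of the cluster's coefficients), and each raw p-value is multiplied by `n_vars / |C|`, a size-weighted Bonferroni correction assumed here for illustration. The names `Cluster`, `hierarchical_test` and `pvalue` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class Cluster:
    """A node in a (hypothetical) cluster tree over predictor indices."""
    variables: FrozenSet[int]
    children: List["Cluster"] = field(default_factory=list)


def hierarchical_test(
    root: Cluster,
    pvalue: Callable[[FrozenSet[int]], float],
    n_vars: int,
    alpha: float = 0.05,
) -> List[FrozenSet[int]]:
    """Test top-down and return the smallest clusters that are still significant.

    Assumptions (not taken from the abstract): each cluster's raw p-value is
    multiplied by n_vars / |cluster| (size-weighted Bonferroni), and a cluster
    is tested only if its parent cluster was rejected.
    """

    def visit(cluster: Cluster, parent_adj: float) -> List[FrozenSet[int]]:
        raw = pvalue(cluster.variables)
        # Size-weighted correction, kept monotone along the path from the root.
        adj = min(1.0, max(parent_adj, raw * n_vars / len(cluster.variables)))
        if adj > alpha:
            return []  # not significant: do not descend into this branch
        found: List[FrozenSet[int]] = []
        for child in cluster.children:
            found.extend(visit(child, adj))
        # If no child is significant, the effect stays attributed to this cluster.
        return found if found else [cluster.variables]

    # The root is the coarsest level, i.e. the global null hypothesis.
    return visit(root, 0.0)
```

With four variables clustered as {0, 1} and {2, 3}, a significant pair {0, 1} whose members are not individually significant after adjustment is reported as a cluster; if variable 0 alone survives the adjustment, the effect is attributed to it instead. Stopping at the first non-significant node is what lets coarse clusters be reported when individual variables are too correlated to be separated.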
Key Words: Hierarchical clustering; High-dimensional; Higher criticism alternative; Multiple linear regression; Multiple testing