Regularized Linear Models in Stacked Generalization
by Sam Reid and Greg Grudic
In Multiple Classifier Systems 2009, published by Springer in the Lecture
Notes in Computer Science (LNCS) series
Abstract.
Stacked generalization is
a flexible method for multiple classifier combination; however, it tends to
overfit unless the combiner function is sufficiently smooth. Previous studies
have attempted to avoid overfitting by using a linear function at the combiner level.
This paper demonstrates experimentally that even with a linear combination
function, regularization is necessary to reduce overfitting and increase predictive
accuracy. Standard linear least squares regression can be regularized with an
L2 penalty (ridge regression), an L1 penalty (lasso regression), or a
combination of the two (elastic net regression). In multi-class
classification, sparse linear models select and
combine individual predicted probabilities instead of using complete
probability distributions, allowing base classifiers to specialize in
subproblems corresponding to different classes.
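
For reference, the three penalties named in the abstract fit a single
objective; this is the standard elastic net formulation, not an equation
quoted from the paper. Ridge regression is the special case \lambda_1 = 0
and the lasso is the special case \lambda_2 = 0:

\hat{\beta} = \arg\min_{\beta} \, \|y - X\beta\|_2^2
              + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1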
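
The stacking setup the abstract describes can be sketched in a few lines of
Python. This is a minimal illustration, not the paper's implementation: the
dataset, base models, and penalty settings below are placeholders, and
scikit-learn's ElasticNet stands in for whatever regularized solver the
authors used. Out-of-fold class probabilities from each base classifier
become the combiner's inputs, and one sparse linear model per class is free
to select individual probability columns, as the abstract describes.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [GaussianNB(), KNeighborsClassifier(),
               RandomForestClassifier(random_state=0)]

# Level-1 features: out-of-fold predicted class probabilities from each
# base classifier, concatenated column-wise (one column per class per model).
Z_tr = np.hstack([cross_val_predict(m, X_tr, y_tr, cv=5,
                                    method="predict_proba")
                  for m in base_models])
Z_te = np.hstack([m.fit(X_tr, y_tr).predict_proba(X_te)
                  for m in base_models])

# One elastic net regression per class on one-vs-rest indicator targets;
# the L1 term zeroes out individual probability columns, so each per-class
# combiner can select different base-model/class probabilities.
combiners = [ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Z_tr, (y_tr == c))
             for c in np.unique(y_tr)]
scores = np.column_stack([c.predict(Z_te) for c in combiners])
y_pred = scores.argmax(axis=1)
print("stacked accuracy:", (y_pred == y_te).mean())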