This page provides guidance on choosing an appropriate ensemble method for your deep learning model.
Check Your Model
A good rule-of-thumb is to check the performance of your deep learning model first. Below are two important aspects that you should pay attention to:
What is the final performance of your model?
Does your model suffer from the over-fitting problem?
To check these two aspects, it is recommended to evaluate the performance of your model on the test_loader after each training epoch.
Using Ensemble-PyTorch, you can pass your model to Voting with the argument n_estimators set to 1. The behavior of the ensemble should then be the same as that of a single model.
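For example, the per-epoch check can be automated with a small helper. This is a framework-agnostic sketch (`looks_overfit` is a hypothetical name, not part of Ensemble-PyTorch) that flags likely over-fitting when the training loss keeps falling while the test loss keeps rising:

```python
def looks_overfit(train_losses, test_losses, window=3):
    """Heuristic over-fitting check: return True when the training loss
    strictly decreased while the test loss strictly increased over the
    last `window` transitions between epochs."""
    if len(train_losses) < window + 1 or len(test_losses) < window + 1:
        return False
    recent_train = train_losses[-(window + 1):]
    recent_test = test_losses[-(window + 1):]
    train_down = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    test_up = all(b > a for a, b in zip(recent_test, recent_test[1:]))
    return train_down and test_up

# Training loss falls every epoch, but test loss rises after epoch 2:
# a classic over-fitting pattern.
train = [1.0, 0.7, 0.5, 0.4, 0.3, 0.25]
test = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95]
print(looks_overfit(train, test))
```

A heuristic like this is only a starting point; inspecting the full loss curves remains the more reliable check.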
If the performance of your model is relatively good, for example, the testing accuracy of your LeNet-5 CNN model on MNIST is over 99%, then it is unlikely that your model suffers from the under-fitting problem, and you can skip the section Under-fit.
If the performance of your model is unsatisfactory, you can try out the Gradient Boosting related ensemble methods. Gradient Boosting focuses on reducing the bias term from the perspective of the bias-variance decomposition; it usually works well when your deep learning model is a weak learner on the dataset.
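To see why fitting residuals reduces bias, here is a minimal, framework-agnostic sketch of gradient boosting for squared loss, using one-split "stumps" as weak learners (toy code, not the Ensemble-PyTorch implementation):

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals in the
    least-squares sense; return the fitted stump as a function."""
    best = None
    for threshold in xs:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lmean = sum(left) / len(left) if left else 0.0
        rmean = sum(right) / len(right) if right else 0.0
        err = sum((r - (lmean if x <= threshold else rmean)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_estimators=10):
    """Each round fits a stump to the residuals of the current ensemble,
    so the remaining bias shrinks round by round."""
    learners = []
    preds = [0.0] * len(xs)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        preds = [p + stump(x) for p, x in zip(preds, xs)]
    return learners, preds

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # y = x**2, non-linear target
_, preds = boost(xs, ys, n_estimators=20)
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
```

A single stump badly under-fits the quadratic target, but boosting twenty of them drives the training error far lower, which is the bias-reduction effect described above.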
Below are the pros and cons of using Gradient Boosting:

Pros:
- It can yield a much larger improvement than other ensemble methods.

Cons:
- Relatively longer training time.
- It may suffer from the over-fitting problem if the value of n_estimators is too large.
Gradient Boosting in Ensemble-PyTorch supports early stopping to alleviate over-fitting. To use early stopping, you need to set the input argument early_stopping_rounds when calling the fit() function of Gradient Boosting. In addition, using a small shrinkage_rate when declaring the model also helps to alleviate the over-fitting problem.
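The behavior of an early-stopping rule like this can be sketched with a toy monitor (a hypothetical helper, not the Ensemble-PyTorch API): boosting stops once the validation error has failed to improve for the given number of consecutive rounds:

```python
def rounds_before_stopping(val_errors, early_stopping_rounds):
    """Return the number of boosting rounds actually used: stop once the
    validation error has not improved for `early_stopping_rounds`
    consecutive rounds.

    Shrinkage interacts with this: scaling each learner's contribution
    (preds += shrinkage_rate * stump(x)) slows learning down, so more
    rounds are needed but over-fitting also sets in later."""
    best = float("inf")
    rounds_since_best = 0
    for i, err in enumerate(val_errors):
        if err < best:
            best = err
            rounds_since_best = 0
        else:
            rounds_since_best += 1
        if rounds_since_best >= early_stopping_rounds:
            return i + 1
    return len(val_errors)

# Validation error improves for four rounds, then starts to climb:
# with patience 2, training stops after round 6.
val = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.62]
print(rounds_before_stopping(val, early_stopping_rounds=2))
```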
Large Training Costs
Training an ensemble of large deep learning models can take a prohibitively long time and easily run out of memory. If you are suffering from large training costs when using Ensemble-PyTorch, the recommended ensemble method is Snapshot Ensemble. Its training cost is approximately the same as that of training a single base estimator. Please refer to the related section in Introduction for details.
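The snapshot idea can be illustrated with a toy one-dimensional example (a framework-agnostic sketch, not the Ensemble-PyTorch implementation): run gradient descent with a cyclic, cosine-annealed step size, save a "snapshot" of the parameter at the end of each cycle, and let the ensemble be the average of the snapshots:

```python
import math

def cyclic_lr(step, cycle_len, lr_max=0.3):
    """Cosine-annealed step size that restarts every `cycle_len` steps."""
    t = (step % cycle_len) / cycle_len
    return lr_max * 0.5 * (1 + math.cos(math.pi * t))

w = 10.0                       # initial weight, far from the optimum w* = 3
snapshots = []
cycle_len, n_cycles = 20, 4
for step in range(cycle_len * n_cycles):
    grad = 2 * (w - 3.0)       # gradient of f(w) = (w - 3)**2
    w -= cyclic_lr(step, cycle_len) * grad
    if (step + 1) % cycle_len == 0:
        snapshots.append(w)    # one snapshot per learning-rate cycle

ensemble_w = sum(snapshots) / len(snapshots)
```

One training run thus produces several ensemble members for free; in the real method each restart of the learning rate lets the model escape to a different region before the next snapshot is taken.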
However, Snapshot Ensemble does not work well across all deep learning models. To reduce the costs of using other parallel ensemble methods (e.g., Adversarial Training), you can set n_jobs to 1, which disables the parallelization conducted internally.