Total Prediction Error

There is a very delicate balancing act when machine learning algorithms try to predict things. On the one hand, we want our algorithm to model the training data very closely, otherwise we’ll miss relevant features and interesting trends. However, on the other hand we don’t want our model to fit too closely, and risk over-interpreting every outlier and irregularity.

  1. Any low complexity model- Will be prone to underfitting because of high bias and low variance
  2. Any high complexity model(Decision tress)- Will be prone to overfitting due to low bias and high variance

The best we can do is try to settle somewhere in the middle of the spectrum, where the purple pointer is.