Kernel Trick 2 — Radial Basis Function (RBF) and “Infinite” Flexibility

Jason Eden

--

I have previously written about the SVM Kernel Trick and described how the polynomial kernel allows for massive programmatic feature expansion without a massive increase in compute for inner product distance calculations. However, there is another, even more powerful kernel available for SVM: the Radial Basis Function, or RBF.

A couple of big differences:

  1. The RBF kernel (typically) uses Euclidean distance rather than an inner product for its distance calculations
  2. The RBF kernel generates a similarity score (a value between 0 and 1, i.e. from 0% to 100%)
  3. The RBF kernel does not explicitly add features to the data, but is implicitly equivalent to mapping it into an infinite-dimensional feature space (which makes it an extremely powerful classifier)
  4. Despite this “infinite dimensionality,” the RBF kernel supports really fast inference, making it both faster *and* more powerful than the polynomial kernel

Points 3 and 4 are especially bold claims. How can this be both infinitely flexible *and* highly performant? Let’s start with the math.

Generating Similarity Scores

similarity(x, x′) = exp( −‖x − x′‖² / σ² )

where ‖x − x′‖ is the Euclidean distance between the test point x and the training point x′, and σ (sigma) is a width parameter you choose.

https://colab.research.google.com/drive/1enYEfZJLYULQ-7qhgdoRQrCBAkfArl7Y#scrollTo=lnNWFBeFqrLv
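Here’s a minimal NumPy sketch of that formula (this version divides by σ² exactly as written above; if you use a library like scikit-learn, its gamma parameter plays the role of 1/σ²):

```python
import numpy as np

def rbf_similarity(x, z, sigma=1.0):
    """RBF similarity score: exp(-||x - z||^2 / sigma^2), as in the formula above."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

# Identical points -> similarity of 1.0 (100%)
print(rbf_similarity([1.0, 2.0], [1.0, 2.0]))   # 1.0

# As the points move apart, the similarity decays toward 0
print(rbf_similarity([1.0, 2.0], [4.0, 6.0]))   # exp(-25) ~ 1.4e-11
```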

Go the Distance!

When two points are identical (distance = 0) you get a similarity score of 1 (or 100%). As the distance grows, the similarity drops toward 0. The value of sigma determines how far apart two points can be before their similarity numerically evaluates to 0: in the Colab notebook, with a sigma value of 1, you hit that boundary at a distance of around 28. If you raise sigma, you can continue to differentiate between points at larger distances (if needed).
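To make the sigma effect concrete, here’s a quick sketch (the specific distance values are arbitrary examples). With sigma = 1, the score underflows to exactly 0 once the distance reaches the high 20s, while a larger sigma still produces distinguishable scores at much greater distances:

```python
import numpy as np

def rbf_similarity(dist, sigma=1.0):
    # Similarity as a function of Euclidean distance for a given sigma
    return np.exp(-dist ** 2 / sigma ** 2)

# Illustrative distance values only
for d in [0, 1, 5, 28, 100]:
    print(f"distance={d:>3}  sigma=1: {rbf_similarity(d, 1.0):.3e}  "
          f"sigma=50: {rbf_similarity(d, 50.0):.3e}")
```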

So we could calculate the similarity between a test point and every point in the training data and simply pick the label of the most similar one as our prediction. Infinite flexibility!
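A toy sketch of that naive “most similar training point” idea (the data and the predict_most_similar helper here are hypothetical, purely for illustration):

```python
import numpy as np

def rbf_similarity(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

def predict_most_similar(X_train, y_train, x_test, sigma=1.0):
    """Naive approach: score the test point against *every* training point
    and return the label of the single most similar one."""
    scores = [rbf_similarity(x, x_test, sigma) for x in X_train]
    return y_train[int(np.argmax(scores))]

# Hypothetical toy data, purely for illustration
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(predict_most_similar(X_train, y_train, np.array([4.5, 4.8])))  # -> 1
```

Note that this scores the test point against every training point, which is exactly the cost problem the next section tackles.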

Avoiding High Compute Costs at Inference

But wait. We could do this with k-NN as well, and it came with a serious drawback — at inference, the calculation of distance between a test point and every training point is extremely computationally expensive. Wouldn’t we have the same problem with SVM?

Here’s where the genius part comes in: Once your SVM model is built, you don’t actually need to keep all of your training data and compare the test points to every training point. You just need the support vectors.

Support Vectors

In case you’ve forgotten, support vectors are the training data points that determine the placement of the decision boundary for SVM. In the simplest (hard-margin) case, they all lie right on the margin, equally distant from the decision boundary.

Thus, when you are comparing similarities, your test point only needs to be scored against the support vectors in your training data. The fitted model combines those similarity scores (each support vector carries a coefficient learned during training) to produce the prediction. You only need a relatively small number of distance calculations after the model has been built to do inference, making this *extremely* fast.
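Here’s a rough sketch of that in scikit-learn (the make_moons data and the specific gamma value are just illustrative choices). The support_vectors_ attribute shows how few training points the fitted model actually carries into inference:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A non-linear toy problem (make_moons and gamma=1.0 are illustrative choices)
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", gamma=1.0)  # gamma plays the role of 1/sigma^2
clf.fit(X, y)

# The fitted model keeps only the support vectors, not all 500 training points
print("training points: ", len(X))
print("support vectors: ", clf.support_vectors_.shape[0])

# Inference = kernel evaluations against those support vectors only
print("prediction:", clf.predict([[0.5, 0.0]]))
```

Typically only a fraction of the 500 training points end up as support vectors here, so each prediction costs one RBF evaluation per support vector plus a weighted sum, rather than a pass over the whole training set.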

Summary

In summary, the RBF kernel for SVM is an incredibly powerful and performant modeling approach when facing extremely complex data. It has the same kind of flexibility as k-NN, but at inference, distances only need to be calculated against a small number of support vectors rather than the entire training set, which makes it highly performant as well.
