# Perceptron: More Than Meets the i

Transforming Capabilities By Writing My Third Machine Learning Algorithm

There are so many Transformers® jokes made possible just by the name of this ML algorithm alone that I’m struggling to stay on task.

I’m really, really struggling.

I have an earworm that goes something like “Autobots wage their battle to destroy the evil forces of… the Perceptetrons.” For those of you who aren’t familiar with the ’80s theme song that, to me, still defines the franchise, here’s what I’m talking about:

OK, trying to focus now. Foooocuuuussssss… The Perceptron is not an evil robot race bent on world domination, but rather one of the first (if not the absolute first) machine learning algorithms ever invented. The Perceptron is a linear classifier like Naive Bayes, so we are going to label something as either one thing or the other (spam vs. not-spam, for example). However, unlike Naive Bayes, the Perceptron makes no assumptions about how our data is distributed; its predictions come from a boundary that gets corrected directly against the data points we actually observe. Let’s take a look at how a really, really basic version of this works.

Assume you have a data set that contains two two-dimensional data points: (4,6), which is labeled +1, and (-2,5), which is labeled -1. We are looking for a hyperplane, represented here by a two-dimensional vector, that divides these two points such that its dot product with each point results in a number with the right sign for that point’s label. In other words, the dot product of our hyperplane with a positively labeled point results in a positive value, and the dot product with a negatively labeled point results in a negative value. You can start with a guess from anywhere, but by convention you start at a hyperplane with all dimensions set to 0. So for our 2-dimensional data, we’ll start with (0,0).
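To make the “right sign” idea concrete, here is a minimal Python sketch of that check. The function names `dot` and `is_correct` are my own inventions for illustration, and I’m assuming plain tuples for the points and the hyperplane.

```python
def dot(w, x):
    """Dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def is_correct(w, x, label):
    """True only when the dot product has the same sign as the label (+1 or -1).

    A dot product of exactly 0 counts as wrong, because it is
    neither positive nor negative.
    """
    return dot(w, x) * label > 0


data = [((4, 6), +1), ((-2, 5), -1)]
w = (0, 0)  # the conventional all-zeros starting guess

for x, label in data:
    print(x, label, dot(w, x), is_correct(w, x, label))
```

With the all-zeros starting guess, both dot products come out as 0, so neither point passes the check yet.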

We will iterate through our data set one point at a time and see whether the dot product gives us the right type of value. By definition, our first “guess” is going to mislabel all data points (since the dot product result will be 0, so neither positive nor negative), so right off the bat a correction will need to be made. In order to adjust the hyperplane, you’re going to add or subtract the value of the mislabeled point: add positively labeled points, and subtract negatively labeled points. Our first data point is (4,6), which has a positive label, so we add it to our vector, which gives us (0+4, 0+6), or (4,6).
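Since the labels are +1 and -1, “add or subtract the point” can be written as one line: multiply the point by its label and add the result. Here is a small sketch of that correction step, where `update` is a hypothetical helper name of mine rather than anything standard.

```python
def update(w, x, label):
    """Correct the hyperplane using one mislabeled point.

    Multiplying by the label (+1 or -1) adds positively labeled
    points and subtracts negatively labeled ones.
    """
    return tuple(wi + label * xi for wi, xi in zip(w, x))


w = update((0, 0), (4, 6), +1)
print(w)  # (4, 6)
```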

Now we take the adjusted vector and check it against the next data point we come across. The dot product of (4,6) and (-2,5) is 22, a positive number. We were looking for a negative number this time, so this has been mislabeled. Time to adjust the hyperplane again.

Since the mislabeled point was negative, we **subtract** it from our vector. The resulting math, (4-(-2), 6-5), gives us a new hyperplane of (6,1).
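Putting the check and the correction together, one full trip through the data set could be sketched like this. `one_pass` is a made-up name for illustration, and it also counts how many corrections were needed, which is my own addition.

```python
def one_pass(w, data):
    """Walk through every (point, label) pair once, correcting on each mistake.

    Returns the adjusted hyperplane and the number of corrections made.
    """
    mistakes = 0
    for x, label in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if score * label <= 0:  # wrong sign, or exactly 0
            w = tuple(wi + label * xi for wi, xi in zip(w, x))
            mistakes += 1
    return w, mistakes


data = [((4, 6), +1), ((-2, 5), -1)]
w, mistakes = one_pass((0, 0), data)
print(w, mistakes)  # (6, 1) 2
```

Both points needed a correction on this first trip, which matches the two adjustments worked through by hand above.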

Now we have reached the end of our two-point data set, but the Perceptron won’t stop until its final hyperplane produces the correct answer 100% of the time against every data point. Therefore, we need to go through our data set again and make sure the dot product of this hyperplane with every point gives the correct label. Here’s how the math worked on the second iteration:
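In code, that keep-going-until-nothing-is-mislabeled loop might look something like the sketch below. The name `train` and the `max_passes` safety cap are assumptions of mine: the cap is there because the loop would otherwise never end on data that no hyperplane can separate.

```python
def train(data, max_passes=100):
    """Repeat passes over the data until one pass makes no corrections.

    Returns the final hyperplane and the number of passes it took.
    max_passes is an illustrative safety cap, not part of the algorithm.
    """
    w = tuple(0 for _ in data[0][0])  # start with all dimensions at 0
    for passes in range(1, max_passes + 1):
        mistakes = 0
        for x, label in data:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score * label <= 0:  # wrong sign, or exactly 0
                w = tuple(wi + label * xi for wi, xi in zip(w, x))
                mistakes += 1
        if mistakes == 0:
            return w, passes
    raise RuntimeError("no separating hyperplane found within max_passes")


data = [((4, 6), +1), ((-2, 5), -1)]
print(train(data))  # ((6, 1), 2)
```

On the second pass, (6,1) gives 30 against (4,6) and -7 against (-2,5), so both signs are right, no corrections are made, and the loop stops.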