A Quick Look at Making the Theory Real

In my previous blog post I explained the logic behind the offset trick, where you incorporate a b value into your data set and weight vector for the Perceptron, and walked through a theoretical example. In this blog post, I will demonstrate the actual code style I would use if I were doing that assignment again and wanted to incorporate the b value rather than track it separately through the iterations. It would look something like this:

Making the theory real.
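Here is a minimal sketch of the offset trick described above, assuming NumPy, labels in {-1, +1}, and made-up toy data. The function names `add_offset_column` and `perceptron` are hypothetical and are not taken from the assignment. The idea is to append a constant 1 to every data point so that b simply becomes the last entry of the weight vector.

```python
import numpy as np

def add_offset_column(X):
    """Append a column of ones so the offset b becomes the last weight."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([X, ones])

def perceptron(X, y, max_passes=100):
    """Train a Perceptron on labels in {-1, +1}.

    Returns a weight vector w whose last entry plays the role of b.
    """
    Xb = add_offset_column(X)      # each row is now [x_1, ..., x_d, 1]
    w = np.zeros(Xb.shape[1])      # d weights plus one slot for b
    for _ in range(max_passes):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            # A point is misclassified (or on the boundary) when y * (w . x) <= 0
            if yi * np.dot(w, xi) <= 0:
                w += yi * xi       # one update adjusts the weights and b together
                mistakes += 1
        if mistakes == 0:          # a full clean pass means we are done
            break
    return w

# Hypothetical, linearly separable toy data (illustration only)
X = np.array([[2.0, 3.0], [3.0, 4.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = perceptron(X, y)
preds = np.sign(add_offset_column(X) @ w)
print("weights (last entry is b):", w)
print("predictions:", preds)
```

Because b lives inside `w`, there is no separate `b += y` step to keep in sync inside the loop.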

Note that usually the first part — stacking the data points into a single data…


Perceiving the Perceptron’s Programming Problem

(Note: if you’re not currently in the Cornell Machine Learning Certificate program or something similar, or here to heckle me from the peanut gallery, this one is likely not for you.)

One of the advantages of being married to a data scientist (for me; for her, it’s probably pretty annoying sometimes) is that when I run across a particularly hard-to-understand concept in the Cornell Machine Learning Certificate program, she can usually explain it to me in a way that clicks when no one else can. This particular challenge was in the first exercise…


Descent into Madness using Logistic Regression (My 4th Machine Learning Algorithm)

In the Linear Classifiers course in the Cornell Machine Learning Certificate program, you end up implementing two classification machine learning algorithms: the Perceptron (which I discussed in my previous blog) and Logistic Regression.

Logistic Regression turns out to be a confusing name, or it was for me when I first heard it, because typically a “regression” algorithm for machine learning is **not** a classification algorithm (predicting 0 or 1), but rather one designed to predict a specific value, a commonly used example being the sales price of a home…


Transforming Capabilities By Writing My Third Machine Learning Algorithm

There are so many Transformers® jokes made possible just by the name of this ML algorithm alone that I’m struggling to stay on task.

I’m really, really struggling.

I have an earworm that goes something like “Autobots wage their battle to destroy the evil forces of… the Perceptetrons.” For those of you who aren’t familiar with the ’80s theme song that, to me, still defines the franchise, here’s what I’m talking about:

ROBOTS IN DISGUISE!

OK, trying to focus now. Foooocuuuussssss…… Perceptron — not an evil robot race bent on world…


The Likelihood of Tails vs. “Not Heads”

When discussing probability, a commonly used example is a coin flip. It’s a 50/50 proposition, presumptively, that you will observe either heads or tails on any given flip. If you assign labels like h and t to heads and tails, you would write that out as the probability of heads = 50%, or P(h) = .5. The presumptive probability of tails would also equal 50%, or P(t) = .5 as well.

That part is pretty straightforward, in theory. In reality, very little is actually that straightforward. For example, even a simple coin flip…


Log Math Basics for Non-Math Majors

There’s a lot of math I am having to catch back up on as I dive deeper into data science, but one of the more rewarding bits of this journey has been finally understanding the point of taking the log value of a number. I’ve seen the approach used multiple times in statistics courses and in data science programming, but if I’m honest, I never really understood why it was necessary or how it benefited the process. …


Indexing / Slicing Operations in R

Another quick post: In my previous post I demonstrated a technique in Python programming that I have been using as part of my assignments for the Cornell University Machine Learning Certificate program. This got me thinking: can you do the same thing in R, i.e. subset a matrix based on a separate list of labels, and then perform operations on the subsets to populate another array of outputs?

Yes, yes you can, as I demonstrate here. It looks and feels a little different, but you can functionally accomplish the same thing. I…


A Simple, Straightforward Example of a Powerful Concept

Just a quick write-up (you’ll be relieved to learn there are no jokes in this one. Except this one.) In the Cornell data science courses I have taken so far, they have heavily emphasized the need to use indexing and slicing instead of loops, especially when working with large data sets. The reason is speed: running an iterative loop to perform some kind of matrix math is dramatically slower in Python than the alternative. …
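To illustrate the general idea of indexing instead of looping, here is a small sketch, assuming NumPy and invented data. The function names and the choice of a per-class mean are hypothetical stand-ins, not the code from the post. It subsets a matrix by a separate array of labels and fills an output array, once with a row-by-row loop and once with a boolean mask.

```python
import numpy as np

# Hypothetical data: 6 rows, 2 features, and one label per row
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0],
              [7.0, 8.0], [9.0, 10.0], [11.0, 12.0]])
labels = np.array([0, 1, 0, 2, 1, 2])

def class_means_loop(X, labels):
    """Per-class feature means, visiting every row in pure Python."""
    classes = np.unique(labels)
    means = np.zeros((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        total = np.zeros(X.shape[1])
        count = 0
        for row, lab in zip(X, labels):
            if lab == c:
                total += row
                count += 1
        means[i] = total / count
    return means

def class_means_masked(X, labels):
    """Same result, but a boolean mask selects each class's rows at once."""
    classes = np.unique(labels)
    # labels == c is a boolean array; X[mask] keeps only the matching rows.
    # This still loops over classes, but never over individual rows.
    return np.array([X[labels == c].mean(axis=0) for c in classes])

loop_result = class_means_loop(X, labels)
mask_result = class_means_masked(X, labels)
print(mask_result)
```

Both functions return the same array; the masked version pushes the per-row work into NumPy, which is where the speed difference on large data sets tends to come from.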


Writing My Second Machine Learning Algorithm Using Naive Bayes

Making assumptions has a bad reputation. You know what assuming does, right? (For the record, I narrowly decided against making the popular answer to that question the title of this blog post…) A significant amount of energy has been poured into self-help books and videos to try to show us how making assumptions is detrimental to ourselves and to society in general. On a different blog channel, I could go on and on about how profiling (which I’m defining here as making assumptions about persons based on race, gender, appearance, etc…


Or… Why Good Data Engineers Will Always Have Work

Prior to publishing the mid-term exam, my professor in my Programming for Health Data Scientists class gave us all an option to do some extra work on a pre-midterm assignment. The payoff, assuming you did well enough, would be to avoid one of the problems on the mid-term the following week. The challenge assignment: take two files — one small-ish JSON file and a much more robust XML file pulled from two hospitals — and somehow combine their data in such a way that you could do something useful with it…

Jason Eden

Cloud computing and data nerd who dreams of being a data scientist, probably because he's married to one and she's pretty cute.
