Transforming Capabilities By Writing My Third Machine Learning Algorithm

There are so many Transformers® jokes made possible just by the name of this ML algorithm alone that I’m struggling to stay on task.

I’m really, really struggling.

I have an earworm that goes something like “Autobots wage their battle to destroy the evil forces of… the Perceptetrons.” For those of you who aren’t familiar with the 1980s theme song that, to me, still defines the franchise, here’s what I’m talking about:

ROBOTS IN DISGUISE!

OK, trying to focus now. Foooocuuuussssss…… Perceptron — not an evil robot race bent on world…


The Likelihood of Tails vs. “Not Heads”

When discussing probability, a commonly used example is a coin flip. It’s a 50/50 proposition, presumptively, that you will observe either heads or tails on any given flip. If you assign labels like h and t to heads and tails, you would write that out as the probability of heads = 50%, or P(h) = .5. The presumptive probability of tails would also equal 50%, or P(t) = .5 as well.
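
As a quick sanity check on that 50/50 claim, here is a minimal sketch that simulates a fair coin and estimates P(h) and P(t) from the observed flips. The function name, the seed, and the number of flips are my own illustrative choices, and the "fair coin" is an assumption baked into the simulation rather than anything measured.

```python
import random


def estimate_heads_probability(num_flips, seed=0):
    """Estimate P(h) by simulating num_flips flips of an assumed-fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Count a flip as heads when the random draw lands in the lower half of [0, 1).
    heads = sum(1 for _ in range(num_flips) if rng.random() < 0.5)
    return heads / num_flips


p_h = estimate_heads_probability(10_000)
p_t = 1 - p_h  # with only two outcomes, tails is simply "not heads"
print(f"P(h) ~ {p_h:.3f}, P(t) ~ {p_t:.3f}")
```

With enough flips the estimate should hover near .5, but it will rarely land on it exactly.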

That part is pretty straightforward, in theory. In reality, very little is actually that straightforward. For example, even a simple coin flip…


Log Math Basics for Non-Math Majors

There’s a lot of math I am having to catch back up on as I dive deeper into data science, but one of the more rewarding bits of this journey has been finally understanding the point of taking the log of a number. I’ve seen the approach used multiple times in statistics courses and in data science programming, but if I’m honest, I never really understood why it was necessary or how it benefitted the process. …
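
One commonly cited reason for working with logs in statistics code (which may or may not be the one the full post lands on) is numerical: multiplying many small probabilities underflows to zero, while adding their logs does not. The sketch below assumes 200 hypothetical probabilities of 0.01 each, purely for illustration.

```python
import math

# Hypothetical data: 200 independent events, each with probability 0.01.
probabilities = [0.01] * 200

# Multiplying directly: the true product is 1e-400, which is too small for a float.
product = 1.0
for p in probabilities:
    product *= p

# Summing logs instead: log(a * b) = log(a) + log(b), so nothing underflows.
log_sum = sum(math.log(p) for p in probabilities)

print("direct product:", product)
print("sum of logs:   ", log_sum)
```

The direct product collapses to zero and loses all information, while the sum of logs stays an ordinary, comparable number.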


Indexing / Slicing Operations in R

Another quick post: In my previous post I demonstrated a technique in Python programming that I have been using as part of my assignments for the Cornell University Machine Learning Certificate program. This got me thinking: can you do the same thing in R, i.e., subset a matrix based on a separate list of labels, and then perform operations on the subsets to populate another array of outputs?

Yes, yes you can, as I demonstrate here. It looks and feels a little different, but you can functionally accomplish the same thing. I…


A Simple, Straightforward Example of a Powerful Concept

Just a quick write-up. (You’ll be relieved to learn there are no jokes in this one. Except this one.) In the Cornell data science courses I have taken so far, they have heavily emphasized the need to use indexing and slicing instead of loops, especially when working with large data sets. The reason is speed: looping over elements one at a time to perform matrix math is dramatically slower in Python than the indexed alternative. …
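
To make the loop-versus-indexing contrast concrete, here is a minimal sketch using NumPy. The toy matrix, the labels, and both function names are hypothetical stand-ins, not the code from the course or the full post. Both functions compute the per-label mean of each column; the second replaces the row-by-row loop with a boolean mask.

```python
import numpy as np

# Hypothetical toy data: 6 samples with 2 features each, plus one label per sample.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0],
              [9.0, 10.0],
              [11.0, 12.0]])
labels = np.array([0, 1, 0, 1, 0, 1])


def class_means_loop(X, labels):
    """Per-label column means, visiting every row with an explicit loop."""
    classes = np.unique(labels)
    means = np.zeros((len(classes), X.shape[1]))
    for row, c in enumerate(classes):
        total = np.zeros(X.shape[1])
        count = 0
        for i in range(X.shape[0]):
            if labels[i] == c:
                total += X[i]
                count += 1
        means[row] = total / count
    return means


def class_means_indexed(X, labels):
    """Per-label column means, selecting each subset with a boolean mask."""
    classes = np.unique(labels)
    # labels == c builds a True/False mask; X[mask] pulls out the matching rows at once.
    return np.array([X[labels == c].mean(axis=0) for c in classes])


print(class_means_loop(X, labels))
print(class_means_indexed(X, labels))
```

The indexed version still iterates over the handful of distinct labels, but never over individual rows, which is where the time goes on large data sets.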


Writing My Second Machine Learning Algorithm Using Naive Bayes

Making assumptions has a bad reputation. You know what assuming does, right? (For the record, I narrowly decided against making the popular answer to that question the title of this blog post…) A significant amount of energy has been poured into self-help books and videos to try to show us how making assumptions is detrimental to ourselves and to society in general. On a different blog channel, I could go on and on about how profiling (which I’m defining here as making assumptions about persons based on race, gender, appearance, etc…


Or… Why Good Data Engineers Will Always Have Work

Prior to publishing the midterm exam, my professor in my Programming for Health Data Scientists class gave us all an option to do some extra work on a pre-midterm assignment. The payoff, assuming you did well enough, would be to avoid one of the problems on the midterm the following week. The challenge assignment: take two files (one small-ish JSON file and a much more robust XML file pulled from two hospitals) and somehow combine their data in such a way that you could do something useful with it…


Writing My First Machine Learning Algorithm from Scratch

(Note: My next posts on both R and Python for Machine Learning are going to entail work I’m doing for my final project for both of my Master’s Degree classes. As such, I’m going to delay publishing the results on my blog until after the classes are over.)

There’s a relatively well-known scene in Monty Python and the Holy Grail where the group of knights comes across a rabbit’s lair. The cave is surrounded by bones, and their guide warns them that the “cute little bunny” is actually a sadistic killer. The…


A nerdy Clara Peller would have enjoyed this one.

Up to this point, a lot of my effort has gone into figuring out how to do things that are pretty much baseline capabilities for integrating the power of Google Cloud BigQuery and AI/ML tools into data analysis and data science work on local systems running R or Python. We’ve now reached a point where we can start to dig in and actually do something resembling a real project, or at least the beginnings of one. So if you’ve been wondering

HotNrd is more than a big, fluffy bun.

…your…


I pinky promise not to make you watch “There’s an App for That” again.

Credit for this post goes to Dr. Timothy Wiemken, a professor in one of my Master’s degree program classes, for making me aware of the existence of tidycensus and giving me the basic code template and steps for interacting with US Census data via API.

BigQuery is probably going to get a lot of press in my writing. It’s just so powerful, easy to use, and the collections of publicly available data are both immense and intuitive. However, there are any number of scenarios where the…

Jason Eden

Cloud computing and data nerd who dreams of being a data scientist, probably because he's married to one and she's pretty cute.
