Building vs. Using Machine Learning Approaches

I’m about 30% through my summer course on Inferential Modeling, and in the latest set of lectures, I’m running into topics that I learned and wrote algorithms for as part of the Cornell Machine Learning Certificate program. The difference between the approaches is striking, and I’m absolutely loving it. In the Cornell program, you learn enough theory and mathematical principles to be able to code, from scratch, an entire machine learning algorithm, and over the course of two weeks, you might build a handful of functions that end up accomplishing a single task. …


Defining “Real” Data Science and AI/ML Expertise

What is a data scientist? I have been reading for years about how difficult it is to define exactly what qualifies someone to be one. In broad strokes, you need appropriate degrees of statistical knowledge, coding skills, and domain expertise. Quantifying those areas in any meaningful way (i.e., a definition that can be generally agreed upon) in the current data science landscape turns out to be next to impossible.

In some cases, a data scientist needs to have a Ph.D…


A Whirlwind Tour of the Past Few Weeks of AI/ML Learning

Things are getting busy: the spring semester is ending and the summer semester is beginning in my M.S. Health Data Science program. Since my last Cornell-specific update, I’ve finished two more classes and have just one to go, which starts next week. That, plus some exciting related news at work (more later, perhaps), has meant a lot of time learning and not as much time available for writing. A quality problem to have, indeed! …


Why Good Data Engineers Will Always Have Work, Part Deux

Over the summer I’m taking an Inferential Modeling course using R. Our first assignment was to do some basic regression modeling on a public data set (link). I thought I would be all clever and code my R script to ingest the file as a first step, rather than download the static file and read it in locally — teacher’s pet and all, showing off my mad R skills.

I should have known…

If you download the file today, you’ll note that the column names are all nice and…


Health Data Science Statistics and Analytical Programming Final Project

For my Statistics and Analytical Programming final project, the goals were similar to my Python course final, but with different caveats and requirements. For starters, there were no requirements about reading in data from a certain number of different file formats. The only requirement on the data side was finding an interesting data set and doing some analysis on it. Therefore, I was able to take the work I had started here using public data on BigQuery, export it to GitHub, and just start chunking away at it. On the flip…


Health Data Science Python Programming for Data Scientists Final Project

For my final project in my recently completed Health Data Science Master’s degree course, our instructions were to find data from a variety of sources, in a variety of formats, and do something interesting with it that could relate to a broader healthcare initiative in the real world. We needed to demonstrate the ability to read in and work with multiple types of files, transform and manipulate data, produce basic visualizations, and generally demonstrate some level of mastery over the variety of course concepts that were covered throughout the semester…


Implementing Decision Trees, My 5th Machine Learning Algorithm

Up to this point in the Cornell Machine Learning Certificate program, the algorithms have been either classification (put a data point into a category: +1/-1, yes/no, etc.) or regression (predict a data point’s numeric value given the features). Decision Trees are the first machine learning algorithm we’ve discussed where the same type of algorithm can be used for either purpose, which makes them flexible and powerful, as long as you know how to use them.
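To make the "same algorithm, either purpose" point concrete, here is a minimal sketch of a single split search on one feature. It assumes Gini impurity for classification and variance for regression, which are common choices but not necessarily the ones used in the course; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def variance(values):
    """Mean squared deviation from the mean, used here as regression impurity."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_threshold(feature, targets, impurity):
    """Return (threshold, weighted impurity) for the best split on one feature.

    Returns None when the feature has fewer than two distinct values.
    """
    best = None
    n = len(feature)
    # Skip the largest value: splitting there would leave the right side empty.
    for t in sorted(set(feature))[:-1]:
        left = [y for x, y in zip(feature, targets) if x <= t]
        right = [y for x, y in zip(feature, targets) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one feature, two kinds of targets.
feature = [1.0, 2.0, 3.0, 4.0]
class_targets = [-1, -1, 1, 1]              # classification labels
numeric_targets = [10.0, 12.0, 30.0, 32.0]  # regression values

class_split = best_threshold(feature, class_targets, gini)
value_split = best_threshold(feature, numeric_targets, variance)
```

The split search itself never changes; only the impurity function passed in decides whether the tree is doing classification or regression.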

In this write-up, I will go over the basics of what a Decision Tree…


A Quick Look at Making the Theory Real

In my previous blog post I explained the logic behind the offset trick, where you incorporate a b value into your data set and weight vector for the Perceptron, and walked through a theoretical example. In this blog post, I will demonstrate the actual code style I would use if I were doing that assignment again and wanted to incorporate the b value rather than track it separately through the iterations. It would look something like this:
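A minimal sketch of the offset trick, assuming plain Python lists and a made-up toy data set. The function names are hypothetical, and this is not necessarily the code from the original assignment:

```python
def add_offset(points):
    """Append a constant 1.0 to every data point so b can live inside w."""
    return [list(p) + [1.0] for p in points]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_perceptron(points, labels, max_epochs=100):
    """Perceptron on offset-augmented data; labels must be +1 or -1.

    The last entry of the returned weight vector plays the role of b,
    so it never has to be tracked separately.
    """
    data = add_offset(points)
    w = [0.0] * len(data[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(data, labels):
            if y * dot(w, x) <= 0:  # misclassified, or sitting on the boundary
                # One update adjusts the weights and b together.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full clean pass means the data is separated
            break
    return w


def predict(w, point):
    """Classify a single raw (non-augmented) point as +1 or -1."""
    return 1 if dot(w, list(point) + [1.0]) > 0 else -1


# Hypothetical, linearly separable toy data.
points = [[2.0, 3.0], [3.0, 4.0], [0.0, 1.0], [1.0, 0.0]]
labels = [1, 1, -1, -1]

w = train_perceptron(points, labels)
predictions = [predict(w, p) for p in points]
```

Because the constant 1.0 is appended once up front, the update rule stays a single line and `w[-1]` can be read off as b after training.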

Note that usually the first part — stacking the data points into a single data…


Perceiving the Perceptron’s Programming Problem

(Note: if you’re not currently in the Cornell Machine Learning Certificate program or something similar, or here to heckle me from the peanut gallery, this one is likely not for you.)

One of the advantages of being married to a data scientist (for me; for her, it’s probably pretty annoying sometimes) is that when I run across a particularly hard-to-understand concept in the Cornell Machine Learning Certificate program, she can usually explain it to me in a way that clicks when no one else can. This particular challenge was in the first exercise…


Descent into Madness using Logistic Regression (My 4th Machine Learning Algorithm)

In the Linear Classifiers course in the Cornell Machine Learning Certificate program, you end up implementing two classification machine learning algorithms: the Perceptron (which I discussed in my previous blog) and Logistic Regression.

Logistic Regression turns out to be a confusing name, or it was for me when I first heard it, because typically a “regression” algorithm for machine learning is **not** a classification (0 or 1), but rather an algorithm designed to predict a specific value — a commonly used example being the sales price of a home…
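One way to see where the name comes from: the model does regress a continuous value (a probability between 0 and 1), and the classification only appears once that value is thresholded. The sketch below assumes 0/1 labels, a 0.5 threshold, and hypothetical hand-picked weights rather than trained ones.

```python
import math


def sigmoid(z):
    """Map any real-valued score to (0, 1), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, bias, features):
    """Continuous output: estimated probability that the label is 1."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def predict_class(weights, bias, features, threshold=0.5):
    """Discrete output: threshold the probability to get a 0/1 label."""
    return 1 if predict_probability(weights, bias, features) >= threshold else 0


# Hypothetical weights for illustration only (not fitted to any data).
weights = [1.5, -2.0]
bias = 0.25

prob = predict_probability(weights, bias, [2.0, 0.5])
label_a = predict_class(weights, bias, [2.0, 0.5])
label_b = predict_class(weights, bias, [0.0, 1.0])
```

The "regression" part is `predict_probability`; `predict_class` is just a cutoff layered on top of it.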

Jason Eden

Data Science, Cloud Computing, and Big Data nerd with a focus on healthcare and a deep-rooted passion for making complex topics easier to understand.
