Introduction and Overview of GraphQL

As research for one of my master’s program classes, I am exploring GraphQL. GraphQL is a next-generation approach to APIs designed to overcome the shortcomings of traditional RESTful APIs. To fully grasp the value of GraphQL, then, one must first understand what traditional APIs do and how they work, their limitations and challenges, and the value obtained from addressing these. …

Eden J, Salas J, Santos Rutschman A, Prener CG, Niemotka S, Wiemken TL. Associations of presidential voting preference and gubernatorial control with county-level COVID-19 case and death rates in the continental United States. Public Health. In Press.

Giving Structure to Unstructured Data

The vast majority of the machine learning most companies practice today is performed on tabular data — i.e. data that fits into nice columns and rows like a spreadsheet, where all the columns mean the same thing (follow a schema, whether it’s enforced programmatically or not), and so on. We refer to this as “structured” data. Conversely, the vast majority of data that exists today exists in the form of freeform documents, images, video, and other formats. We call this “unstructured” data. And while learning how to use structured data in machine learning has already…

There are 11 types of lies. (In binary…)

When I announced my blog project on LinkedIn, I did so with the following post:

I’m a man of my word.

I’ve since completed nine hours of coursework in my M.S. in Health Data Science Program, and have learned a significant number of things in the process. However, one point that has been driven home for me over and over again is that you cannot trust statistical models. Well, that’s not exactly right. It is more accurate to say “you should not blindly trust statistical models.” Early on in one of my first classes, I put together a…

Understanding Results vs. Predicting the Future

I recently completed a summer course as part of my Master’s in Health Data Science program that focused on Inferential Modeling. It was a really informative course that opened my eyes to a whole side of data science that I had never really been exposed to in my previous work. Since I assume there are readers in the same boat, I will try to explain it as I understand it today, where it fits, and some of the cool capabilities it enables.

In predictive analytics / machine learning, another term for “prediction” is “inference.”…

Invoking Intentional Inefficiency to Improve Inference

In a previous post I briefly touched on the problem of overfitting, which is loosely defined as a machine learning model memorizing its training data set: it makes highly accurate predictions on that data, but then performs poorly when presented with new data, a phenomenon known as high variance. The post discussed the Random Forest approach using bootstrap aggregation to address this issue, but it raised the question: “Why does intentionally producing lower-quality data sets and averaging across their results produce better predictions?”

Reality, it turns out, is messy, so intentionally introducing…
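The bootstrap aggregation idea mentioned in this excerpt can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the approach from the post itself: the base model here is a simple straight-line fit rather than the decision trees a Random Forest uses, and the names `fit_line` and `bagged_predict` are hypothetical helpers invented for this example.

```python
import random


def fit_line(xs, ys):
    """Least squares fit of a straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of models fit on bootstrap resamples.

    Each resample draws len(xs) observations WITH replacement, so it is a
    deliberately "lower-quality" copy of the training set: some rows repeat
    and others are left out.
    """
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(xs)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        if len(set(sample_x)) < 2:
            continue  # degenerate resample: a line cannot be fit, draw again
        slope, intercept = fit_line(sample_x, sample_y)
        predictions.append(slope * x_new + intercept)
    # The "aggregation" step: average across all the resampled models.
    return sum(predictions) / len(predictions)
```

Averaging over many resampled fits tends to smooth out the quirks any single fit picks up from its particular sample, which is the variance reduction the excerpt alludes to.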

Deep Neural Networks and Why They Rule the World (Mostly)

A significant portion of machine learning approaches are linear in nature — i.e. they take a look at training data observations and try to find the slope and intercept of the line that best fits that data. For example, let’s assume we have a training data set that has a shape something like this:

It’s like an ink-blot test, but for data nerds. What shapes do you see?

While there are some outlying observations, the general trend for this data is that as x increases, y increases as well. Therefore, a linear model is an appropriate approach for inference. …
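To make "finding the slope and intercept of the line that best fits" concrete, here is a minimal sketch using ordinary least squares. The data points are made up for illustration and are not the data set pictured in the post, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations with an upward trend plus a little noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

A positive slope matches the "as x increases, y increases" trend; a prediction for a new x is then just `slope * x + intercept`.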

Finishing the Cornell Machine Learning Certificate

I recently completed the Cornell University Machine Learning Certificate program.

Pomp and Circumstance and All That Jazz

Coincidentally, at the same time I was completing the last course in this certificate program, I was also steeped in some intensive AI/ML training for Google Cloud SMEs at work. The cherry on top was that I was (and am) smack dab in the middle of my summer Inferential Modeling class as part of the Master’s program in Health Data Science. To say last week was a little… intense… would be an understatement. …

Building vs. Using Machine Learning Approaches

I’m about 30% through my summer course on Inferential Modeling, and in the latest set of lectures, I’m running into topics that I learned and wrote algorithms for as part of the Cornell Machine Learning Certificate program. The difference between the approaches is striking, and I’m absolutely loving it. In the Cornell program, you learn enough theory and mathematical principles to be able to code, from scratch, an entire machine learning algorithm, and over the course of two weeks, you might build a handful of functions that end up accomplishing a single task. …

Defining “Real” Data Science and AI/ML Expertise

What is a data scientist? I have been reading for years about the difficulty of defining exactly what qualifies someone to be one. In broad strokes, you need appropriate degrees of statistical knowledge, coding skills, and domain expertise. Quantifying those areas in any meaningful way (i.e. with a definition that can be generally agreed to) in the current data science landscape turns out to be next to impossible.

What I think of when I read a “real” data science post.

In some cases, a data scientist needs to have a Ph.D…

Jason Eden

Data Science, Big Data, & Cloud nerd with a focus on healthcare & a passion for making complex topics easier to understand. All thoughts are mine & mine alone.
