R: Grab Census Data via API

I pinky promise not to make you watch “There’s an App for That” again.

Credit for this post goes to Dr. Timothy Wiemken, a professor in one of my Master’s degree program classes, for making me aware of the existence of tidycensus and giving me the basic code template and steps for interacting with US Census data via API.

BigQuery is probably going to get a lot of press in my writing. It’s just so powerful, easy to use, and the collections of publicly available data are both immense and intuitive. However, there are any number of scenarios where the data you need to work with won’t be readily available — perhaps it’s not public or free, or it’s niche enough that the Google folks haven’t gotten around to adding it to the public repository, or <insert your reason here.> In those scenarios, knowing how to interact with a web API is going to be a valuable skill to have. And as you might have guessed by the title…

…oh, wait. I promised — **pinky** promised — I wouldn’t. Ah well.

Your disappointment is palpable. I can feel it from here.

Later I might write about making generic API calls, but for this first time out of the gate, we’re going to leverage a popular API (US Census data) and the tidycensus R library to make our first experience as seamless as possible.

Check out the working example and code here.

Note: I’m using R Markdown now instead of standard R scripts. This should make it easier to read and follow along in RStudio or your IDE of choice.

The example requests a single piece of information — total number of households — on a county by county basis. The base output returns a number of columns we don’t need, so I pare it down a bit and then relabel a couple of columns to make it easier to understand. The final output looks something like this:

Boring, but still cool. Kind of like watching grass grow in Alaska.
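For anyone reading without the linked file open, here is a minimal sketch of what a request like the one described above might look like. It is not the exact code from the working example. It assumes the American Community Survey via `get_acs()`, the variable code `B11001_001` for total households, and the 2019 five-year survey; the new column names and the helper `tidy_households()` are made up for illustration.

```r
library(dplyr)

# Keep only the columns we care about and give them friendlier names.
# The input is expected to look like tidycensus::get_acs() output:
# GEOID, NAME, variable, estimate, moe.
# (tidy_households and the new column names are hypothetical.)
tidy_households <- function(raw) {
  raw %>%
    select(GEOID, NAME, estimate) %>%
    rename(
      county_geoid = GEOID,
      county_name = NAME,
      total_households = estimate
    )
}

# Only call the API when a key is available in the environment.
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  raw <- tidycensus::get_acs(
    geography = "county",
    variables = "B11001_001", # assumed: ACS total households estimate
    year = 2019,              # assumed survey year
    survey = "acs5"
  )
  households <- tidy_households(raw)
  print(head(households))
}
```

Splitting the cleanup into its own function means you can try the select-and-rename step on any small data frame with the same columns before spending an API call on it.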

Using the county GEOID field (a.k.a. FIPS code in a number of other data tables), you can now join this data with other data and start doing whatever number crunching, machine learning, analysis, or charting your heart desires. Joining this data with other data is what makes it powerful!

If Voltron were a query, it would be a… FULL OUTER JOIN! (??!??)
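In the spirit of Voltron, here is a rough sketch of a full outer join on the GEOID / FIPS key using dplyr. Both tables are tiny and made up purely for illustration (the numbers are not real Census figures), and `other_data`, `fips`, and `some_metric` are hypothetical names standing in for whatever second data set you have.

```r
library(dplyr)

# Made-up values for illustration only; not real Census figures.
# Keep GEOID / FIPS codes as character so leading zeros survive.
households <- data.frame(
  county_geoid = c("01001", "01003"),
  total_households = c(100, 200),
  stringsAsFactors = FALSE
)

# Hypothetical second table keyed by county FIPS code.
other_data <- data.frame(
  fips = c("01003", "01005"),
  some_metric = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# A full join keeps every county from both tables and fills gaps with NA.
combined <- full_join(
  households,
  other_data,
  by = c("county_geoid" = "fips")
)

print(combined)
```

If you only care about counties that appear in the Census pull, `left_join()` with the same `by` argument is probably the better fit.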

The big prerequisite here is an API access key: you need to request one and then reference it when pulling the data. The file includes the link where you can do that. Also note that I did not save this file in my RStudio home directory, which is probably why I had to specify the location of my .Renviron file.
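As a rough sketch of the key handling described above: tidycensus reads the key from the `CENSUS_API_KEY` environment variable, so one option is to keep a line like `CENSUS_API_KEY=your_key_here` in a .Renviron file and load it explicitly when that file is not in the default location. The path below is a placeholder, not the one I actually used.

```r
# Hypothetical location; point this at wherever your .Renviron actually lives.
renviron_path <- file.path("path", "to", ".Renviron")

# Load the variables from that file into the current R session, if it exists.
if (file.exists(renviron_path)) {
  readRenviron(renviron_path)
}

# tidycensus looks for the key in the CENSUS_API_KEY environment variable.
key <- Sys.getenv("CENSUS_API_KEY")
if (!nzchar(key)) {
  message("No CENSUS_API_KEY found; request a key and add it to .Renviron.")
}

# Alternative: let tidycensus write the key to the .Renviron in your home
# directory (uncomment and replace the placeholder with your own key).
# tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)
```

Either way, the point is to keep the key out of the script itself so you can share the code without sharing the key.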

Overall, a pretty interesting and powerful way to start interacting with APIs in R. We’ll look at a more generic API example later.

