A palate cleanser for when you get bored of all of that “Big Data.”
BigQuery is awesome, and it would be my data platform of choice as often as possible from a sheer power, ease-of-use, and flexibility standpoint. What do you do, though, if you're working in your Jupyter notebook and have a chunk of data whose size and fleeting use case don't warrant even the tiny effort involved in pulling it into your BigQuery dataset? Sometimes you're just going to download and read in a local CSV file, and that's OK. Other times you'll find data on the web that you just want to quickly pull in. If that data on the web happens to be a Google Sheet, all nicely formatted for you already, even better. And as you might have guessed from the title of this post:
The Google Sheets API docs cover a significant number of scenarios that require things like OAuth authentication, reading, writing, editing, appending, and so on. For our purposes, we're going to assume the data we want is read-only and publicly available (so no user credentials are required), which means all we need is code to read some data, store it, and print the results.
First, we'll need to know the spreadsheet ID. When you open a Google Sheet in a web browser, it's the string of characters between "/d/" and the next "/" (which is usually followed by a word like "edit"). An example looks like this (starting at "1D1n" and ending at "a_MQ"):
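If you'd rather not eyeball the URL every time, a short helper can grab the ID for you. This is a minimal sketch; the URL below is a made-up placeholder, not a real sheet:

```python
import re

def spreadsheet_id(url: str) -> str:
    """Pull the spreadsheet ID out of a Google Sheets URL.

    The ID is the run of characters between "/d/" and the next "/".
    """
    match = re.search(r"/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in {url!r}")
    return match.group(1)

# Hypothetical URL for illustration only
url = "https://docs.google.com/spreadsheets/d/EXAMPLE_SPREADSHEET_ID/edit#gid=0"
print(spreadsheet_id(url))
```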
It will help if you know how many rows and columns are in the data. For the example in my notebook, it looks something like this:
The only difference is that the numbers have been formatted to remove the commas. One other step, noted in the notebook as well, is that you have to enable the Sheets API in your Google Cloud project before making the API calls. Once you're set up and you know what you want to pull into your notebook (say, the first 10 rows and all four columns), you can follow the example I've posted here.
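The notebook uses Google's Python client library, but since the sheet is public and we only need an API key, the same read can be sketched with just the standard library against the Sheets v4 REST endpoint. The key and spreadsheet ID below are placeholders you'd fill in from your own Google Cloud project:

```python
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"          # placeholder: create an API key in your Google Cloud project
SPREADSHEET_ID = "YOUR_SHEET_ID"  # placeholder: the /d/.../ string from the sheet's URL

def values_url(spreadsheet_id: str, cell_range: str, api_key: str) -> str:
    """Build the Sheets v4 REST URL for reading a range of cell values."""
    quoted_range = urllib.parse.quote(cell_range, safe="")
    return (f"https://sheets.googleapis.com/v4/spreadsheets/"
            f"{spreadsheet_id}/values/{quoted_range}?key={api_key}")

def fetch_values(spreadsheet_id: str, cell_range: str, api_key: str) -> list:
    """Fetch a range of cells; returns a list of rows, each a list of strings."""
    with urllib.request.urlopen(values_url(spreadsheet_id, cell_range, api_key)) as resp:
        return json.load(resp).get("values", [])

# Example usage (requires a real API key and network access):
# for row in fetch_values(SPREADSHEET_ID, "Sheet1!A1:D10", API_KEY):
#     print(row)
```

The range uses A1 notation, so "the first 10 rows and all four columns" becomes something like `Sheet1!A1:D10`.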
Once you’ve got the data in place and formatted the way you want it, you can use your Python data munging skills to make it work with other data or just play around with it as-is.
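For instance, the API returns every cell as a string, so a common first munging step is loading the rows into a pandas DataFrame and converting the numeric columns. The payload below is hypothetical, shaped like a typical response where the first row holds the headers:

```python
import pandas as pd

# Hypothetical payload shaped like the Sheets API response:
# first row is headers, remaining rows are data, all cells as strings.
values = [
    ["state", "year", "population", "growth"],
    ["CA", "2020", "39538223", "0.061"],
    ["TX", "2020", "29145505", "0.159"],
]

# First row becomes the column names, the rest become the data
df = pd.DataFrame(values[1:], columns=values[0])

# Cells arrive as strings, so convert numeric columns before doing math on them
df["population"] = pd.to_numeric(df["population"])
df["growth"] = pd.to_numeric(df["growth"])

print(df.dtypes)
```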
Personally, I think this would be a neat way to standardize on sharing out smaller public datasets, and it would open up access to that data for folks who aren't quite as technically savvy (I imagine most folks can click a link and open a Google Sheet, even if they would struggle to download and open a CSV file). Regardless, as a way to share out your own data where applicable, it's worth considering. And if you do, now we can get to it with Python as well. Everybody wins!