“Anything R can do, I can do better.” — Python
Look, I’m not going to get into a philosophical debate about which programming language “real” data scientists should and do use. There are great arguments for both R and Python, and that’s before you even get to the Java/Scala developers sitting in the peanut gallery being all judgmental and feeling better than everyone else.
I’m personally agnostic about it. The tool that gets the job done for you is the right tool, full stop. In my last post, I talked about how to extend the power of your data analytics to the cloud by connecting R to GCP, focusing specifically on BigQuery. Well, to provide equal love for my Python brothers, sisters, and non-binary types, I went ahead and did the same thing for Python, running from a Jupyter notebook that is **not** hosted on GCP. (Hosting the notebook on GCP would be easier to manage, but may not always be an option…) You have to do the same pre-setup on the GCP side, including service accounts, scopes, and API enablement, and then on your local system you’ll need to install the Google Cloud SDK and load some Python modules. It’s a little more involved than getting RStudio connected, but your reward for the extra effort is a more deeply embedded experience and access to a broader set of cloud tools.
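To give a feel for the local half of that setup, here is a minimal sketch of a sanity check you could run in a notebook cell before touching BigQuery. It assumes you downloaded a JSON key for your service account and pointed the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at it; the helper names `missing_key_fields` and `check_local_credentials` are made up for illustration and are not taken from my setup notebook.

```python
import json
import os

# Fields normally present in a service account key file downloaded from GCP.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_key_fields(key_info):
    """Return the required fields that are absent or empty in a parsed key file."""
    missing = [field for field in REQUIRED_FIELDS if not key_info.get(field)]
    # A key of another type (e.g. user credentials) is flagged separately.
    if key_info.get("type") not in (None, "", "service_account"):
        missing.append("type (expected 'service_account')")
    return missing


def check_local_credentials(env=os.environ):
    """Find the key file named by GOOGLE_APPLICATION_CREDENTIALS and sanity-check it."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return f"No key file found at {path}"
    with open(path) as fh:
        problems = missing_key_fields(json.load(fh))
    if problems:
        return "Key file is missing: " + ", ".join(problems)
    return "Credentials look usable"
```

Calling `check_local_credentials()` only tells you the file is in place and shaped like a service account key. It says nothing about whether the account has the right scopes or whether the BigQuery API is enabled, so those still need checking on the GCP side.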
Check out my notebook on how to get it set up and running here.
And then here I detail how to run some basic queries on public datasets.
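If you just want the flavor of it without opening the notebook, a basic query against a public dataset might look roughly like the sketch below. It assumes the `google-cloud-bigquery` package is installed and that your credentials are already set up; the `usa_names` table is one of Google's public datasets, and the function names are my own illustrative choices rather than anything official.

```python
# One of the tables in Google's BigQuery public datasets.
PUBLIC_TABLE = "bigquery-public-data.usa_names.usa_1910_2013"


def build_top_names_query(limit=10):
    """Build SQL for the most common names in one state (passed as @state)."""
    limit = int(limit)
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT name, SUM(number) AS total\n"
        f"FROM `{PUBLIC_TABLE}`\n"
        "WHERE state = @state\n"
        "GROUP BY name\n"
        "ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )


def run_top_names(state="TX", limit=10, project=None):
    """Run the query and return the rows as a list of plain dicts."""
    # Imported here so the cell still loads before the package is installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            # The state is sent as a query parameter, not pasted into the SQL.
            bigquery.ScalarQueryParameter("state", "STRING", state)
        ]
    )
    rows = client.query(build_top_names_query(limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

In a notebook cell, `run_top_names("TX", 5)` would then return five rows with `name` and `total` keys. Queries against public datasets are still billed to your own project, so keep an eye on how much data you scan.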
I am personally just getting started with both R and Python in Google Cloud, so there may be a lot of interesting things I don’t yet know, and as I come across them I’ll make some notes here so that you can come learn along with me. More to come!