Eat My Dust, Loops!

NumPy and Vectorization vs. List Comprehensions and For Loops

Occasionally, when I'm facilitating Cornell classes, a student asks why NumPy functions and vectorized code are so much better for data science than list comprehensions and native for loops, and what the actual performance difference is in real life. Since I get the question so often, I decided I'd post a response here that I can refer folks back to.

I performed a series of simple data manipulation tests using four different approaches: numpy.where, vectorized Python, native for loops, and list comprehensions. You can see the entire experiment on my GitHub page.

On your mark… Get set… https://github.com/jasondeden/GCP-Jupyter/blob/main/SpeedOfOperations.ipynb
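For readers who have not seen the four styles side by side, here is a minimal sketch of what each one looks like on the same task. The task itself (flagging values above a cutoff), the names `values` and `threshold`, and the data size are all assumptions made for illustration; they are not taken from the notebook, and "vectorized" here is assumed to mean a whole-array expression.

```python
import numpy as np

# Hypothetical test data and cutoff, chosen only for illustration.
rng = np.random.default_rng(0)
values = rng.integers(0, 100, size=1_000)
threshold = 50

# 1. numpy.where: pick 1 or 0 element-wise based on a condition.
flags_where = np.where(values > threshold, 1, 0)

# 2. Vectorized expression: build a boolean mask, then cast it to integers.
flags_vec = (values > threshold).astype(int)

# 3. Native for loop: visit each element and append the result.
flags_loop = []
for v in values:
    if v > threshold:
        flags_loop.append(1)
    else:
        flags_loop.append(0)

# 4. List comprehension: the same loop written as a single expression.
flags_comp = [1 if v > threshold else 0 for v in values]

# All four approaches should agree on the answer; only the speed differs.
assert flags_where.tolist() == flags_vec.tolist() == flags_loop == flags_comp
```

The first two approaches hand the whole array to NumPy in one call, while the last two touch one element at a time from Python.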

If you just want the details:

  • NumPy (numpy.where) usually won the race, followed somewhat closely by vectorized Python code. This is because NumPy does not run the operation element by element in the Python interpreter; it hands the whole array to precompiled, lower-level routines (largely written in C) that do the work closer to the hardware, so to speak. However, this approach does come with a small up-front performance hit, and on really small data, vectorized code was actually faster than NumPy.
  • In distant third place were list comprehensions (roughly 60x slower than NumPy on my large data experiments, give or take), which were about 20% faster than native for loops.
  • Interestingly, both looping approaches ran about 50% faster when iterating over list data than over a NumPy array. That is still far slower than the leaders, but it was an unexpected discovery on my part.
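If you want to try a comparison like this yourself, the sketch below times `numpy.where` against a list comprehension that iterates first over a list and then over a NumPy array. It is not the code from the notebook: the data size `n`, the task, and the helper `best_time` are hypothetical, and the numbers will vary by machine. One likely reason looping over an array is slower is that each element has to be wrapped in a NumPy scalar object as it is pulled out, whereas a list already holds ordinary Python integers.

```python
import timeit

import numpy as np

n = 100_000          # hypothetical data size; adjust to taste
arr = np.arange(n)   # NumPy array input
lst = arr.tolist()   # the same values as a plain Python list


def with_where(a):
    """Flag values above the midpoint using numpy.where."""
    return np.where(a > n // 2, 1, 0)


def with_comprehension(seq):
    """Flag values above the midpoint one element at a time."""
    return [1 if v > n // 2 else 0 for v in seq]


def best_time(fn, data, repeats=5):
    """Return the fastest of several single runs, in seconds."""
    return min(timeit.repeat(lambda: fn(data), number=1, repeat=repeats))


timings = {
    "np.where on array": best_time(with_where, arr),
    "comprehension on list": best_time(with_comprehension, lst),
    "comprehension on array": best_time(with_comprehension, arr),
}

for label, seconds in timings.items():
    print(f"{label:>24}: {seconds * 1e3:.3f} ms")
```

Taking the minimum of several runs is a common way to reduce noise from other processes on the machine; a single run can be misleading.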

Fun stuff!


Data Science & Cloud nerd with a passion for making complex topics easier to understand. All writings and associated errors are my own doing, not work-related.

Jason Eden