Eat My Dust, Loops!

Jason Eden
2 min read · Mar 26, 2022

NumPy and Vectorization vs. List Comprehensions and For Loops

Occasionally, when facilitating Cornell classes, a student asks why NumPy functions and vectorized code are so much better for data science than list comprehensions and native for loops, and what the actual performance difference is in real life. Since I get the question so often, I decided to post a response here that I can refer folks back to.

I performed a series of simple data manipulation tests using four different approaches: numpy.where, vectorized Python, native for loops, and list comprehensions. You can see the entire experiment on my GitHub page.
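For a sense of what the four approaches look like side by side, here is a minimal sketch. The task (replacing negative values with zero), the data, and the function names are illustrative assumptions on my part and not necessarily what the notebook uses.

```python
import numpy as np

# Hypothetical sample data: 1,000 integers between -100 and 99.
rng = np.random.default_rng(0)
data = rng.integers(-100, 100, size=1_000)

def with_np_where(arr):
    # numpy.where picks 0 where the condition holds, else the original value.
    return np.where(arr < 0, 0, arr)

def with_vectorized_mask(arr):
    # Vectorized Python: a boolean mask assigns to every matching element at once.
    out = arr.copy()
    out[out < 0] = 0
    return out

def with_for_loop(arr):
    # Native for loop: one Python-level iteration per element.
    out = []
    for x in arr:
        if x < 0:
            out.append(0)
        else:
            out.append(x)
    return out

def with_list_comprehension(arr):
    # The same logic as the loop, written as a list comprehension.
    return [0 if x < 0 else x for x in arr]

# All four approaches should produce the same values.
expected = with_np_where(data)
approaches = (with_vectorized_mask, with_for_loop, with_list_comprehension)
all_agree = all(np.array_equal(expected, np.asarray(f(data))) for f in approaches)
print("All approaches agree:", all_agree)
```

The first two functions hand the whole array to NumPy in a single call, while the last two touch each element from Python, which is where the performance gap comes from.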

On your mark… Get set… https://github.com/jasondeden/GCP-Jupyter/blob/main/SpeedOfOperations.ipynb

If you just want the details:

  • NumPy usually won the race, followed somewhat closely by vectorized Python code. This is because NumPy's functions are implemented in precompiled C, so the element-by-element work runs in compiled code rather than in the Python interpreter, closer to the hardware, so to speak. However, this approach does come with a small up-front performance hit, and on really small data, vectorized code was actually faster than NumPy.
  • In a distant third place were list comprehensions (about 60x slower than NumPy, give or take, on my large data experiments), which were about 20% faster than native for loops.
  • Interestingly, both looping approaches ran about 50% faster when working off of list data vs. a NumPy array. That is still orders of magnitude slower than the leaders, but it was an unexpected discovery on my part.
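If you want to try a comparison like this on your own machine, here is a rough timing sketch using the standard library's timeit module. The task, array size, and helper names are assumptions for illustration, and the numbers you get will depend on your hardware and NumPy version, so treat them as ballpark figures only. One likely reason for the list vs. array gap is that iterating over a NumPy array in Python has to wrap each element in a NumPy scalar object, whereas a list already holds ready-made Python objects.

```python
import timeit
import numpy as np

def clip_negatives(values):
    # Pure-Python looping; works on a list or a NumPy array alike.
    return [0 if x < 0 else x for x in values]

def time_call(func, arg, repeats=3):
    # Best of a few single runs, in seconds.
    return min(timeit.repeat(lambda: func(arg), number=1, repeat=repeats))

# Hypothetical data: the same values as a NumPy array and as a plain list.
arr = np.random.default_rng(0).integers(-100, 100, size=100_000)
as_list = arr.tolist()

timings = {
    "np.where on array": time_call(lambda a: np.where(a < 0, 0, a), arr),
    "comprehension on list": time_call(clip_negatives, as_list),
    "comprehension on array": time_call(clip_negatives, arr),
}

for label, seconds in timings.items():
    print(f"{label:<24} {seconds * 1e3:8.3f} ms")
```

Taking the minimum over several runs helps filter out noise from other processes, though it is still no substitute for running the full notebook.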

Fun stuff!

Jason Eden

Data Science & Cloud nerd with a passion for making complex topics easier to understand. All writings and associated errors are my own doing, not work-related.