Debugging Code Issues Using Print Statements and Simple Test Data

Jason Eden
Oct 26, 2021

A Pragmatic Philosophical Approach

So you’ve written code in your Jupyter notebook and it’s not working. Happens to the best of us. The question is, what are you going to do about it? In this blog post, I am going to detail the approach and mindset I take when troubleshooting my own code in a notebook environment. The principles should apply regardless of which programming language you are using, but I am going to demo using Python.

Helpful Python Print Statement Basics

If you’re following my advice, you’re going to be evaluating a lot of print statements. Therefore, you need to know how to make them informative. Simply printing out a huge list of variables is no good if you don’t know which variable is being printed. To increase the effectiveness of your print statements, use the following format:

print("Descriptive Text Here {}".format(variable))

Let’s say you have a variable x and you intended to set its value to 10. To confirm that this is the value being used, you would use the following print statement:

print("My x value = {}".format(x))

When this line is executed, it prints:

My x value = 10

If you have a number of small variables and would like to print them all at one time, you can do so by adding more curly braces and separating formatted variable names with a comma. For example, say you also have a variable y and you intended to set its value to 5. To print both x and y values at the same time in a single print statement, you could use the following:

print("My x value = {}, my y value = {}".format(x, y))

When this line of code is executed, and assuming your x and y values have been set as you expect, it will output the following:

My x value = 10, my y value = 5

You can also pass an expression or a function call to the format function to print a computed value. For example, if you want to print the sum of x and y, one way to do it would be as follows:

print("The sum of x and y = {}".format(x + y))

Assuming all is working as expected, executing this line produces the following output:

The sum of x and y = 15

To put it all together, let’s say we wanted to make sure our program logic was working as expected. We could use a single print statement to check all three pieces of data:

print("x = {}, y = {}, and the sum = {}".format(x, y, x + y))

This would result in the following if all is working as intended:

x = 10, y = 5, and the sum = 15

This basic approach will form the basis of all of the troubleshooting steps that follow in this post.
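To try the statements above in a single cell, here is a minimal sketch. The values 10 and 5 are the ones used in this section; the variable names summary and summary_fstring are my own additions for illustration.

```python
# Values from the running example in this section.
x = 10
y = 5

# Label every value so the output explains itself.
summary = "x = {}, y = {}, and the sum = {}".format(x, y, x + y)
print(summary)  # x = 10, y = 5, and the sum = 15

# The same text written as an f-string (Python 3.6 and later).
summary_fstring = f"x = {x}, y = {y}, and the sum = {x + y}"
print(summary_fstring)
```

Either form works for troubleshooting; the rest of this post sticks with format.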

Understand Your Inputs and Outputs

It’s amazing to me the number of folks who will spend significant amounts of time experimenting with their code to try to find a problem without truly understanding what they *should* be seeing vs. what they are getting.

If you have code that isn’t working, start by printing out the variables and data that are being read in, which form the basis of all of the interactions that follow. If your data at any step, from the beginning of the code to the final return statement, isn’t right, your code just simply isn’t going to work.

This also means the inputs to your functions!

In addition to your variables, when you’re writing complex calculations that are failing, you should repeat the logic above on each chunk of the calculation. For example, if you’re nesting functions, pull out a nested function and print its value alone. Maybe you have a logic error and you’re expecting to feed in a matrix or dataframe, but instead you’ve generated blank output. That’s difficult to spot if you’re just looking at the code and thinking it through, but if you pull out and print the nested subsections, those kinds of errors become obvious.
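As a small illustration of pulling a nested call out and printing it alone, here is a sketch. The helper keep_long_words and the sample words are hypothetical and do not come from the code discussed later in this post.

```python
def keep_long_words(words, min_length):
    """Hypothetical helper: keep words strictly longer than min_length."""
    return [word for word in words if len(word) > min_length]


words = ["print", "debug", "test"]

# The nested version hides what the inner call returned.
total = sum(len(word) for word in keep_long_words(words, 5))
print("total characters = {}".format(total))

# Pull the nested call out and print it by itself.
inner = keep_long_words(words, 5)
print("keep_long_words(words, 5) = {}".format(inner))
```

Printing the inner result shows an empty list, which suggests the comparison (strictly greater than 5) is filtering out every word.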

Take the following code for example (assumes NumPy has already been imported as np):
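The original listing is not reproduced in this copy of the post, so what follows is a minimal sketch consistent with the description in this section. Only the names x, a, b, and dotprod and the expression a[0:x] @ b[x:].T come from the text; the array contents, the condition in the if statement, and the assert message are assumptions. The try/except wrapper is also my addition so that the cell finishes and reports the failure instead of stopping.

```python
import numpy as np

x = 0
if x > 0:  # assumed condition; it is False, so x is never updated
    x = x + 2
a = np.arange(1, 13).reshape(4, 3)   # assumed 4x3 sample data
b = np.arange(13, 25).reshape(4, 3)  # assumed 4x3 sample data

dotprod = a[0:x] @ b[x:].T

# Check that dotprod actually contains something.
try:
    assert dotprod.size > 0, "dotprod is empty"
    assertion_passed = True
except AssertionError as err:
    assertion_passed = False
    print("Assertion failed: {}".format(err))
```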

The assert statement at the end is simply making sure that we get some non-null result for our dotprod variable. If you were to run this code, that assertion would fail. The question is, where did we go wrong?

While the answer may be obvious in this admittedly simplistic example, to demonstrate the process described above, I’m still going to walk through the troubleshooting steps as though it isn’t.

The first four lines of code are generating data, so I want to see what is being generated:
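A sketch of what those first print statements might look like, using the same assumed data as before (the array contents and the if condition are hypothetical; the original listing is not available here):

```python
import numpy as np

x = 0
print("x before the if statement = {}".format(x))
if x > 0:  # assumed condition
    x = x + 2
print("x after the if statement = {}".format(x))

a = np.arange(1, 13).reshape(4, 3)   # assumed sample data
print("a =\n{}".format(a))
b = np.arange(13, 25).reshape(4, 3)  # assumed sample data
print("b =\n{}".format(b))
```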

If I run the code at this point, I would see the following printed out:

This is already useful information. I can see that my x value is not changing after my if statement, which could be the problem (or part of it). Again, maybe it would be obvious what was wrong with just this amount of information, but we’re going to assume it’s not and go to the next line.

Before we print the results of our dotprod variable, we want to break down the inputs that it is taking in. You can do this in any order, so I’m going to start from the right and work my way left. My far right input is b[x:].T and the one on the other side of the @ sign is a[0:x]. I’m going to place these print statements **before** the dotprod variable is calculated to see what it is being fed as inputs for the matrix multiplication. Then, I’ll print the calculated dotprod value after the calculation.
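Under the same assumptions (hypothetical array contents, x left at 0), the three additional print statements could be sketched like this. Printing the shape next to each value is my own suggestion rather than something the original post does.

```python
import numpy as np

x = 0
a = np.arange(1, 13).reshape(4, 3)   # assumed sample data
b = np.arange(13, 25).reshape(4, 3)  # assumed sample data

# Print each input BEFORE the calculation that uses it.
right_input = b[x:].T
left_input = a[0:x]
print("b[x:].T =\n{}\nshape = {}".format(right_input, right_input.shape))
print("a[0:x] = {}, shape = {}".format(left_input, left_input.shape))

dotprod = a[0:x] @ b[x:].T

# Print the computed value AFTER the calculation.
print("dotprod = {}, shape = {}".format(dotprod, dotprod.shape))
```

The shape makes an empty slice hard to miss: a zero in the first position means there are no rows to multiply.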

This gives me the following additional printed output:

And voila! One of the inputs to our dotprod calculation is an empty array (it contains no rows at all, which is not quite the same thing as None), and when you matrix-multiply an empty array by anything, the result is empty too.

To troubleshoot this code, then, I generated seven informative print statements. The total output looked like this:

Working backwards, dotprod comes out empty because a[0:x] is empty. Why is this? Because x still equals 0 at the time this value is generated, so a[0:x] is the same as a[0:0], which is equivalent to saying give me all values of a starting at index 0, up to *but not including* index 0 (which, by definition, is nothing). Therefore, to fix it, I can either set x to something greater than 0 prior to this line of code, or, if x **should** still equal 0 at this point, correct the way the first operand of dotprod is built. Perhaps the first operand should really be a[0:x+1]?
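To check the second option, here is a sketch that keeps x at 0 and slices with x + 1 instead. The array contents are the same assumed sample data, and whether x + 1 is the right fix depends on what the code was meant to do, so treat this as one possibility.

```python
import numpy as np

x = 0
a = np.arange(1, 13).reshape(4, 3)   # assumed sample data
b = np.arange(13, 25).reshape(4, 3)  # assumed sample data

first_operand = a[0:x + 1]  # now includes row 0
print("a[0:x + 1] = {}".format(first_operand))

dotprod = first_operand @ b[x:].T
print("dotprod = {}".format(dotprod))

assert dotprod.size > 0, "dotprod is empty"
```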

Understand *Where* Your Code is Breaking Down

One of the helpful things about the print statements above is that I was able to confirm that all of my code was executing. This can be useful if you’ve got, for example, a hard-to-spot typo that is causing the code to fail, or some criterion in a called function that’s not being met. The print statements, if written carefully and correctly, will tell you exactly how far the program got prior to reaching the failure point. Take the following code as an example:

Because this is a small amount of code, we can quickly scan it and probably find the typo in my creation of the b variable. The error message we get back would be very explicit in this regard as well. However, let’s assume that we’re working with more complex code, and the error message is being generated by an embedded function and it’s not obvious what is causing it. Once again, well-placed print statements can help you pinpoint exactly where the error is occurring. Let’s put the full set together as before:
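The listing is missing from this copy, so here is a sketch of the idea. The misspelled np.arrange stands in for the typo (the actual typo in the original is not known), the array contents are assumed, and the try/except is my addition so the cell reports the error instead of stopping.

```python
import numpy as np

error_message = None
try:
    x = 0
    print("x = {}".format(x))
    a = np.arange(1, 13).reshape(4, 3)    # assumed sample data
    print("a =\n{}".format(a))
    b = np.arrange(13, 25).reshape(4, 3)  # deliberate typo: arrange
    print("b =\n{}".format(b))            # never reached
    dotprod = a[0:x + 1] @ b[x:].T
    print("dotprod = {}".format(dotprod))  # never reached
except AttributeError as err:
    error_message = str(err)
    print("Stopped with an error: {}".format(error_message))
```

The last labeled value to appear is a, so the failure has to be on the line that creates b.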

Now when we run the code, we get the following output:

From this we can clearly see that the code worked all the way to the creation of the a variable, but the b variable did not generate correctly. We need to fix this line of code before the rest of the program can run.

Complex Inputs? Create Simpler Tests

There are other simple troubleshooting steps that, in conjunction with print statements, can help you understand where a program might be going wrong. Say, for example, you are creating a function that reads in a matrix too large to read comfortably (say, 150 columns and 2,000 rows of random numbers) and your code isn’t working. One troubleshooting step might be to generate a smaller matrix (say, 15x20, all single-digit integers between 1 and 4) and then run your function on that data to check the output at each step. If you understand what your code logic is supposed to be, then you should be able to predict what happens at each step with the simpler data, and printing each step’s output can help you identify where things are going wrong.
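One way to build such a test matrix with NumPy is sketched below. The function row_totals is a hypothetical stand-in for whatever function you are debugging, and I am reading "15x20" as 20 rows by 15 columns, which is an assumption.

```python
import numpy as np


def row_totals(matrix):
    """Hypothetical function under test: sums each row."""
    print("input shape = {}".format(matrix.shape))
    totals = matrix.sum(axis=1)
    print("row totals = {}".format(totals))
    return totals


# A fixed seed keeps the test data the same on every run.
rng = np.random.default_rng(seed=0)

# Integers from 1 to 4 inclusive (the upper bound 5 is excluded).
small = rng.integers(1, 5, size=(20, 15))
print("small test matrix =\n{}".format(small))

totals = row_totals(small)
```

With values this small you can add up a row by hand and compare it against the printed totals.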

In Jupyter notebooks, you can easily create additional cells for testing. For complex projects, it’s not uncommon for me to have 30, 40, or more test cells for creating individual tests — generating simplified data, checking the output of my function and its print statements, running tests on built-in functions I don’t quite understand yet, and so forth.

Don’t Forget to Clean Up!

When you’ve finished troubleshooting your code, you should probably comment out your print statements by placing the “#” sign in front of them. That way, they won’t generate a bunch of output every time your program is run, but if you need to troubleshoot something happening downstream, you can always reactivate them by simply uncommenting them and re-running the cell. The final code in my example would look like this (assuming all issues had been fixed):
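The final listing is not reproduced in this copy either. A sketch of what it might look like, assuming the a[0:x+1] fix was the one chosen and using the same hypothetical sample data and if condition as earlier:

```python
import numpy as np

x = 0
# print("x before the if statement = {}".format(x))
if x > 0:  # assumed condition
    x = x + 2
# print("x after the if statement = {}".format(x))
a = np.arange(1, 13).reshape(4, 3)   # assumed sample data
# print("a =\n{}".format(a))
b = np.arange(13, 25).reshape(4, 3)  # assumed sample data
# print("b =\n{}".format(b))

# print("b[x:].T =\n{}".format(b[x:].T))
# print("a[0:x + 1] = {}".format(a[0:x + 1]))
dotprod = a[0:x + 1] @ b[x:].T
# print("dotprod = {}".format(dotprod))

assert dotprod.size > 0, "dotprod is empty"
```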

Summary

So there you have it. First, use print statements that are formatted such that they provide useful information about the value being printed. Second, make sure you understand your inputs at every step in your code, paying special attention to values that don’t line up with expectations. Third, for computed variables that take multiple, complex inputs, print the value of the complex inputs **prior to** running the computation to create the variable, and then the output of the computed variable **after** that line of code. And finally, when writing functions that are going to work with data too large for convenient review, generate smaller, easier to understand sample data and run your code against that, evaluating your print statements along the way to look for trouble spots.

Get good at this, and you’ll dramatically reduce the time it takes you to debug those hard-to-find, hard-to-understand issues.

Jason Eden

Data Science & Cloud nerd with a passion for making complex topics easier to understand. All writings and associated errors are my own doing, not work-related.