Profiling Code
Contents
Profiling Code#
Think back to our example of finding the maximum difference between elements in a list.
def max_diff1(nums):
# Find the max difference between any pair of nunber
max_diff = 0
for n1 in nums:
for n2 in nums:
diff = abs(n1 - n2)
if diff > max_diff:
max_diff = diff
return max_diff
def max_diff2(nums):
# Realize the max difference is the same as the max
# number minus the min number
min_num = nums[0]
max_num = nums[0]
for num in nums:
if num < min_num:
min_num = num
elif num > max_num:
max_num = num
return max_num - min_num
def max_diff3(nums):
# Same as max_diff2 but uses Python built-ins
return max(nums) - min(nums)
Even though we saw that max_diff2
and max_diff3
have the same Big-O runtime, which one do we think is faster? It turns out max_diff3
is MUCH faster in practice.
Profiler#
Note
Note: The default 163 environment does not have the following tool installed, so we describe its output rather than actually running it for you! If you are interested in trying this out, you can install it from your terminal using
conda install line_profiler
on your computer if you are using Mac/Linux. For Windows, you will likely need to use the Anaconda Navigator to install it.
An easier way to identify the relative runtime of programs is to use something called a profiler. A profiler is a program that helps you analyze the runtime of the programs you write. Profilers are beneficial because they can give you more detailed information with relatively little code. Particularly with profiling, we rarely care about the raw times themselves, but rather, the times relative to other functions.
A common profiler is the line_profiler
package (also called kernprof
). kernprof
is nice because it lets you annotate your functions with an @profile
tag to have it profile your method. For example, the following snippet shows a file called test.py
that defines these three functions with an @profile
annotation above each function.
# File: test.py
@profile
def max_diff1(nums):
"""
Returns the largest difference between any two numbers in the given list.
Assumes there is at least one number in the list.
"""
max_diff = 0
for n1 in nums:
for n2 in nums:
diff = abs(n1 - n2)
if diff > max_diff:
max_diff = diff
return max_diff
@profile
def max_diff2(nums):
"""
Returns the largest difference between any two numbers in the given list.
Assumes there is at least one number in the list.
"""
min_num = nums[0]
max_num = nums[0]
for num in nums:
if num < min_num:
min_num = num
elif num > max_num:
max_num = num
return abs(max_num - min_num)
@profile
def max_diff3(nums):
"""
Returns the largest difference between any two numbers in the given list.
Assumes there is at least one number in the list.
"""
return max(nums) - min(nums)
@profile
def main():
nums = list(range(5000))
print(max_diff1(nums))
print(max_diff2(nums))
print(max_diff3(nums))
if __name__ == '__main__':
main()
Now, we can run this program in the profiler, we use the command kernprof -v -l test.py
(assuming you have it installed) and get the following output.
Total time: 24.1163 s
File: test.py
Function: max_diff1 at line 5
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 @profile
6 def max_diff1(nums):
7 """
8 Returns the largest difference between any two numbers in the given list.
9
10 Assumes there is at least one number in the list.
11 """
12 1 1.0 1.0 0.0 max_diff = 0
13 5001 1585.0 0.3 0.0 for n1 in nums:
14 25005000 6858540.0 0.3 28.4 for n2 in nums:
15 25000000 9716627.0 0.4 40.3 diff = abs(n1 - n2)
16 25000000 7537822.0 0.3 31.3 if diff > max_diff:
17 4999 1736.0 0.3 0.0 max_diff = diff
18 1 1.0 1.0 0.0 return max_diff
Total time: 0.006862 s
File: test.py
Function: max_diff2 at line 21
Line # Hits Time Per Hit % Time Line Contents
==============================================================
21 @profile
22 def max_diff2(nums):
23 """
24 Returns the largest difference between any two numbers in the given list.
25
26 Assumes there is at least one number in the list.
27 """
28 1 1.0 1.0 0.0 min_num = nums[0]
29 1 0.0 0.0 0.0 max_num = nums[0]
30 5001 1664.0 0.3 24.2 for num in nums:
31 5000 1804.0 0.4 26.3 if num < min_num:
32 min_num = num
33 5000 1746.0 0.3 25.4 elif num > max_num:
34 4999 1643.0 0.3 23.9 max_num = num
35
36 1 4.0 4.0 0.1 return abs(max_num - min_num)
Total time: 0.000135 s
File: test.py
Function: max_diff3 at line 39
Line # Hits Time Per Hit % Time Line Contents
==============================================================
39 @profile
40 def max_diff3(nums):
41 """
42 Returns the largest difference between any two numbers in the given list.
43
44 Assumes there is at least one number in the list.
45 """
46 1 135.0 135.0 100.0 return max(nums) - min(nums)
Total time: 39.8858 s
File: test.py
Function: main at line 49
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 @profile
50 def main():
51 1 124.0 124.0 0.0 nums = list(range(5000))
52 1 39873513.0 39873513.0 100.0 print(max_diff1(nums))
53 1 12023.0 12023.0 0.0 print(max_diff2(nums))
54 1 146.0 146.0 0.0 print(max_diff3(nums))
This output shows both how long each function took to run (in the output above the function code) as well as how much time was spent on each line inside each function. You can see that even for n=5000
, max_diff1
took 20 seconds to run! While max_diff2
and max_diff3
look close in runtime in comparison, max_diff3
still runs about 50x faster than max_diff2
!
To understand why max_diff3
is the fastest, we need to discuss why Python is a very slow language by design.