Workbench: First Class Functions, Closures and Decorators
A preview of our Intermediate Python curriculum
The Workbench
I’m Tyler Elliot Bettilyon (Teb) and this is The Workbench: Our practical, hands-on edition. Our goal is to demonstrate concepts and ideas we cover in our classes, a sort of preview.
If you’re new to the Lab Report you can subscribe here.
If you like what you’re reading you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.
Intermediate Python — Decorators
All the code in this newsletter can be found on GitHub.
Today, we’re previewing a new (and highly requested) course in development: Intermediate Python. We’re featuring material from the lesson on decorators, which are one of my absolute favorite bits of Python syntax.
Decorators allow programmers to specify actions that should take place before and after a function is executed. Common examples include logging a function’s arguments and return values, adding caching behavior to a function, and pre- or post-processing input and output values.
For example, Python’s standard library provides a “least recently used” (lru) cache decorator:
from functools import lru_cache

def fib(n):
    if n <= 1:
        return 1
    return fib(n-1) + fib(n-2)

@lru_cache()
def cached_fib(n):
    if n <= 1:
        return 1
    return cached_fib(n-1) + cached_fib(n-2)
(Note for the pedantic: this code returns the “zero-indexed” Fibonacci sequence. fib(0) → 1, fib(1) → 1, fib(2) → 2, fib(3) → 3, and so on…)
Both of these functions use recursion to compute numbers from the Fibonacci sequence, which is very slow. What makes it slow is that it duplicates a lot of work. If you call fib(5), this results in one call each to fib(4) and fib(3). fib(4) will then result in another call to fib(3). And so on.
You end up with a series of function calls that look like this:
[Diagram: a tree of all the recursive calls made when you call fib(5).]
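To see how much work is duplicated, here’s a quick sketch that instruments the uncached fib with a counter (call_count and result are names introduced for this illustration):

```python
call_count = 0

def fib(n):
    # Same recursive fib as above, instrumented with a call counter.
    global call_count
    call_count += 1
    if n <= 1:
        return 1
    return fib(n-1) + fib(n-2)

result = fib(5)
print(result, call_count)  # 8 15: fib runs 15 times to produce one answer
```

Fifteen calls to compute fib(5), and the duplication grows exponentially with n.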
When you add the lru_cache decorator it creates a key-value store that maps parameter values to return values. The first time fib(3) is called, it performs the recursive calculation and stores the result. The second time fib(3) is called it just returns the stored value without recomputing it.
Caching like this is a straightforward tradeoff between speed and storage: You store previous answers so you don’t have to recompute them. It’s also a core component of “Dynamic Programming.”
In this case, the time savings are huge:
import timeit
fib_time = timeit.timeit('fib(30)', number=200, setup="from __main__ import fib")
cached_fib_time = timeit.timeit('cached_fib(30)', number=200, setup="from __main__ import cached_fib")
print(f'Uncached: {fib_time}\nCached: {cached_fib_time}')
Which prints…
Uncached: 37.7692330639984
Cached: 2.9400980565696955e-05
Calling fib(30) 200 times takes 37.7 seconds. Calling cached_fib(30) 200 times takes 0.0000294 seconds. That is an incredible performance gain from adding one line of code.
Caching is an excellent tool in any programmer’s repertoire, but this lesson is about decorators. Let’s demystify the magical line of code: @lru_cache.
First Class Functions
Decorators are a fantastic example of “functional programming” and rely on Python’s support for “first class functions.”
When a language supports first class functions it means functions can be used just like any other piece of data. Specifically, they can be stored into variables, passed to other functions as parameters, and returned from functions. It also means the language supports the creation of “inner” functions.
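For example, here’s a minimal sketch of storing a function in a variable and passing one as an argument (shout, apply_twice, and speak are hypothetical names for this illustration):

```python
def shout(text):
    return text.upper()

def apply_twice(func, value):
    # Functions are ordinary values: func is whichever function was passed in.
    return func(func(value))

speak = shout                    # stored in a variable
print(speak('hi'))               # HI
print(apply_twice(shout, 'hi'))  # HI
```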
Here’s an example of an “inner function” in Python being returned by another function:
def counter_factory():
    a = 0
    def counter():
        nonlocal a
        a += 1
        return a
    return counter
Here, counter is an “inner function” defined inside of the function counter_factory. When we call counter_factory we receive the function counter as output. Consider this code (with the printed values as comments):
x = counter_factory()
print(type(x))  # <class 'function'>
print(x())  # 1
print(x())  # 2
print(x())  # 3
x is a function. Calling x increments the variable a and returns the increased value. Each time we call counter_factory we receive a new copy of the counter function, each with its own separate instance of a. For example:
first = counter_factory()
second = counter_factory()
print(first()) # 1
print(first()) # 2
print(first()) # 3
print(second()) # 1
print(second()) # 2
print(second()) # 3
We call this situation a “closure” because the variable a is “closed over” by the function counter. Because a is used in counter, the Python interpreter has to preserve a reference to it even after counter_factory returns, which would otherwise free the variable a to be garbage collected.
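You can actually inspect that preserved reference: Python stores closed-over variables in “cells” attached to the function object. A quick sketch, reusing counter_factory from above:

```python
def counter_factory():
    a = 0
    def counter():
        nonlocal a
        a += 1
        return a
    return counter

c = counter_factory()
c()
c()
# The closed-over variable lives in a "cell" attached to the function object.
print(c.__code__.co_freevars)          # ('a',)
print(c.__closure__[0].cell_contents)  # 2
```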
If you’re having trouble wrapping your head around this, I highly suggest stepping through this script in a debugger.
Decorators
Decorators are just Python functions that satisfy two properties:
They accept a function as input.
They return a function as output.
Let’s make a simple caching decorator:
def simple_cache(input_function):
    cache = {}
    def wrapper(n):
        if n in cache: return cache[n]
        r = input_function(n)
        cache[n] = r
        return r
    return wrapper
wrapper creates the caching behavior. Before input_function gets called, we check whether n is in the cache (and simply return the stored value if so). After input_function is called we map n to the return value r in the cache, then return r to the caller.
simple_cache is a decorator. It takes a function as input and it defines the function wrapper, which it returns as output. When we “decorate” a function we’re really just calling the decorator with the function below it as input and replacing the “decorated” function with the decorator’s return value.
That is, these two constructions are basically equivalent.
# Using the decorator syntax
@simple_cache
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

# What happens behind the scenes
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

# Overwrite the reference to 'fib'
fib = simple_cache(fib)
In both cases fib is a reference to the function we called wrapper, which simple_cache returns. Additionally, input_function is a reference to the “raw” version of fib. Between the “closure” around the variable cache and the code in wrapper we get caching. A quick timing test proves our cache is working:
@simple_cache
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

fib_time = timeit.timeit('fib(30)', number=200, setup="from __main__ import fib")
print(fib_time)  # 3.398300032131374e-05
🤯
Again, if you feel confused, stepping through this script in a debugger can be very enlightening.
Best Practices
Our simple_cache decorator violates some best practices, limiting its usefulness. First, it only works with functions that take a single argument, because wrapper only accepts a single parameter. Second, it breaks Python’s introspection systems, including functions like help(). Lack of proper introspection can make debugging more difficult. It also breaks helpful tools built into the interactive shell, many IDEs, and documentation tools.
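To see the introspection breakage concretely, here’s a sketch comparing a bare wrapper with one that applies the standard library’s functools.wraps helper (naive, proper, double, and triple are hypothetical names):

```python
import functools

def naive(input_function):
    # No functools.wraps: the wrapper's own metadata leaks through.
    def wrapper(n):
        return input_function(n)
    return wrapper

def proper(input_function):
    @functools.wraps(input_function)  # copies __name__, __doc__, etc.
    def wrapper(n):
        return input_function(n)
    return wrapper

@naive
def double(n):
    """Return n doubled."""
    return n * 2

@proper
def triple(n):
    """Return n tripled."""
    return n * 3

print(double.__name__, double.__doc__)  # wrapper None
print(triple.__name__, triple.__doc__)  # triple Return n tripled.
```

help(double) shows the anonymous wrapper’s signature and no docstring, while help(triple) behaves as if triple were never decorated.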
Here’s a recipe for a better decorator that overcomes both of these weaknesses:
import functools

def proper_decorator(input_function):
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        # Do stuff before the function call...
        r = input_function(*args, **kwargs)
        # Do stuff after the function call
        return r
    return wrapper
The built-in functools.wraps decorator used on the inner function (inception!) helps Python’s introspection system work with decorated functions. The ‘unpacking’ syntax (the asterisks) used on *args and **kwargs allows any function decorated with @proper_decorator to accept any number of positional or keyword arguments.
Let’s expand our simple_cache to conform to these best practices:
def better_cache(input_function):
    cache = {}
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        cache_key = (args, tuple(sorted(kwargs.items())))
        if cache_key in cache:
            r = cache[cache_key]
            print(f'HIT {cache_key} -> {r}')
        else:
            r = input_function(*args, **kwargs)
            cache[cache_key] = r
            print(f'MISS, adding {cache_key} -> {r}')
        return r
    return wrapper
This cache decorator is now fairly robust. Consider this example:
@better_cache
def tester(a, b, c=0, d=0):
    return a + b + c + d
tester(1, 2) # MISS
tester(1, 2) # HIT
tester(1, 2, c=0) # MISS!! (see the writeup below)
tester(1, 2, c=0) # HIT
tester(1, 2, c=5) # MISS
tester(1, 2, c=5) # HIT
tester(1, 2, d=4, c=5) # MISS
tester(1, 2, c=5, d=4) # HIT (handles swapped order of c and d)
There are still a couple of limitations:
If any of the parameters passed into a decorated function are ‘unhashable types’ (such as lists and dictionaries) then our decorator will throw an error.
Explicitly sending a keyword argument using the default value will produce a different cache_key from when the keyword argument is unspecified and the default value is used.
Those aren’t such horrible limitations. The first could be solved, though it can be tricky since unhashable types are typically complex data structures. Most of the time, it won’t be worth the hassle.
I’m not convinced the second actually can be solved. I tried and failed. If you know how, please let me know! Regardless, duplicating a couple of values in the cache isn’t the end of the world.
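The first limitation is easy to demonstrate. This sketch condenses the better_cache recipe from above (prints removed) and calls the decorated function with a list:

```python
import functools

def better_cache(input_function):
    # Condensed version of the better_cache recipe above (prints removed).
    cache = {}
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        cache_key = (args, tuple(sorted(kwargs.items())))
        if cache_key not in cache:
            cache[cache_key] = input_function(*args, **kwargs)
        return cache[cache_key]
    return wrapper

@better_cache
def total(values):
    return sum(values)

try:
    total([1, 2, 3])
except TypeError as e:
    print(f'TypeError: {e}')  # lists are unhashable, so the dict lookup fails
```

Calling total((1, 2, 3)) with a tuple instead works fine, which is why converting list arguments to tuples is a common workaround.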
Decorators That Accept Arguments
It’s also possible to create decorators that themselves accept arguments. For example, right now the dictionary we use as a cache can grow infinitely large. Suppose we wanted our cache to have a maximum size, we’d want to design a decorator that can be used like this:
@size_limited_cache(maxsize=20)
def tester(a, b, c=0, d=0):
    return a + b + c + d
Unfortunately, this gets a bit more complicated because a decorator must be a function that just accepts a function as input; that’s just how Python is designed. So, if we want our decorator to accept arguments, the standard workaround is to call a function that returns a decorator. In the above example, size_limited_cache is a function that accepts a maxsize parameter and returns a decorator (i.e., a function that accepts and returns a function).
Hold onto your hats, here’s what it looks like:
import functools

def size_limited_cache(maxsize=20):
    def decorator(input_function):
        cache = {}
        @functools.wraps(input_function)
        def wrapper(*args, **kwargs):
            cache_key = (args, tuple(sorted(kwargs.items())))
            if cache_key in cache:
                r = cache[cache_key]
                print(f'HIT {cache_key} -> {r}')
            else:
                r = input_function(*args, **kwargs)
                if len(cache) < maxsize:
                    cache[cache_key] = r
                    print(f'MISS, adding {cache_key} -> {r}')
                else:
                    print(f'MISS, {cache_key} but cache at maxsize.')
            return r
        return wrapper
    return decorator
Clear as mud? Try breaking it down one function at a time:
size_limited_cache accepts one keyword parameter (maxsize) and returns the function we called decorator. maxsize gets a unique value and is “closed over” every time size_limited_cache is called, but maxsize is only used inside of wrapper.
decorator accepts a function as input (we called it input_function) and returns the function we called wrapper. decorator also initializes our cache.
wrapper actually calls input_function, uses the cache, and now also uses maxsize to limit the growth of the cache.
Now, consider the order of operations when we decorate a function with this new setup:
@size_limited_cache(maxsize=5)
def tester(a, b, c=0, d=0):
    return a + b + c + d
On the first line size_limited_cache is called. That creates and returns the function we called decorator. This is also the moment in time that maxsize is closed over, and therefore fixed to 5, for the function tester. Said another way: size_limited_cache(maxsize=5) results in a specific instance of decorator.
On the second line (the def) the Python interpreter calls that specific instance of decorator with tester as the input_function. The cache is initialized, a specific instance of the function wrapper is created and returned, and the Python interpreter replaces the raw version of tester with that specific instance of wrapper.
Finally, when we write code that calls tester, the instance of wrapper actually gets called, invoking our size-limited cache. Here’s a crude example:
for _ in range(3):
    for i in range(6):
        tester(i, i)
The print statements inside wrapper result in the following:
MISS, adding ((0, 0), ()) -> 0
MISS, adding ((1, 1), ()) -> 2
MISS, adding ((2, 2), ()) -> 4
MISS, adding ((3, 3), ()) -> 6
MISS, adding ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.
HIT ((0, 0), ()) -> 0
HIT ((1, 1), ()) -> 2
HIT ((2, 2), ()) -> 4
HIT ((3, 3), ()) -> 6
HIT ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.
HIT ((0, 0), ()) -> 0
HIT ((1, 1), ()) -> 2
HIT ((2, 2), ()) -> 4
HIT ((3, 3), ()) -> 6
HIT ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.
To fully mimic the builtin lru_cache decorator we’d need to do more work to keep the cache keys in an ordered collection based on how recently they’ve been used, and appropriately evict old entries. I’ll leave that as an exercise for you, dear reader, but I’ll mention that the OrderedDict class from Python’s collections module is very useful.
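As one possible sketch of that exercise (this is not how the standard library actually implements lru_cache; lru_limited_cache, square, and calls are hypothetical names):

```python
import functools
from collections import OrderedDict

def lru_limited_cache(maxsize=20):
    # Sketch of true LRU eviction using an OrderedDict as the cache.
    def decorator(input_function):
        cache = OrderedDict()
        @functools.wraps(input_function)
        def wrapper(*args, **kwargs):
            cache_key = (args, tuple(sorted(kwargs.items())))
            if cache_key in cache:
                cache.move_to_end(cache_key)  # mark as most recently used
                return cache[cache_key]
            r = input_function(*args, **kwargs)
            cache[cache_key] = r
            if len(cache) > maxsize:
                cache.popitem(last=False)  # evict the least recently used key
            return r
        return wrapper
    return decorator

calls = []  # records each actual (uncached) invocation

@lru_limited_cache(maxsize=2)
def square(n):
    calls.append(n)
    return n * n

square(1); square(2); square(1)  # 1 is now the most recently used
square(3)                        # cache full: evicts 2, keeps 1
square(1)                        # still cached, no new call
square(2)                        # was evicted, so it is recomputed
print(calls)  # [1, 2, 3, 2]
```

The key difference from size_limited_cache is that the cache keeps accepting new entries; it evicts the stalest one instead of refusing to grow.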
Multiple Decorators
You can even decorate a function multiple times. Here are two simple decorators. The first logs the parameters and return value every time a function is called. The second causes the function to be called twice, and returns both return values combined into a tuple.
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)

def log(input_function):
    def wrapper(*args, **kwargs):
        logging.info(f'{input_function} called with args: {args} and kwargs: {kwargs}')
        r = input_function(*args, **kwargs)
        logging.info(f'{input_function} returned {r}')
        return r
    return wrapper

def repeat(input_function):
    def wrapper(*args, **kwargs):
        r1 = input_function(*args, **kwargs)
        r2 = input_function(*args, **kwargs)
        return r1, r2
    return wrapper
If we decorate a function with repeat then log, the function will be called twice and each individual call will be logged, but the final result will be a tuple; that tuple won’t appear anywhere in the logs but will appear in the final print statement:
@repeat
@log
def add(a, b):
    return a + b

r = add(1, 2)
print(f"Final value for r: {r}")
Yields:
INFO:root:<function add at 0x10734f740> called with args: (1, 2) and kwargs: {}
INFO:root:<function add at 0x10734f740> returned 3
INFO:root:<function add at 0x10734f740> called with args: (1, 2) and kwargs: {}
INFO:root:<function add at 0x10734f740> returned 3
Final value for r: (3, 3)
Decorators are applied from bottom to top. So, when we execute the def in this code, the following happens in this order:
1. The raw function add is created.
2. That add function is sent to the log decorator as input_function, which returns an instance of wrapper, specifically, the one that adds logging information.
3. That new logged function is passed into the repeat decorator as input_function; repeat returns an instance of its wrapper.
4. The Python interpreter replaces references to our raw add function with the wrapper returned from repeat in step 3.
We can see that both of the individual calls to the wrapper created in step 2 get logged separately.
Now consider what happens when we switch the order of the decorators:
@log
@repeat
def add(a, b):
    return a + b

r = add(1, 2)
print(f"Final value for r: {r}")
INFO:root:<function repeat.<locals>.wrapper at 0x10ef177e0> called with args: (1, 2) and kwargs: {}
INFO:root:<function repeat.<locals>.wrapper at 0x10ef177e0> returned (3, 3)
Final value for r: (3, 3)
Instead of logging each individual function call, the wrapper of the repeat decorator is passed as input_function to the log decorator. That’s why we only get one set of log lines instead of two. Additionally, the final return value matches the value that gets logged (i.e., the tuple (3, 3)).
Challenge yourself and put it all together:
Consider the following code — which uses 3 of the decorators we built — and try to decide what gets logged and what key-value pair gets saved in the cache.
@size_limited_cache(maxsize=5)
@repeat
@log
def add(a, b):
    return a + b

add(1, 2)
add(1, 2)
Other Popular Use Cases and Examples
Flask (and other webserver frameworks) use decorators to register routes, require users to log in, cache page content, and more.
The builtin @property, @staticmethod, @classmethod decorators make some OOP patterns easier to implement in Python.
Some libraries for asynchronous code use decorators to manage locks and thread synchronization.
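For instance, the builtin @property decorator turns a method into a computed attribute (a minimal sketch; the Circle class is hypothetical):

```python
import math

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):
        # Computed on attribute access: circle.area, no parentheses needed.
        return math.pi * self._radius ** 2

c = Circle(2)
print(c.area)  # 12.566370614359172
```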
Remember…
The Lab Report is free and doesn’t even advertise. Our curriculum is open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.
Help us keep providing these free services by scheduling one of our world-class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.