
Workbench: First Class Functions, Closures and Decorators

A preview of our Intermediate Python curriculum

The Workbench

I’m Tyler Elliot Bettilyon (Teb) and this is The Workbench: Our practical, hands-on edition. Our goal is to demonstrate concepts and ideas we cover in our classes, a sort of preview.

If you’re new to the Lab Report you can subscribe here.

If you like what you’re reading you’ll love one of our classes. Sign up for an upcoming class, browse our course catalog for corporate trainings, or request a custom class consultation.

Intermediate Python — Decorators

All the code in this newsletter can be found on GitHub.

Today, we’re previewing a new (and highly requested) course in development: Intermediate Python. We’re featuring material from the lesson on decorators, which are one of my absolute favorite bits of Python syntax.

Decorators allow programmers to specify actions that should take place before and after a function is executed. Common examples include logging the arguments and return values of a function, introducing caching behavior to a function, and pre- or post-processing a function’s inputs and outputs.

For example, Python’s standard library provides a “least recently used” (lru) cache decorator:

from functools import lru_cache


def fib(n):
    if n <= 1:
        return 1
    return fib(n-1) + fib(n-2)


@lru_cache()
def cached_fib(n):
    if n <= 1:
        return 1
    return cached_fib(n-1) + cached_fib(n-2)

(Note for the pedantic: this code returns the “zero-indexed” Fibonacci sequence. fib(0) → 1, fib(1) → 1, fib(2) → 2, fib(3) → 3, and so on…)

Both of these functions use recursion to compute numbers from the Fibonacci sequence. The uncached version is very slow because it duplicates a lot of work: calling fib(5) results in calls to fib(4) and fib(3); fib(4), in turn, calls fib(3) and fib(2); and so on.

You end up with a series of function calls that look like this:

[Figure: a tree diagram of all the recursive calls made to fib when you call fib(5).]

When you add the lru_cache decorator it creates a key-value store that maps parameter values to return values. The first time cached_fib(3) is called, it performs the recursive calculation and stores the result. The next time cached_fib(3) is called it just returns the stored value without recomputing it.

Caching like this is a straightforward tradeoff between speed and storage: You store previous answers so you don’t have to recompute them. It’s also a core component of “Dynamic Programming.”
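To make the tradeoff concrete, here’s a bottom-up version of fib (my own sketch, not from the course materials) that trades a list of stored answers for speed, so each Fibonacci number is computed exactly once:

```python
def fib_bottom_up(n):
    # Store every intermediate answer so each Fibonacci number is
    # computed exactly once -- the core idea of dynamic programming.
    answers = [1, 1]
    for i in range(2, n + 1):
        answers.append(answers[i - 1] + answers[i - 2])
    return answers[n]

print(fib_bottom_up(5))  # 8
```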

In this case, the time savings are huge:

import timeit

fib_time = timeit.timeit('fib(30)', number=200, setup="from __main__ import fib")
cached_fib_time = timeit.timeit('cached_fib(30)', number=200, setup="from __main__ import cached_fib")

print(f'Uncached: {fib_time}\nCached: {cached_fib_time}')

Which prints…

Uncached: 37.7692330639984
Cached: 2.9400980565696955e-05

Calling fib(30) 200 times takes 37.7 seconds. Calling cached_fib(30) 200 times takes 0.0000294 seconds. That is an incredible performance gain from adding one line of code.

Caching is an excellent tool in any programmer’s repertoire, but this lesson is about decorators. Let’s demystify the magical line of code: @lru_cache.

First Class Functions

Decorators are a fantastic example of “functional programming” and rely on Python’s support for “first class functions.”

When a language supports first class functions it means functions can be used just like any other piece of data. Specifically, they can be stored in variables, passed to other functions as parameters, and returned from functions. It also means the language supports the creation of “inner” functions.
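For example (the function names here are illustrative), functions can be stored in variables and passed to other functions like any other value:

```python
def shout(text):
    return text.upper()

def apply_twice(func, value):
    # 'func' is a function received as an ordinary parameter.
    return func(func(value))

speak = shout                     # a function stored in a variable
print(speak('hello'))             # HELLO
print(apply_twice(shout, 'hey'))  # HEY
```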

Here’s an example of an “inner function” in Python being returned by another function:

def counter_factory():
    a = 0

    def counter(): 
        nonlocal a
        a += 1
        return a
    
    return counter

Here, counter is an “inner function” defined inside of the function counter_factory. When we call counter_factory we receive as output the function counter. Consider this code (with the print values as comments):

x = counter_factory()
print(type(x)) # <class 'function'>
print(x()) # 1
print(x()) # 2
print(x()) # 3

x is a function. Calling x increments the variable a and returns the increased value. Each time we call counter_factory we receive a new copy of the counter function, each with its own separate instance of a. For example:

first = counter_factory() 
second = counter_factory()

print(first()) # 1
print(first()) # 2
print(first()) # 3

print(second()) # 1
print(second()) # 2
print(second()) # 3

We call this situation a “closure” because the variable a is “closed over” by the function counter. Because a is used in counter, the Python interpreter has to preserve a reference to it even after counter_factory returns; otherwise, a would be garbage collected at that point.
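You can even see the closure in the interpreter: Python stores each closed-over variable in a “cell” attached to the function object. A quick sketch, reusing counter_factory from above:

```python
def counter_factory():
    a = 0

    def counter():
        nonlocal a
        a += 1
        return a

    return counter

c = counter_factory()
c()
c()
# The closed-over 'a' lives in a cell on the function object itself.
print(c.__closure__[0].cell_contents)  # 2
```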

If you’re having trouble wrapping your head around this, I highly suggest stepping through this script in a debugger.

Decorators

Decorators are just Python functions that satisfy two properties:

  1. They accept a function as input.

  2. They return a function as output.

Let’s make a simple caching decorator:

def simple_cache(input_function):
    cache = {}

    def wrapper(n):
        if n in cache: return cache[n]

        r = input_function(n)
        cache[n] = r
        return r
    
    return wrapper

wrapper creates the caching behavior. Before the input_function gets called, we check if n is in the cache (and simply return the value if so). After input_function is called we map n to the return value r in the cache, then return r to the caller.

simple_cache is a decorator. It takes a function as input and it defines the function wrapper which it returns as output. When we “decorate” a function we’re really just calling the decorator with the function below it as input and replacing the “decorated” function with the decorator’s return value.

That is, these two constructions are basically equivalent.

# Using the decorator syntax
@simple_cache
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)


# What happens behind the scenes
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

# Overwrite the reference to 'fib'
fib = simple_cache(fib)

In both cases fib is a reference to the function we called wrapper, which simple_cache returns. Additionally, input_function is a reference to the “raw” version of fib. Between the “closure” around the variable cache and the code in wrapper we get caching. A quick timing test proves our cache is working:

@simple_cache
def fib(n):
    if n <= 1: return 1
    return fib(n-1) + fib(n-2)

fib_time = timeit.timeit('fib(30)', number=200, setup="from __main__ import fib")
print(fib_time) # 3.398300032131374e-05

🤯 

Again, if you feel confused, stepping through this script in a debugger can be very enlightening.

Best Practices

Our simple_cache decorator violates some best practices, limiting its usefulness. First, it would only work with functions that take a single argument because wrapper only accepts a single parameter. Second, it would break Python’s introspection systems, including functions like help(). Lack of proper introspection can make debugging more difficult. It also breaks helpful tools built into the interactive shell, many IDEs, and documentation tools.
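You can see the broken introspection directly: decorating with simple_cache replaces fib with wrapper, so the original name and docstring are lost.

```python
def simple_cache(input_function):
    cache = {}

    def wrapper(n):
        if n in cache:
            return cache[n]
        cache[n] = input_function(n)
        return cache[n]

    return wrapper

@simple_cache
def fib(n):
    """Return the nth Fibonacci number."""
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib.__name__)  # wrapper -- not 'fib'
print(fib.__doc__)   # None -- the docstring is gone
```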

Here’s a recipe for a better decorator that overcomes both of these weaknesses:

import functools

def proper_decorator(input_function):

    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        # Do stuff before the function call...
        r = input_function(*args, **kwargs)
        # Do stuff after the function call
        return r
    
    return wrapper

The built-in functools.wraps decorator used on the inner function (inception!) helps Python’s introspection system work with decorated functions. The ‘unpacking’ syntax (the asterisks) used on *args and **kwargs allows any function decorated with @proper_decorator to accept any number of positional and keyword arguments.
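Here’s the recipe in action as a simple timing decorator (timed is my own illustrative example, not part of the standard library):

```python
import functools
import time

def timed(input_function):
    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        # Do stuff before the function call...
        start = time.perf_counter()
        r = input_function(*args, **kwargs)
        # Do stuff after the function call
        elapsed = time.perf_counter() - start
        print(f'{input_function.__name__} took {elapsed:.6f} seconds')
        return r

    return wrapper

@timed
def slow_add(a, b):
    time.sleep(0.01)
    return a + b

print(slow_add(1, 2))  # prints the timing line, then 3
```

Note that thanks to functools.wraps, slow_add.__name__ is still 'slow_add' after decoration.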

Let’s expand our simple_cache to conform to these best practices:

def better_cache(input_function):
    cache = {}

    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        cache_key = (args, tuple(sorted(kwargs.items()))) 
        if cache_key in cache: 
            r = cache[cache_key]
            print(f'HIT {cache_key} -> {r}')
        else:
            r = input_function(*args, **kwargs)
            cache[cache_key] = r
            print(f'MISS, adding {cache_key} -> {r}')

        return r

    return wrapper

This cache decorator is now fairly robust. Consider this example:

@better_cache
def tester(a, b, c=0, d=0):
    return a + b + c + d

tester(1, 2)           # MISS
tester(1, 2)           # HIT
tester(1, 2, c=0)      # MISS!! (see the writeup below)
tester(1, 2, c=0)      # HIT
tester(1, 2, c=5)      # MISS
tester(1, 2, c=5)      # HIT
tester(1, 2, d=4, c=5) # MISS
tester(1, 2, c=5, d=4) # HIT (handles swapped order of c and d)

There are still a couple of limitations:

  1. If any of the parameters passed into a decorated function are ‘unhashable types’ (such as lists and dictionaries) then our decorator will throw an error.

  2. Explicitly sending a keyword argument using the default value will produce a different cache_key from when the keyword argument is unspecified and the default value is used.

Those aren’t such horrible limitations. The first could be solved, though it can be tricky since unhashable types are typically complex data structures. Most of the time, it won’t be worth the hassle.

I’m not convinced the second actually can be solved. I tried and failed. If you know how, please let me know! Regardless, duplicating a couple of values in the cache isn’t the end of the world.

Decorators That Accept Arguments

It’s also possible to create decorators that themselves accept arguments. For example, right now the dictionary we use as a cache can grow infinitely large. Suppose we wanted our cache to have a maximum size, we’d want to design a decorator that can be used like this:

@size_limited_cache(maxsize=20)
def tester(a, b, c=0, d=0):
    return a + b + c + d

Unfortunately, this gets a bit more complicated because a decorator must be a function that accepts a function as input; that’s simply how Python is designed. So, if we want our decorator to accept arguments, the standard workaround is to call a function that returns a decorator. In the above example, size_limited_cache is a function that accepts a maxsize parameter and returns a decorator (i.e., a function that accepts and returns a function).

Hold onto your hats, here’s what it looks like:

import functools

def size_limited_cache(maxsize=20):

    def decorator(input_function):
        cache = {}

        @functools.wraps(input_function)
        def wrapper(*args, **kwargs):
            cache_key = (args, tuple(sorted(kwargs.items()))) 
            if cache_key in cache: 
                r = cache[cache_key]
                print(f'HIT {cache_key} -> {r}')
            else:
                r = input_function(*args, **kwargs)

                if len(cache) < maxsize:
                    cache[cache_key] = r
                    print(f'MISS, adding {cache_key} -> {r}')
                else:
                    print("MISS, but cache at maxsize.")

            return r

        return wrapper
    
    return decorator

Clear as mud? Try breaking it down one function at a time:

size_limited_cache accepts one keyword parameter (maxsize) and returns the function we called decorator. maxsize gets a unique value and is “closed over” every time size_limited_cache is called, but maxsize is only used inside of wrapper.

decorator accepts a function as input (we called it input_function) and returns the function we called wrapper. decorator also initializes our cache.

wrapper actually calls input_function, uses the cache, and now it also uses maxsize to limit the growth of the cache.

Now, consider the order of operations when we decorate a function with this new setup:

@size_limited_cache(maxsize=5)
def tester(a, b, c=0, d=0):
    return a + b + c + d

On the first line size_limited_cache is called. That creates and returns the function we called decorator. This is also the moment in time that maxsize is closed over — and therefore fixed to 5 — for the function tester. Said another way: size_limited_cache(maxsize=5) results in a specific instance of decorator.

On the second line (the def) the Python interpreter calls that specific instance of decorator with tester as the input_function. The cache is initialized, a specific instance of the function wrapper is created and returned, and the Python interpreter replaces the raw version of tester with the specific instance of wrapper.

Finally, when we write code that calls tester, the instance of wrapper actually gets called, invoking our size-limited cache. Here’s a crude example:

for _ in range(3):
    for i in range(6):
        tester(i, i)

The print statements inside wrapper result in the following:

MISS, adding ((0, 0), ()) -> 0
MISS, adding ((1, 1), ()) -> 2
MISS, adding ((2, 2), ()) -> 4
MISS, adding ((3, 3), ()) -> 6
MISS, adding ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.
HIT ((0, 0), ()) -> 0
HIT ((1, 1), ()) -> 2
HIT ((2, 2), ()) -> 4
HIT ((3, 3), ()) -> 6
HIT ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.
HIT ((0, 0), ()) -> 0
HIT ((1, 1), ()) -> 2
HIT ((2, 2), ()) -> 4
HIT ((3, 3), ()) -> 6
HIT ((4, 4), ()) -> 8
MISS, ((5, 5), ()) but cache at maxsize.

To fully mimic the built-in lru_cache decorator we’d need to do more work to keep the cache keys in an ordered collection based on how recently they’ve been used, and appropriately evict old entries. I’ll leave that as an exercise for you, dear reader, but I’ll mention that the OrderedDict class from Python’s collections module is very useful.
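As a hint for that exercise, here are the two OrderedDict operations that do the heavy lifting: marking an entry as recently used, and evicting the least recently used one.

```python
from collections import OrderedDict

cache = OrderedDict()
cache['a'] = 1
cache['b'] = 2
cache['c'] = 3

cache.move_to_end('a')     # 'a' was just used: move it to the "recent" end
cache.popitem(last=False)  # evict from the "oldest" end: removes 'b'

print(list(cache))  # ['c', 'a']
```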

Multiple Decorators

You can even decorate a function multiple times. Here are two simple decorators. The first logs the parameters and return value every time a function is called. The second causes the function to be called twice, and returns both return values combined into a tuple.

import functools
import logging
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)


def log(input_function):

    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        logging.info(f'{input_function} called with args: {args} and kwargs: {kwargs}')
        r = input_function(*args, **kwargs)
        logging.info(f'{input_function} returned {r}')
        return r

    return wrapper


def repeat(input_function):

    @functools.wraps(input_function)
    def wrapper(*args, **kwargs):
        r1 = input_function(*args, **kwargs)
        r2 = input_function(*args, **kwargs)
        return r1, r2

    return wrapper

If we decorate a function with repeat then log, the function will be called twice, and each individual call will be logged, but the final result will be a tuple; that tuple won’t appear anywhere in the logs but will appear in the final print statement:

@repeat
@log
def add(a, b):
    return a + b

r = add(1, 2)
print(f"Final value for r: {r}")

Yields:

INFO:root:<function add at 0x10734f740> called with args: (1, 2) and kwargs: {}
INFO:root:<function add at 0x10734f740> returned 3
INFO:root:<function add at 0x10734f740> called with args: (1, 2) and kwargs: {}
INFO:root:<function add at 0x10734f740> returned 3
Final value for r: (3, 3)

Decorators are applied from bottom to top. So, when we execute the def in this code, the following happens in this order:

  1. The raw function add is created.

  2. That add function is sent to the log decorator as input_function, which returns an instance of wrapper, specifically, the one that adds logging information.

  3. That new logged function is passed into the repeat decorator as input_function; repeat returns an instance of its wrapper.

  4. The Python interpreter replaces references to our raw add function with the wrapper returned from repeat in step 3.

We can see that both of the individual calls to the wrapper created in step 2 get logged separately.
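The decorator stack is equivalent to nesting the calls by hand, which makes the bottom-to-top order explicit (wraps omitted here for brevity):

```python
def log(input_function):
    def wrapper(*args, **kwargs):
        print(f'{input_function} called with args: {args} and kwargs: {kwargs}')
        r = input_function(*args, **kwargs)
        print(f'{input_function} returned {r}')
        return r
    return wrapper

def repeat(input_function):
    def wrapper(*args, **kwargs):
        return (input_function(*args, **kwargs),
                input_function(*args, **kwargs))
    return wrapper

def add(a, b):
    return a + b

# @repeat stacked above @log is the same as nesting the calls:
# log is applied first, then repeat wraps the logged function.
add = repeat(log(add))
print(add(1, 2))  # logs each inner call, then prints (3, 3)
```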

Now consider what happens when we switch the order of the decorators:

@log
@repeat
def add(a, b):
    return a + b

r = add(1, 2)
print(f"Final value for r: {r}")
INFO:root:<function repeat.<locals>.wrapper at 0x10ef177e0> called with args: (1, 2) and kwargs: {}
INFO:root:<function repeat.<locals>.wrapper at 0x10ef177e0> returned (3, 3)
Final value for r: (3, 3)

Instead of logging each individual function call the wrapper of the repeat decorator is passed as input_function to the log decorator. That’s why we only get one set of log lines instead of two sets. Additionally, the final return value matches the value that gets logged (i.e., the tuple (3,3)).

Challenge yourself and put it all together:

Consider the following code — which uses 3 of the decorators we built — and try to decide what gets logged and what key-value pair gets saved in the cache.

@size_limited_cache(maxsize=5)
@repeat
@log
def add(a, b):
    return a + b


add(1,2)
add(1,2)

Remember…

The Lab Report is free and doesn’t even advertise. Our curriculum is open source and published under a public domain license for anyone to use for any purpose. We’re also a very small team with no investors.

Help us keep providing these free services by scheduling one of our world class trainings, requesting a custom class for your team, or taking one of our open enrollment classes.
