Consistent Random UUIDs in Python

When I'm doing data analysis or building applications with Python and need to give entities a unique ID, I like to use random UUIDs instead of sequential numbers. Sequential numbers leak information about the order and total number of records, but I want my IDs to be unique identifiers and nothing more.

Python's standard library includes the uuid module for working with UUIDs. It has a convenient function for generating random ones:

>>> import uuid
>>> uuid.uuid4()
UUID('189afb2c-1d58-4390-b35e-d5c0e3bb7472')
>>> uuid.uuid4()
UUID('0fde2d22-1918-4e39-8c6c-825c2655cbd5')

Handy. 🙂

I like my UUIDs random, but it can be useful to have them be consistent between runs. That way, you can re-run data processing scripts but keep the same UUIDs. This is a little trickier than I thought it would be, but it's certainly possible.

My first instinct was to set the random seed in the random module, which is usually enough to make the random number generator behave the same way with each run:

>>> import random
>>> random.seed("peanutbutter")
>>> [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
>>> random.seed("peanutbutter")
>>> [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]

Setting the seed makes random.randint give us consistent results. Maybe it will work for uuid.uuid4 as well.

>>> import uuid
>>> import random
>>> random.seed("peanutbutter")
>>> uuid.uuid4()
UUID('37b88a25-9f0f-4308-9532-84fd9a924c06')
>>> random.seed("peanutbutter")
>>> uuid.uuid4()
UUID('04961188-33db-4ad9-86da-e9fcfc6a22e1')

Huh. That didn't work. What gives? 🤔

Let's look at the source code for uuid.uuid4:

def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)

We can see that it's using os.urandom instead of the random module. That function goes straight to the operating system's random number generator, which isn't affected by random.seed. That's definitely a good thing! The random module is a pseudo-random number generator, and using it in security-sensitive applications could introduce vulnerabilities because its output can be predicted, so for those applications os.urandom is the right choice.

However, it's not what I want for my data analysis application, so how can we make it use a pseudo-random generator instead?

We can see that uuid.uuid4 builds its result by passing bytes to uuid.UUID. If we provide our own pseudo-random bytes, we can generate pseudo-random UUIDs instead.

uuid.UUID wants a bytes object, which we can make from a sequence of integers, like this:

>>> integers = [1, 2, 4, 8, 16]
>>> bytes(integers)
b'\x01\x02\x04\x08\x10'

We can get random 8-bit integers (i.e. bytes) from the random module by calling random.getrandbits(8) once for each byte we need.
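
As a small sketch of that step (the variable names are only for illustration), sixteen calls to random.getrandbits(8) give sixteen integers between 0 and 255, which bytes will accept:

```python
import random

random.seed("peanutbutter")

# Each call returns an integer made of 8 random bits, i.e. between 0 and 255.
integers = [random.getrandbits(8) for _ in range(16)]

# bytes() accepts any iterable of integers in that range.
random_bytes = bytes(integers)
print(len(random_bytes))  # 16
```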

Putting it all together, we get something like this:

>>> import random
>>> import uuid
>>> def random_uuid():
...     return uuid.UUID(bytes=bytes(random.getrandbits(8) for _ in range(16)), version=4)
...
>>> random.seed("peanutbutter")
>>> random_uuid()
UUID('dad39ff6-a734-4906-8804-182dda97441f')
>>> random.seed("peanutbutter")
>>> random_uuid()
UUID('dad39ff6-a734-4906-8804-182dda97441f')

Success! 🎉

That's how to generate consistent (pseudo) random UUIDs with Python's standard library.
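
The version=4 argument passed to uuid.UUID matters here: it overwrites the version and variant bits of whatever bytes it is given, so the result is always marked as a version 4 UUID even though the bytes came from our own generator. A quick sketch with sixteen zero bytes makes the fixed bits visible:

```python
import uuid

# Sixteen zero bytes: any non-zero digit in the result was set by version=4.
u = uuid.UUID(bytes=bytes(16), version=4)
print(u)          # 00000000-0000-4000-8000-000000000000
print(u.version)  # 4
```

This means six of the 128 bits are fixed, leaving 122 bits that actually come from the random source.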

One final note: the code above uses the shared random number generator in the random module, so if you need independent sequences of random UUIDs (e.g. for running isolated tests in parallel) it might be better to use separate random number generators for each sequence. I've written up a sample implementation of how to do that, which is available here.
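
For a rough idea of what that could look like, here is a minimal sketch. It is not the linked implementation, and make_uuid_generator is a hypothetical helper name. It assumes that giving each sequence its own random.Random instance is enough isolation for your use case:

```python
import random
import uuid


def make_uuid_generator(seed):
    """Return a function that produces a repeatable sequence of UUIDs.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # A private generator, unaffected by random.seed() or other callers.
    rng = random.Random(seed)

    def random_uuid():
        random_bytes = bytes(rng.getrandbits(8) for _ in range(16))
        return uuid.UUID(bytes=random_bytes, version=4)

    return random_uuid


first = make_uuid_generator("peanutbutter")
second = make_uuid_generator("peanutbutter")

a = first()
random.seed("something else")  # does not disturb the private generators
b = second()
print(a == b)  # True
```

Because each closure owns its generator, two sequences built from the same seed stay in step with each other no matter what else touches the shared random module.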

Nat Knight — 2018-11-14