Consistent Random UUIDs in Python
When I'm doing data analysis or building applications with Python and I have to give entities a unique ID, I like to use random UUIDs instead of sequential numbers. Sequential numbers leak information about the order and total number of records, but I want my IDs to be just a unique identifier, nothing more.
Python's standard library includes the uuid module for working with UUIDs. There's a convenient function for generating random ones:
>>> import uuid
>>> uuid.uuid4()
UUID('189afb2c-1d58-4390-b35e-d5c0e3bb7472')
>>> uuid.uuid4()
UUID('0fde2d22-1918-4e39-8c6c-825c2655cbd5')
Handy. 🙂
I like my UUIDs random, but it can be useful to have them be consistent between runs. That way, you can re-run data processing scripts but keep the same UUIDs. This is a little trickier than I thought it would be, but it's certainly possible.
My first instinct was to set the random seed in the random module, which is usually enough to make the random number generator behave the same way with each run:
>>> import random
>>> random.seed("peanutbutter")
>>> [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
>>> random.seed("peanutbutter")
>>> [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
Setting the seed makes random.randint give us consistent results. Maybe it will work for uuid.uuid4 as well.
>>> import uuid
>>> import random
>>> random.seed("peanutbutter")
>>> uuid.uuid4()
UUID('37b88a25-9f0f-4308-9532-84fd9a924c06')
>>> random.seed("peanutbutter")
>>> uuid.uuid4()
UUID('04961188-33db-4ad9-86da-e9fcfc6a22e1')
Huh. That didn't work. What gives? 🤔
Let's look at the source code for uuid.uuid4:
def uuid4():
    """Generate a random UUID."""
    return UUID(bytes=os.urandom(16), version=4)
We can see that it's using os.urandom instead of the random module. That function goes straight to the operating system's random number generator, which isn't affected by random.seed. That's definitely a good thing! The random module is a pseudo-random number generator, and its output can be predicted, so using it in security-sensitive applications could cause vulnerabilities. For those applications, os.urandom is the right choice.
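To see the difference directly, here is a small sketch that seeds the random module twice and draws from both sources each time. The variable names are mine, chosen only for illustration.

```python
import os
import random

random.seed("peanutbutter")
seeded_first = random.getrandbits(128)  # drawn from the seeded generator
os_first = os.urandom(16)               # drawn from the operating system

random.seed("peanutbutter")
seeded_second = random.getrandbits(128)
os_second = os.urandom(16)

# Re-seeding replays the random module's sequence...
print(seeded_first == seeded_second)
# ...but has no effect on os.urandom, so these two 16-byte values
# differ (barring a 1-in-2**128 coincidence).
print(os_first == os_second)
```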
However, it's not what I want for my data analysis application, so how can we make it use a pseudo-random generator instead?
We can see that uuid.uuid4 is making uuid.UUID objects. If we can provide our own pseudo-random bytes, we can generate pseudo-random UUIDs instead.
uuid.UUID wants a bytes object, which we can make from a sequence of integers, like this:
>>> integers = [1, 2, 4, 8, 16]
>>> bytes(integers)
b'\x01\x02\x04\x08\x10'
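It helps to know what the version=4 argument does with the bytes we hand over. As far as I can tell from the uuid module, it overwrites the version and variant bits so the result is a well-formed version 4 UUID whatever the input was. A quick sketch using sixteen zero bytes makes the overwritten bits easy to spot:

```python
import uuid

# Sixteen zero bytes, so any bit that UUID() sets stands out.
raw = bytes(16)

plain = uuid.UUID(bytes=raw)               # bytes used exactly as given
stamped = uuid.UUID(bytes=raw, version=4)  # version and variant bits overwritten

print(plain)            # 00000000-0000-0000-0000-000000000000
print(stamped)          # 00000000-0000-4000-8000-000000000000
print(stamped.version)  # 4
```

So six of the 128 bits are fixed, and the remaining 122 come from whatever bytes we supply.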
We can get a sequence of random 8-bit integers (i.e. bytes) from random using the random.getrandbits function.
Putting it all together, we get something like this:
>>> import random
>>> import uuid
>>> def random_uuid():
...     return uuid.UUID(bytes=bytes(random.getrandbits(8) for _ in range(16)), version=4)
...
>>> random.seed("peanutbutter")
>>> random_uuid()
UUID('dad39ff6-a734-4906-8804-182dda97441f')
>>> random.seed("peanutbutter")
>>> random_uuid()
UUID('dad39ff6-a734-4906-8804-182dda97441f')
Success! 🎉
That's how to generate consistent (pseudo) random UUIDs with Python's standard library.
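A possible variation, sketched here as an alternative rather than a replacement: uuid.UUID also accepts an int argument, so all 128 bits can be drawn in a single random.getrandbits call. The function name random_uuid_from_int is made up for this example, and for a given seed it will not produce the same UUIDs as the byte-by-byte version.

```python
import random
import uuid


def random_uuid_from_int():
    # Draw all 128 bits in one call. version=4 then overwrites the
    # version and variant bits, just as it does with bytes=.
    return uuid.UUID(int=random.getrandbits(128), version=4)


random.seed("peanutbutter")
first = random_uuid_from_int()
random.seed("peanutbutter")
second = random_uuid_from_int()

print(first == second)  # True: same seed, same UUID
print(first.version)    # 4
```

On Python 3.9 and later, random.randbytes(16) is another way to get the sixteen bytes in one call.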
One final note: the code above uses the shared random number generator in the random module, so if you need independent sequences of random UUIDs (e.g. for running isolated tests in parallel) it might be better to use separate random number generators for each sequence. I've written up a sample implementation of how to do that, which is available here.
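For a rough idea of the shape this can take, here is a minimal sketch of my own (not the linked implementation). It assumes a hypothetical SeededUUIDs helper that wraps a private random.Random instance, so calls to random.seed or other uses of the shared generator can't disturb the sequence.

```python
import random
import uuid


class SeededUUIDs:
    """Hypothetical helper: a UUID source with its own private generator."""

    def __init__(self, seed):
        # random.Random(seed) is independent of the module-level generator.
        self._rng = random.Random(seed)

    def next(self):
        return uuid.UUID(int=self._rng.getrandbits(128), version=4)


a = SeededUUIDs("peanutbutter")
b = SeededUUIDs("peanutbutter")

# Touching the shared generator in between has no effect on a or b.
random.seed("something else")
random.random()

first_a = a.next()
first_b = b.next()
print(first_a == first_b)  # True: same seed, independent state
```

Each test or worker can then hold its own SeededUUIDs instance and get a repeatable sequence regardless of what else is running.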