Consistent Random UUIDs in Python
When I'm doing data analysis or building applications with Python and I have to give entities a unique ID, I like to use random UUIDs instead of sequential numbers. Sequential numbers leak information about the order and total number of records, but I want my IDs to be just a unique identifier, nothing more.
Python's standard library includes the uuid module, for working with UUIDs. There's a convenient function for generating random ones:
    >>> import uuid
    >>> uuid.uuid4()
    UUID('189afb2c-1d58-4390-b35e-d5c0e3bb7472')
    >>> uuid.uuid4()
    UUID('0fde2d22-1918-4e39-8c6c-825c2655cbd5')
I like my UUIDs random, but it can be useful to have them be consistent between runs. That way, you can re-run data processing scripts but keep the same UUIDs. This is a little trickier than I thought it would be, but it's certainly possible.
My first instinct was to set the random seed in the random module, which is usually enough to make the random number generator behave the same way with each run:
    >>> import random
    >>> random.seed("peanutbutter")
    >>> [random.randint(0, 100) for _ in range(12)]
    [79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
    >>> random.seed("peanutbutter")
    >>> [random.randint(0, 100) for _ in range(12)]
    [79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
Setting the seed makes random.randint give us consistent results. Maybe it will work for uuid.uuid4 as well.
    >>> import uuid
    >>> import random
    >>> random.seed("peanutbutter")
    >>> uuid.uuid4()
    UUID('37b88a25-9f0f-4308-9532-84fd9a924c06')
    >>> random.seed("peanutbutter")
    >>> uuid.uuid4()
    UUID('04961188-33db-4ad9-86da-e9fcfc6a22e1')
Huh. That didn't work. What gives? 🤔
Let's look at the source code for uuid.uuid4:

    def uuid4():
        """Generate a random UUID."""
        return UUID(bytes=os.urandom(16), version=4)
We can see that it's using os.urandom instead of the random module. That function goes straight to the operating system's random number generator, which isn't affected by random.seed. That's definitely a good default: the random module is a pseudo-random number generator, and using it in some kinds of application could cause security vulnerabilities, so for those applications os.urandom is the right choice.
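To make the difference concrete, here is a small check (my own sketch, not anything from the uuid module) that seeds the shared generator twice and compares what each source produces. The variable names are just illustrative.

```python
import os
import random

random.seed("peanutbutter")
first_prng = random.getrandbits(128)  # drawn from the seeded pseudo-random generator
first_os = os.urandom(16)             # drawn from the operating system

random.seed("peanutbutter")
second_prng = random.getrandbits(128)
second_os = os.urandom(16)

# Re-seeding replays the pseudo-random sequence exactly...
print(first_prng == second_prng)
# ...but has no effect on os.urandom, so these 16-byte values differ
# (barring an astronomically unlikely collision).
print(first_os == second_os)
```

The first comparison should print True and the second False, which matches what happened with uuid.uuid4 above.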
However, it's not what I want for my data analysis application, so how can we make it use a pseudo-random generator instead?
We can see that uuid.uuid4 is making uuid.UUID objects. If we can provide our own pseudo-random bytes, we can generate pseudo-random UUIDs. uuid.UUID wants a 16-byte bytes object, which we can make from a sequence of integers, like this:
    >>> integers = [1, 2, 4, 8, 16]
    >>> bytes(integers)
    b'\x01\x02\x04\x08\x10'
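It's worth seeing what uuid.UUID does with the bytes we hand it. The sketch below feeds it 16 fixed example bytes (chosen arbitrarily, so the effect is easy to spot) with and without version=4. Passing version=4 overwrites the version and variant bits, so the result is a well-formed version 4 UUID even though a few of our bits get replaced.

```python
import uuid

raw = bytes(range(16))  # 16 arbitrary example bytes: 0x00 through 0x0f

plain = uuid.UUID(bytes=raw)               # bytes used exactly as given
stamped = uuid.UUID(bytes=raw, version=4)  # version and variant bits overwritten

print(plain)    # 00010203-0405-0607-0809-0a0b0c0d0e0f
print(stamped)  # 00010203-0405-4607-8809-0a0b0c0d0e0f

# Anything other than exactly 16 bytes is rejected with a ValueError.
try:
    uuid.UUID(bytes=b"too short")
except ValueError as error:
    print("rejected:", error)
```

Only the third and fourth groups change, which is why version 4 UUIDs carry 122 random bits rather than the full 128.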
We can get a sequence of random 8-bit integers (i.e. bytes) from random using the getrandbits function.
Putting it all together, we get something like this:
    >>> import random
    >>> import uuid
    >>> def random_uuid():
    ...     return uuid.UUID(bytes=bytes(random.getrandbits(8) for _ in range(16)), version=4)
    ...
    >>> random.seed("peanutbutter")
    >>> random_uuid()
    UUID('dad39ff6-a734-4906-8804-182dda97441f')
    >>> random.seed("peanutbutter")
    >>> random_uuid()
    UUID('dad39ff6-a734-4906-8804-182dda97441f')
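A possible variation, if you'd rather not build the bytes object by hand: uuid.UUID also accepts an int argument, so all 128 bits can be drawn in a single call. The function name random_uuid_from_int is my own. Note that this consumes the generator differently from the byte-by-byte version, so I would not expect the same seed to give the same UUIDs across the two approaches; pick one and stick with it.

```python
import random
import uuid


def random_uuid_from_int():
    # Draw all 128 bits at once; version=4 then overwrites the version
    # and variant bits so the result is a well-formed version 4 UUID.
    return uuid.UUID(int=random.getrandbits(128), version=4)


random.seed("peanutbutter")
first = random_uuid_from_int()
random.seed("peanutbutter")
second = random_uuid_from_int()

print(first == second)  # True: same seed, same UUID
print(first.version)    # 4
```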
That's how to generate consistent (pseudo) random UUIDs with Python's standard library.
One final note: the code above uses the shared random number generator in the
random module, so if you need independent sequences of random UUIDs (e.g. for
running isolated tests in parallel) it might be better to use separate random
number generators for each sequence. I've written up a sample implementation of
how to do that, which is available here.
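For a rough idea of what that can look like, here is a minimal sketch (not necessarily the same as the linked implementation) that gives each sequence its own random.Random instance. The class name UUIDStream and its next_uuid method are hypothetical names of my own.

```python
import random
import uuid


class UUIDStream:
    """A seeded sequence of UUIDs backed by its own private generator."""

    def __init__(self, seed):
        # random.Random is an independent generator; it shares no state
        # with the module-level functions such as random.seed.
        self._rng = random.Random(seed)

    def next_uuid(self):
        random_bytes = bytes(self._rng.getrandbits(8) for _ in range(16))
        return uuid.UUID(bytes=random_bytes, version=4)


# One stream used on its own.
alone = UUIDStream("peanutbutter")
expected = [alone.next_uuid() for _ in range(3)]

# The same seed again, interleaved with another stream and with the
# shared module-level generator.
replay = UUIDStream("peanutbutter")
other = UUIDStream("jelly")
interleaved = []
for _ in range(3):
    other.next_uuid()  # draws from a different private generator
    random.random()    # draws from the shared generator
    interleaved.append(replay.next_uuid())

print(expected == interleaved)  # True: the streams don't disturb each other
```

Because each stream owns its state, a test can create one with a fixed seed and get the same UUIDs no matter what else is drawing random numbers at the same time.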