bloglien/content/posts/reverse-mersenne-twister.md

18 KiB

title date draft
Reversing a PRNG: Mersenne Twisters 2020-05-25T05:31:25-05:00 false

In my Network Security course, one of the things I worked on was an attack on a pseudo-random number generator (PRNG). In this case, my team had implemented a two factor authentication (2FA) server that generated tokens using Python's random module. By attacking the random number generator, we were able to acquire all future two factor tokens given enough samples.

A Brief Background on Mersenne Twisters

Python's random module implements random number generation with a PRNG known as a Mersenne Twister. The numbers that come out of this may seem random, but in reality they are nothing but; like most things in computing, the generation process is fully deterministic. As such, these should not be used for any type of security purposes, as pointed out by the Python documentation.

Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module.

I am no mathematician, and I will not be discussing why these deterministic algorithms produce something that seems random1. For the purposes of breaking one of these generators, though, it is worth describing their operation.

A Mersenne Twister operates by maintaining a giant array of state, the initial state of which is our random seed. This array feeds each element through a series of operations to produce a new pseudo-random number. After exhausting the entire state array, the array is regenerated ("twisted"). To discuss these operations further, we must define a set of constants used in these operations. These constants correspond to MT199372, which is what the Python implementation corresponds to.

Constant Value Notes
w 32 Word size in bits
n 624 Degree of recurrence
r 31 Lower bits of bitmask
m 397 Offset to generate next state
a 0x9908b0df Bottom row of twist transformation matrix
b 0x9d2c5680 Tempering bitmask
c 0xefc60000 Tempering bitmask
s 6 Tempering bit shift
t 15 Tempering bit shift
u 11 Tempering bit shift
l 18 Tempering bit shift

The process for generating a random numbers from the state array is as follows:

  1. y = x[i]
  2. y = y ^ (y >> u)
  3. y = y ^ ((y << s) & b)
  4. y = y ^ ((y << t) & c)
  5. y = y ^ (y >> l)
  6. Increment i
  7. return y

Each time we want to generate a random number, we will call

generate(x[i])
i += 1

where i represents the next item to inspect within the state array and x is our state array.

Once we exhaust x (i.e. we run the above operation n times), we must perform a twist operation to regenerate the state array. This process is as follows:

  1. Set x = 0
  2. Compute y = x[i] & UPPER_MASK | x[(i+1) % n] & LOWER_MASK, where LOWER_MASK is a mask representing the lower r bits of x[i], and UPPER_MASK represents the upper w - r bits.
  3. Compute x[i] = x[(i + m) % n] ^ (y >> 1) ^ (0 if y % 2 == 0 else a)
  4. Increment i, and repeat until x is fully regenerated.

This twist operation can be written as

N = 624
M = 397
A = 0x9908b0df
# Upper 1 bit (w - r = 1)
UPPER_MASK = 0x80000000
# Lower 31 bits (r = 31)
LOWER_MASK = 0x7fffffff

def twist_state_map(state):
    res = []
    for i in range(N):
        y = (state[i] & UPPER_MASK) | (state[(i+1) % N] & LOWER_MASK)
        next_item = state[(i + M) % N] ^ (y >> 1) ^ (0 if y % 2 == 0 else A)
        res.append(next_item)

    return res

Components of the Reversal

The Mersenne Twister's weakness lies in the fact that it is entirely deterministic. Our ultimate goal is to recreate our own copy of the state array. In our example, we were targeting a 2FA server, so once we recreated this array, we were able to generate all future tokens. How might we do this? We must first begin by collecting enough samples from the generator. This might be a session token generator, a 2FA server, or just a simple "guess the number" game. Given that the array is regenerated every 624 iterations (n in the above table), we will need to collect 624 samples. After reversing the generation process on each of these samples, we will have recreated the state array.

During the generation of a random number, there are two types of steps: y ^ (y >> k) and y ^ ((y << k) & m). In order to perform the reversal process, we must learn how to reverse both these processes.

The Right Shift

We will begin by describing how to reverse the simple case: a large right shift. Let's consider y = 3878751475 = 0b111001110011000100001100111100113. One of the steps in our generation process is y ^ (y >> l), or y ^ (y >> 18) in the case of MT19937. When this operation is performed, it looks something like this:

     18 bits          14 bits
================== =============
111001110011000100 00110011110011 # y
000000000000000000 11100111001100 # y >> 18
--------------------------------
111001110011000100 11010100111111 # XOR of the two

The keen eyed will notice that the first 18 bits of our result are identical to those of y4.

We can now perform our first reversal! We know that given, x ^ y = z, x ^ z = y. Therefore, in order to recover the lower 14 bits, we simply have to XOR our result with itself, shifted.

     18 bits          14 bits
================== =============
111001110011000100 11010100111111 # result from previous operation
000000000000000000 11100111001100 # result >> 18
--------------------------------
111001110011000100 00110011110011 # XOR of the two

This result is our original y! With this, we are on our way to cracking the rest of the puzzle. With this in mind, we should be able to reverse the y ^ (y >> u) step too, right? Let's again consider y = 3878751475 = 0b11100111001100010000110011110011.

  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 10001000011 0011110011 # y
00000000000 11100111001 1000100001 # y >> 11 (u == 11)
---------------------------------
11100111001 01101111010 1011010010 # XOR of the two

Unfortunately, we can't apply the same trick again.

  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 01101111010 1011010010 # result from previous operation
00000000000 11100111001 0110111101 # result >> 11
----------------------------------
11100111001 10001000011 1101101111 # XOR of the two

Though we were able to recover the first 22 bits of y, the remaining 10 bits do not match at all. Simply, we are not performing an XOR operation that is the inverse of the original on these lower bits. This poses a problem; for any instance where there is a shift smaller than half of the word size (32 bits, in our case), we lose data.

Fixing the Small Shift Problem By Waterfalling

Solving this problem was probably the most headache inducing of the reversal process. With a bit of intuition, though, it's actually quite simple. Despite not getting the lower 10 bits correct, we do know how to recover them. Look closely at the shift performed above, again. The first 10 bits of the middle 11 bits in our final result (1000100001) are actually the same bits we XOR'd with the bottom 10 bits of y!

We can revise our approach to account for this. Instead of attempting to fry the whole fish at once, as it were, we can break up our recovery process into chunks. Recall that in an operation y ^ (y >> q), the first q bits are preserved. If we only consider q bits at a time during our recovery process (starting with the first q bits, then the following q bits of what we just recovered, etc.) we can recover y. For instance, if we are recovering from a shift of 11 bits, we will only perform our XOR with the first 11 bits (shifted by 11), rather than the whole result. We then take the 11 bits we have just recovered, shift them by 11, and repeat.

I refer to this process as "waterfalling," partly because the process of taking our result and carrying it forward reminds me of a multi-stage waterfall. {{< relfigure src="img/mersenne-twister/sanje-falls.jpg" caption="A multi-stage waterfall" link="https://www.flickr.com/photos/cvickio/6283313584" >}}

Here's an example to help make this clear. Recall that when y = 3878751475 = 0b11100111001100010000110011110011, y ^ (y >> 11) = 11100111001011011110101011010010. We know that the first 11 bits (11100111001) match those of y. We can use these bits to recover the next 11 bits.

  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 01101111010 1011010010 # y ^ (y >> 11)
11111111111 00000000000 0000000000 # mask for first 11 bits
---------------------------------
11100111001 00000000000 0000000000 # AND of the two
  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 01101111010 1011010010 # y ^ (y >> 11)
00000000000 11100111001 0000000000 # result >> 11
---------------------------------
11100111001 10001000011 1011010010 # XOR of the two (call this z)

We're getting somewhere! Our first 22 bits now match y. We now repeat the process for the bottom 10 bits.

  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 10001000011 1011010010 # z
00000000000 11111111111 0000000000 # mask for next 11 bits
---------------------------------
00000000000 10001000011 0000000000 # AND of the two
  11 bits     11 bits     10 bits
=========== =========== ==========
11100111001 10001000011 1011010010 # z
00000000000 00000000000 1000100001 # result >> 11 (note the chopped off bit)
---------------------------------
11100111001 10001000011 0011110011 # XOR of the two

We have now fully recovered y.

Codifying the Right Shift Undo

Here's a Python snippet to perform what we just outlined.

W = 32 # Word size, defined by MT19937

def undo_right_transform(value, shift):
    res = value

    for i in range(0, W, shift):
        # Work on the next shift sized portion at a time by generating a mask for it.
        portion_mask = '0' * i + '1' * shift + '0' * (W - shift - i)
        portion_mask = int(portion_mask[:W], 2)
        portion = res & portion_mask

        res ^= portion >> shift

    return res

Left Shift

The left shift operation is nearly identical to our waterfalled right shift method, except we now waterfall from the right, rather than the left. There is one caveat though: All of the left shift transforms have a bitwise AND in them, such as y ^ ((y << t) & c). This may look like we're losing data because of the &, but we don't; we're performing our bitwise AND on the data we're XORing with the original. In other words: we're actually performing our bitwise AND on parts of the data we have already restored. I've included an example to help illustrate this, but really, it isn't all that different from the waterfalled right shift process. The only major difference is that we perform the bitwise AND. Again, we will use y = 3878751475 = 0b11100111001100010000110011110011. We will now perform y ^ ((y << t) & c)

2b     15 bits         15 bits
== =============== ===============
10 000110011110011 000000000000000 # y << t (t == 15)
11 101111110001100 000000000000000 # c (0xefc60000)
----------------------------------
10 000110010000000 000000000000000 # AND the two
2b     15 bits         15 bits
== =============== ===============
11 100111001100010 000110011110011 # y
10 000110010000000 000000000000000 # previous result
----------------------------------
01 100001011100010 000110011110011 # XOR of the two (call this z)

And now we perform the reversal...

2b     15 bits         15 bits
== =============== ===============
01 100001011100010 000110011110011 # z
00 000000000000000 111111111111111 # mask off the lower 15 bits
----------------------------------
00 000000000000000 000110011110011 # AND of the two
2b     15 bits         15 bits
== =============== ===============
00 000110011110011 000000000000000 # result << 15
11 101111110001100 000000000000000 # c (0xefc60000)
----------------------------------
00 000110010000000 000000000000000 # AND of the two
2b     15 bits         15 bits
== =============== ===============
00 000110010000000 000000000000000 # result
01 100001011100010 000110011110011 # z
----------------------------------
01 100111001100010 000110011110011 # XOR the two

After one cycle, we've recovered the lower 30 bits of y. We continue once more.

2b     15 bits         15 bits
== =============== ===============
01 100111001100010 000110011110011 # previous result
00 111111111111111 000000000000000 # mask off the middle 15 bits
----------------------------------
00 100111001100010 000000000000000 # AND of the two
2b     15 bits         15 bits
== =============== ===============
10 000000000000000 000000000000000 # result << 15
11 101111110001100 000000000000000 # c (0xefc60000)
----------------------------------
10 000000000000000 000000000000000 # AND of the two
2b     15 bits         15 bits
== =============== ===============
10 000000000000000 000000000000000 # result
01 100001011100010 000110011110011 # z
----------------------------------
11 100001011100010 000110011110011 # XOR of the two

y is now fully recovered!

Codifying the Left Shift Undo

Here's a Python snippet to perform what we just outlined.

def undo_left_transform(value, shift, mask):
    res = value
    for i in range(0, W, shift):
        # Work on the next shift sized portion at a time by generating a mask for it.
        portion_mask = '0' * (W - shift - i) + '1' * shift + '0' * i
        portion_mask = int(portion_mask, 2)
        portion = res & portion_mask

        res ^= ((portion << shift) & mask)

    return int(res)

Performing the Reversal

In order to generate the state array, we must collect enough samples to fill it. Each time we perform a reversal of the generation steps, we collect one element of the state array. As previously stated, we must collect 624 samples (n in the above table) from a PRNG to perform this. It is worth noting that in order to make sure we can replicate the state array, these samples must be consecutive5. However, once we collect our samples, generating the state array is as simple as feeding them through the reversal steps, in reverse order from the generation steps.

# Constants for MT19937, from the table above
L = 18
T = 15
C = 0xefc60000
S = 7
B = 0x9d2c5680
U = 11

for sample in samples:
	y = undo_right_transform(sample, L)
	y = undo_left_transform(y, T, C)
	y = undo_left_transform(y, S, B)
	y = undo_right_transform(y, U)
	state.append(y)

Each time we run this loop, we are recovering a single item from the state array. In other words: by running over the entire set of samples, we have the state map that produced these numbers! To generate future random numbers, all we have to do is twist the state map, and feed the generation process forward.

state = twist_state_map(state)
while True:
	# Generate the next 624 samples by iterating over the state array
	for item in state:
		y = item ^ (item >> U)
		y ^= (y << S) & B
		y ^= (y << T) & C
		y ^= y >> L
		print(y)

	# Twist the state map again, as we have exhausted our samples
	state = twist_state_map(state)

And that's it! From there, you now have access to the same numbers as the PRNG that output the samples you collected.

Takeaways

This project was a nice reintroduction to some of the bit-twiddling I don't get to do very often. The process really is far more simple than one might think. Reversing a PRNG isn't outlandish, and hopefully seeing this process underscores why using a more secure random number generator (like from Python's secrets module) is important, especially in authentication contexts like 2FA or session tokens.


  1. For a better explanation of that, take a look at Archana Jagannatam's "Mersenne Twister - A Pseudo Random Number Generator and its Variants". ↩︎

  2. See _randommodule.c in the CPython source. ↩︎

  3. The keen reader will also notice that our reversal process has is using full 32 bit integers. Indeed, using the full 32 bit integers is a requirement of the reversal process. If we have something that is say, a modulus of this result, some additional brute forcing is required. ↩︎

  4. Of course, this depends on CPython using a logical shift. Looking at the source, we can see that unsigneds are used, which guarantee such a shift. ↩︎

  5. A blog post I read during my investigation stated that it is possible to recreate the state array with non-consecutive samples, but didn't go into detail. If I had to guess, depending on which samples were missing, you could likely do some magic with the x[(i + M) % N] portion of the twist process, to figure these out, but I haven't looked into it too deeply. You would certainly need to know how many samples were missing in between to do this. For the attack vector my team had explored, we didn't need this. ↩︎