Getting a Proper Random Number on a Computer

Dungeons & Dragons dice have all the possible shapes that exist!

99.9% of the time, I see people using random number generator in their software.

Say you have a random function b() that generates random bits. Each time you call b(), you either get 0 or 1.

Now, in most cases people are going to get a larger number, say you write a function r() which returns 8 such bits. You could write it like this:

int r()
{
  return (b() << 7)
       | (b() << 6)
       | (b() << 5)
       | (b() << 4)
       | (b() << 3)
       | (b() << 2)
       | (b() << 1)
       | (b() << 0);
}

Assuming you have a perfect random generator, when calling b(), you get a 50% chance of getting 0 and 50% chance of getting 1.

Therefore when calling r(), you get 1 chance in 256 to get 0, 1 chance in 256 to get 1, etc. In other words, if you call the function 256 times, you could get every value once (unlikely, but it could happen.)

Most of the time, though, we want a random number between 0 and a number 'n' which is not exactly 256, or any power of 2. For example, to get a percent 1 to 100, what many programmers do is use the modulo operator like so:

percent = r() % 100 + 1;

If we look closer to that math, we can see it is wrong. Why? Because r() returns a number between 0 and 255, which means the modulo used here returns a skewed result.

Look at this this way, you will get the equivalent of three "equivalent" groups like the follow:

  0.. 99 -- group 1
100..199 -- group 2
200..255 -- group 3

So, r() returns a number which fits in group 1, 2 or 3. Then you apply the modulo and get a number between 0 and 99 with group 1 and group 2. The distribution is perfect and matches one to one with our results.

However, in group 3, we only have 56 numbers. In other words, if r() returns a number in group 3, r() % 100 is skewed toward numbers 0 to 55 instead of 0 to 99.

Therefore result is more likely to be between 1 and 56 since that can happen 3 times, whereas, a number between 57 and 100 only happens 2 times in the number returned by r().

To releave your random generator from that problem, you must bound your results and call r() again if the result is outside of the bounds.

In our specific case, we can write:

for(;;)
{
  int p(r());
  if(p < 200)
  {
    return p % 100 + 1;
  }
}

This function returns a valid random number where each number had the same amount of chance to be selected.

If you have the ability to ask n bits, the previous loop could ask for 7 bits instead of 8 and compare the result with if(p < 100). It would make it more likely to return a value which you can return.

Another Mistake that people make is to think that they can use floating point numbers and that it will resolve everything.

The concept is simple, instead of returning an integer, we create a function f() which returns a floating point between 0 a 1. In most cases, this is written as follow:

double f()
{
  return r() / 255.0;
}

Then we calculate a number between 0 and 100 from that floating point:

return static_cast<int>(f() * 99.0) + 1;

The concept is that the floating point may be any number between 0 and 1. With a large enough number of bits, that is going to be close. (Large enough means at least a number of bits equal to the size of the floating point mantissa, probably a little more.)

Yet, again, you will get a skewed result.

If you look at the groups we defined above, we can see the exact same effect on floating point numbers, only, it works differently. For each percent point that we get in the final result, we have 2 or 3 of the original numbers assigned to it.

For example, 0, 1, 2 may become 1.

Then 3 and 4 become 2.

Next 5, 6, 7 become 3.

And so on and so forth. Here is an example of the floating point function and as we can see, the final percent at times repeats two times and at times it repeats three times. So there are still numbers that have a higher chance of coming out.

r() b() percent
0 0 0
1 0.00392 0
2 0.00784 1
3 0.01176 1
4 0.01569 2
5 0.01961 2
6 0.02353 2
7 0.02745 3
8 0.03137 3
9 0.03529 4
10 0.03922 4
11 0.04314 4
12 0.04706 5
13 0.05098 5
14 0.05490 5
15 0.05882 6
16 0.06275 6
17 0.06667 7
18 0.07059 7
19 0.07451 7
20 0.07843 8
21 0.08235 8
22 0.08627 9
23 0.09020 9
24 0.09412 9
25 0.09804 10
26 0.10196 10
27 0.10588 11
28 0.10980 11
29 0.11373 11
30 0.11765 12
31 0.12157 12
32 0.12549 13
33 0.12941 13
34 0.13333 13

Okay, so the distribution is different, but in effect, some numbers still get the 3 chances instead of 2 of coming out.

Again, the only real way to resolve this problem is to use really large random numbers and making sure that the entire floating point mantissa is being used each time. Actually, if you know how to, you should generate the mantissa directly and not have a division as shown above. However, it is difficult to have a number from 0 to 1 inclusive in that case. Instead, though, you can easily have a number between 0 and 1, 1 being excluded.

Note that you should choose at least a double type to have good precision (and not a 32 bit float.)

As a result, you will need a much greater source of good random bits if you want to generate numbers using a floating point number. Whether you need a 0 or 1 result or a percent number or a much larger random number.