For large amounts of data, you need to use /dev/urandom. The u is for "unlimited", meaning that there will always be data available. If you try to read a lot of data from /dev/random, it will block, preventing your program from continuing for a while.
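As a minimal sketch of a bulk read, assuming a Linux system with Python 3 (the function name read_urandom is made up for this example):

```python
def read_urandom(nbytes: int) -> bytes:
    """Return nbytes of random data read from /dev/urandom."""
    # In the default buffered binary mode, read(n) keeps reading until it
    # has n bytes or hits end-of-file, and /dev/urandom never reaches EOF.
    with open("/dev/urandom", "rb") as dev:
        return dev.read(nbytes)


if __name__ == "__main__":
    data = read_urandom(16 * 1024 * 1024)  # 16 MiB of random bytes
    print(len(data))
```

The same function pointed at /dev/random could stall partway through a request of this size on a system where that device blocks.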
Both /dev/random and /dev/urandom provide unpredictable (random) data. The data from /dev/random is intended to be completely unpredictable (or truly random), making it suitable for things like long-term cryptographic keys (where an attacker in the future may have the advantages of extensive research and much faster computers to try and break the algorithm used to generate the data). The data from /dev/urandom is based on truly random data, but may be run through a high-quality pseudo-random function to produce additional data. It is still suitable for things like encryption keys, as long as you don't need to be sure they won't be broken for decades, but can also be used for bulk data.
The Linux kernel maintains an "entropy pool" of unpredictable data, wherein each bit has an equal chance of being true or false (one or zero). The kernel builds this entropy pool from various inputs, such as hardware sources, drivers, user actions, and anything else that cannot be reliably predicted. However, these sources take time to accumulate entropy, so the entropy pool can be depleted if it is consumed too quickly.
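On Linux, the kernel's own estimate of how much entropy is in the pool is exposed through procfs. A small sketch for reading it, assuming /proc is mounted (the function name entropy_estimate is invented here, and what the number means depends on the kernel version):

```python
from pathlib import Path

# Kernel-provided estimate of the entropy currently in the pool, in bits.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def entropy_estimate(path: str = ENTROPY_AVAIL) -> int:
    """Read the kernel's entropy estimate as an integer."""
    # The file holds a single decimal number followed by a newline.
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    print(entropy_estimate())
```

Watching this value while something reads heavily from the pool is one way to see depletion happen on systems where it can.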
/dev/random draws directly from the entropy pool. When the pool is depleted, reading from /dev/random doesn't return any more data until the pool has refilled enough, which can take quite some time. (Since Linux 5.6, /dev/random blocks only until the pool has been initialized once after boot.) /dev/urandom uses a cryptographically secure pseudo-random number generator (CSPRNG) seeded from the entropy pool. It can generate an unlimited amount of output, but the output cannot be predicted without knowing the internal state of the CSPRNG. Because the internal state is initially based on truly random data, and the CSPRNG algorithm used is designed not to leak its internal state through its output, /dev/urandom is still a good source of highly random data.
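To make the "seed once, then stretch" idea concrete, here is a toy generator that hashes a secret state together with a counter. This is only an illustration of the principle: it is not the kernel's algorithm, the class name ToyGenerator is invented, and it should not be used for real cryptography.

```python
import hashlib
import os


class ToyGenerator:
    """Toy hash-in-counter-mode generator (illustration only)."""

    def __init__(self, seed: bytes) -> None:
        # The internal state is derived from the seed and never output directly.
        self._state = hashlib.sha256(seed).digest()
        self._counter = 0

    def read(self, nbytes: int) -> bytes:
        out = bytearray()
        while len(out) < nbytes:
            # Each 32-byte block is SHA-256(state || counter); without the
            # state, earlier output does not reveal the next block.
            counter_bytes = self._counter.to_bytes(8, "big")
            out += hashlib.sha256(self._state + counter_bytes).digest()
            self._counter += 1
        # Any surplus bytes from the last block are discarded.
        return bytes(out[:nbytes])


if __name__ == "__main__":
    # Seed once with 32 bytes from the operating system, then produce
    # far more output than was ever taken from the seed source.
    gen = ToyGenerator(os.urandom(32))
    print(len(gen.read(1_000_000)))
```

Two generators given the same seed produce the same stream, which is why the seed has to come from genuinely unpredictable data.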
To recap, when you need maximally random data, use /dev/random. However, if you need a lot of data, you need to use /dev/urandom. In general, use /dev/urandom unless you need the data to be indistinguishable from truly random noise for decades to come.
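The recap can be written down as a small decision helper. This is a sketch under stated assumptions: the names choose_device and BULK_THRESHOLD are hypothetical, and the 64-byte cutoff for "a lot of data" is an arbitrary value picked for illustration, not a recommendation.

```python
# Hypothetical cutoff: requests larger than this count as "a lot of data".
BULK_THRESHOLD = 64


def choose_device(nbytes: int, long_term_key: bool = False) -> str:
    """Pick a random device following the rule of thumb above."""
    # Only small requests for long-lived key material go to /dev/random.
    if long_term_key and nbytes <= BULK_THRESHOLD:
        return "/dev/random"
    # Everything else, including all bulk data, goes to /dev/urandom.
    return "/dev/urandom"


if __name__ == "__main__":
    print(choose_device(32, long_term_key=True))
    print(choose_device(10 * 1024 * 1024))
```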