Thursday, September 12, 2024

Identity Management Atoms: Randomness

The enemy of cryptographic systems is predictability. If you can find patterns in a bit of encrypted text, those patterns can be used to find other patterns, which can then, eventually, lead to the recovery of the original message.

To illustrate this, consider a popular puzzle found in newspapers: the cryptogram. Granted, this isn't a "serious" encryption scheme -- it is only, after all, the assigning of each letter in the alphabet to a different letter -- but it's also illustrative of why randomness is so important! Here is a cryptogram from one of my puzzle books:
JRCJ BRQPR UOOIU JRO ROQXRJ GS 
CXUMWTQJV QY GYO XOYOWCJQGY GSJOY 
KOPGIOU JRO ROQXRJ GS BQUTGI QY JRO 
YONJ.

This may seem a little intimidating, but if you squint long enough, you might notice "JRCJ" and "JRO" share a "JR" -- and recall that "THAT" is a common word that begins and ends with the same letter -- and guess that "JRCJ" is "THAT" and "JRO" is "THE".  From there, you might look at "ROQXRJ", which we now have letters for:  "H E -- -- H T" -- and wonder if that word is "HEIGHT" -- and from there, wonder whether "QY" might be "IN" or "IS" (it can't be "IT", though, because we already know that "J" is the code for "T"!) -- hopefully it's clear how recognizing these patterns can "break" little bits of this code, which can then be used to "break" other little bits, until it cascades and unravels completely.  Indeed, while it's challenging, it's easy enough that this technique is used as a popular past-time!

So, if we want to encrypt something, we need to make it look as random as possible.  It has been proven that the strongest encryption, hands down, is the "One Time Pad" -- a pair of notepads filled with duplicate copies of random numbers, shared with person you wish to communicate with -- and if you wish to send a message, you take a page from your pad, use it to encrypt your message, and send it to your friend, who uses the same page to decrypt the message.  Really simple!  And proven mathematically that it's absolutely unbreakable, if it's used right.

Why don't we use this "one time pad" for all our communication needs, then?  There are several reasons:

  1. It can be difficult to get a One Time Pad to your friend, and thus, he'll have a difficult time reading your messages;
  2. You have to make sure you and your friend are literally on the "same page" for the messages;
  3. If you mess up, or get lazy, and use the same page for two messages, both of the encrypted messages can be combined to reveal both messages -- perhaps the most dramatic illustration of this I have seen uses two images encrypted with a One Time Pad to show just how both messages become apparent when combined;
  4. A One Time Pad has a limited number of pages -- and if you run out, you'll be tempted to reuse pages, or you will have to get another pad to your friend;
  5. If you want to communicate with another friend, you have to have another pair of One Time Pads, and this doesn't scale very well!
Modern cryptography techniques are designed to get around these problems -- at the cost of being "less secure" than this gold standard -- but the methods are complex enough that they make pattern recovery pretty much impossible ... and yet much of the security of these methods still hinge on the ability to create random numbers!

Which begs a couple of questions:  How are random numbers generated, anyway?  And how do we know if something is "sufficiently random"?  While I said in the introduction that I'm not going to discuss the algorithms behind these "atoms" of cryptography, I'm going to make randomness somewhat of an exception, because it's at the core of everything!  And it's good to see how difficult it can be to be truly random.

I'll answer the second question first.  "Randomness" has a simple statistical definition:  if you have a list, or perhaps a way of generating a list, of allegedly random numbers, and you cannot reliably guess what the next number will be, the list of numbers is considered random.  Considering that statistics is the mathematical art of finding patterns in data, essentially this definition boils down to "we can't find any patterns, no matter how hard we try!"

The first random number "generator" was a simple polynomial, used over and over again, using the previous output as a "seed" for the next number -- but after a limited number of these steps, the "next" number would be the first, and the sequence would repeat!  The mathematician who suggested this wasn't serious about it -- he thought that the very notion of using a deterministic machine like a computer was particularly funny -- but it was relatively simple way to provide "randomness" for simple games.

Naturally, if we assume we cannot calculate random numbers, it would seem to be reasonable to turn to nature itself for randomness.  These strategies have included watching a bit of radioactive material decay over time and watching a wall of lava lamps.  It might be tempting to suggest that this is just as bad as a computer, because theoretically the universe is "deterministic" -- but while quantum mechanics itself throws a wrench in this notion, it doesn't really matter if these things are deterministic, because our ability to capture "initial conditions" to the degree needed to make these things predictable is woefully inadequate -- and chaos theory even suggests that it will always be so.

Nonetheless, these methods have their own gotchas that can make them impractical.  For one thing, it's necessary to be very careful with sampling!  If we attempt to sample the events at too fast or too slow of a rate, we can introduce extra "1"s and "0"s into the data that eat away at the randomness we're trying to capture.  (There are strategies for fixing this that I won't review here.)  Also, it's kindof hard to keep a wall of lava lamps in your laptop or pocket computer, so it's necessary to get these bits from the internet -- and if other people are getting these same bits at the same time, that can eat away at randomness too.

Nowadays, we use a combination of the two approaches:  cryptographic functions like "hashes" and "symmetric keys" are designed to thoroughly scramble data to the point that you can only use these functions to restore the data again, and thus have made the notion of "cryptographic pseudo-random number generator" possible, contrary to our original intuition!  But to ensure further randomization, these functions need a "seed" value to start with ... and that seed is provided by the random fluctuations of things like the start times and keyboard inputs of the computers themselves.

No comments:

Post a Comment