Monday, February 10, 2025

Identity Management Atoms: Asymmetric Public/Private Keys

Asymmetric keys are the final element we need for Identity Management.  So far, everything we've covered makes it possible to send data secretly, and to make sure that we can confirm that data we send or receive hasn't been tampered with -- but we cannot share Symmetric keys easily, out in the open, where everyone can see it -- we have to share these things privately -- and that's kindof difficult to do on a forum open to the public, such as the internet.

Heck, even if we limit our communications to pencil an paper, it might be nice to share a way for people to reach out and contact us!  If only Alice could pin a key of some sort on that bulletin board, so that Bob can encrypt something and share it with Alice.  That way, Alice wouldn't even have to meet Bob to exchange information privately!

The first algorithm that provided for just this is called the Diffie-Hellman Key Exchange (that some suggest should be called the Diffie-Hellman-Merkle key exchange, to recognize Merkle's role in laying the foundations -- which nonetheless puts aside that a British Intelligence team came up with the same algorithm several years before, but had to keep it classified until much later -- and who knows, maybe it will be found one day in one of the numerous works that Leonard Euler wrote, leading us to sigh and say "It's a good thing we didn't know about it, because otherwise half of mathematics would be named after him!"? -- there's a reason I simultaneously appreciate and don't worry about making sure everyone gets credit!).

The idea is relatively simple:  Alice and Bob agree on a key $P_\infty$ (recall that the $\infty$ subscript is a reminder that the key is shared by everyone) to use as a basis for communication.  Alice chooses a private key $A_0$ (recall that the $0$ subscript is a reminder to share the key with zero people), and combines this to create a key $A_0 P_\infty$ she publishes publicly; likewise, Bob can share $B_0 P_\infty$ with the world after creating his own private key $B_0$.  To communicate, Alice and Bob combine these publicly shared keys with their private keys, $(A_0 (B_0 P)) = (B_0 (A_0 P)) = S_0$, and by the magic of modular arithmetic (ie, mathematics I don't want to delve into right now), things mix together to produce an $S_0$ that can then be used to share messages between Alice and Bob.

For what it's worth, Wikipedia has a more colorful explanation -- by literally using colors and color mixing to explain what's going on.

Ok, maybe it's not exactly simple -- and it relies on advanced mathematics to allow for things to cancel out nicely, so that the symmetric key is the same for Bob and Alice.  But it gets the job done, and it's used internally in a lot of internet protocols where sharing messages, rather than confirming identity, is the primary concern.  It's not quite a public-private key system -- but it's a bridge between symmetric keys and asymmetric ones!

Shortly after Diffie-Hellman was made public, Rivest, Shamir, and Adleman created a simpler algorithm:  rather than having a public key that everyone uses as a basis for creating private shared secrets, each individual produces their own public and private keys.  Hence, Alice creates $A_0$ for herself, and shares $A_\infty$ to the world, while Bob creates $B_0$ for himself and $B_\infty$ for the world.  If Bob wishes to share a message $M$ with Alice, he encrypts it with Alice's public key $A_0(M)$ -- and if Alice wishes to read it, she applies her private key $\A_0 (A_\infty(M))$, which cancels out the encryption, leaving Alice (who, if she's careful, is the only one who has her private key!) the only person in the world besides Bob to be able to read the message.

Now, here's the fun thing about asymmetrical encryption:  both keys can be used for encryption -- and the other key decrypts!  If Alice wanted to, she could take a message $M$ and encrypt it with her private key, $A_\infty(M)$, and then share the result with the world.  If Bob, or the President, or my sister and her darling dachshund, or anyone, really, wanted to read the message, they can -- they just have to apply the public key (also available to the public) to the message:  $A_0(A_\infty(M)) = M$.  But why would Alice want to do this, though, if the purpose of encryption is to keep unwanted people from reading messages?  Well, when Alice does this, she isn't just sharing the message:  she's reminding the world that, as the world's only holder of the private key $A_0$, she's the only person who could encrypt something that can be decrypted by $A_\infty$.  Thus, you can be fairly certain the message came from her!  This is the basis of cryptographic document signing.

Of course, if Alice wanted to, she could send Bob a signed message $A_0(M)$ -- and if it's only intended for Bob, she can further encrypt it with Bob's public key, via $B_\infty(A_0(M))$ -- thus, to read this message, Bob would use his private key to "unwrap" the message, $B_0(B_\infty($A_0(M)) = $A_0(M)$, which can then be further decrypted by $A_\infty$ to confirm that, not only is the message intended for Bob, but that it can only have been sent from Alice.

Besides RSA algorithms of various strengths, there are now algorithms based on elliptical curves, which (if I understand correctly) may be computationally as fast as symmetric keys, and can also be smaller while offering the same level of security -- because as fantastic as public/private key encryption may be, it's still computationally slow, so it still makes sense to use symmetric keys when you can, and transfer them via public/private key authentication, rather than using public/private keys directly.

As of right now, asymmetric keys have only three weaknesses, two real, and one theoretical.

The first weakness is that you have to make sure you never let other people know your key -- and that's a challenge, considering how many vulnerabilities have been found in our software! -- but there are schemes for rotating through keys that make this more manageable.

The second weakness is called "Man in the Middle" -- if Eve wanted to listen in to Bob's and Alice's conversation, and she can intercept their traffic, she can create her own public/private key pair $E_0$ and $E_\infty$, and when Bob tries to send a message to Alice, if he convinces Bob that $E_\infty$ is Alice's public key, then Bob would try to send a message to Alice via $E_\infty(M)$, which Eve would then decrypt with her private key and encrypt with Alice's public key -- $A_\infty(E_0(E_\infty(M))) = A_\infty(M))$ -- which Alice can now decrypt with her private key.  And if Alice tries to respond, and Eve managed to convince Alice that $E_\infty$ is also Bob's public key, Eve can read all the traffic going back and forth.  To be sure, this requires that Eve captures this stream at the beginning of the conversation, and that she is constantly there to be an intermediary between Alice and Bob -- but it is a risk nonetheless, and a real one where something public like the internet is concerned.  There are also strategies to prevent this from happening; indeed, this is why "certificates" are so important for web browsers.

And the third weakness rests on the notion that the prime numbers used in these schemes are very difficult -- indeed, beyond-the-lifetime-of-the-universe difficult -- to factor, even with the fastest of computers.  Mathematicians have been unable to prove, one way or another, that factoring like this is, indeed, hard -- so we may very well be a surprising, fantastic, and beautiful proof away from the entire security of the internet crumbling -- but mathematicians generally believe that there are no shortcuts to factoring numbers, and we may very well prove that, instead.  Physicists have been hard at work creating quantum computers that can, in theory at least, go through lots of factors all at once -- but it's unclear if engineers will ever overcome the hurdle of noise that plagues quantum mechanics so much, and if so, whether engineers would be able to gather enough "qubits" to be able to carry out the large number of computations necessary to factor large numbers.  Nonetheless, this threat is serious enough that researchers are working to develop "quantum-proof" algorithms for encryption -- and governments , for that matter, are sucking up as much current communication as they can, with the hopes that someday they'll be able to read everything!


Fun fact!  This is the first blogpost where I formally use $\LaTeX$ to format things!  I originally intended to keep it simple, but I discovered that I really wanted easy-to-format subscripts for the keys.  Plain A_0 just looked ugly! I found a forum that directed me to https://koutuholi.blogspot.com/2021/04/mathjax.html, which provides a non-supported way to provide the magic of $\LaTeX$ to blogs.

For those not familiar with $\LaTeX$, it is a fantastic document layout system used by mathematically-oriented people to write papers; I personally find the creation of documents using the system to be fantastic, but when I get frustrated with the ASCII mathematical representation, I remind myself ... that $\LaTeX$ is the worst math system out there, except for all the others!  (I particularly despise "equation editors"; they are surprisingly painful to use!)