Over the past couple of weeks I've mostly been blogging about inconsequential things. Blame summer for this -- it's hard to be serious when it's 104 degrees out. But also, the world just hasn't been supplying much in the way of interesting stuff to write about.

Don't get me wrong, this is a good thing! But in a (very limited) way it's also too bad. Broken things are how we learn. To understand how we design secure systems, we have to see where and how they become insecure. Evaluating a broken system helps us to understand and internalize those lessons, which is good for everyone in the long run.

Now fortunately for us, we're not completely helpless. If we want to learn something about system analysis, there are plenty of opportunities out there in the wild. The best place to start is by finding a public protocol that's been published, but (preferably) not yet implemented. Download the spec and just start poking!

That will be our task today. The system we'll be looking at is completely public, and (to the best of my knowledge) has not yet been deployed anywhere. It's great for cryptanalysis because it includes all kinds of complicated crypto that hasn't been seriously reviewed by anyone yet.

(Or at least, my Google searches aren't turning anything up. I'm very willing to be corrected.)

Best of all, I've never looked at this system before. Honest! So whatever we find (or don't find), we'll be doing it together.

A note: this obviously isn't going to be a short post. And the TL;DR is that there is no TL;DR. This post isn't about finding bugs (although we certainly will), it's about learning how the process works. And that's something you do for its own sake.

HDCPv2

The protocol we'll be looking at today is the High Bandwidth Digital Content Protection (HDCP) protocol version 2. Before you get excited, let me sort out a bit of confusion. We are not going to talk about HDCP version 1, which is the famous protocol you probably have running in your TV right now.

HDCPv1 was analyzed way back in 2001 and found to be wanting. Things got much worse in 2010 when someone leaked the HDCPv1 master key -- effectively killing the whole system.

What we'll be looking at today is the replacement, HDCPv2. This protocol is everything that its predecessor was not. For one thing, it uses standard cryptographic primitives: RSA, AES and HMAC-SHA256. It employs a certificate model with a revocation list. It also adds exciting features like 'localization', which allows an HDCP transmitter to determine how far away a receiver is, and stop people from piping HDCP content over the Internet. (In case they actually wanted to do that.)

HDCPv2 hasn't hit shelves yet -- and, indeed, may never do so. Despite that, the Digital Content Protection licensing authority has been keeping a pretty up-to-date set of draft protocol specifications on their site. The latest is version 2.1, and it gives us a great opportunity to see how industry 'does' protocols.

An overview of the protocol

As cryptographic protocols go, HDCPv2 has a pretty simple set of requirements. It's designed to protect high-value content running over a wire (or wireless channel) between a transmitter (e.g., a DVD player) and a receiver (a TV). The protocol accomplishes the following operations:
Exchanging and verifying public key certificates.
Establishing shared symmetric keys between the transmitter and receiver.
Caching shared keys for use in later sessions.
Verifying that a receiver is local, i.e., you're not trying to proxy the data to some remote party via the Internet.
These functions are accomplished via three (mostly) separate protocols: a public-key Authenticated Key Agreement (AKE) protocol, a pairing protocol, where the derived key is cached for later use, and a locality check protocol to ensure that the devices are physically close.

I'm going to take these protocols one at a time, since each one involves its own messages and assumptions.

Phase (1): Authenticated Key Agreement (AKE)

The core of HDCPv2 is a custom key exchange protocol, which looks quite a bit like TLS. (In fact, the resemblance is so strong that you wonder why the designers didn't just use TLS and save a lot of effort.) It looks like this:


HDCPv2 key agreement protocol (source).

Now, there's lots going on here. But if we only look at the crypto, the summary is this:

The transmitter starts by sending 'AKE_Init' along with a random 64-bit nonce R_tx. In response, the receiver sends back its certificate, which contains its RSA public key and device serial number, all signed by the HDCP licensing authority.

If the certificate checks out (and is not revoked), the transmitter generates a random 128-bit 'master secret' K_m and encrypts it under the receiver's public key. The result goes back to the receiver, which decrypts it. Now both sides share K_m and R_tx, and can combine them using a wacky custom key derivation function. The result is a shared session key K_d.

The last step is to verify that both sides got the same K_d. The receiver computes a value H', using HMAC-SHA256 on inputs K_d, R_tx and some other stuff. If the receiver's H' matches a similar value computed at the transmitter, the protocol succeeds.
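
To make the flow concrete, here's a rough Python sketch of the key confirmation step. Everything crypto-specific in it is an assumption: `kdf` is a made-up stand-in (the real HDCPv2 derivation function is different), the RSA transport of K_m is skipped entirely, and `h_prime` covers only R_tx rather than the full set of inputs the spec uses. It's meant to show who contributes what, not to reproduce the spec.

```python
import hashlib
import hmac
import secrets


def kdf(k_m: bytes, r_tx: bytes) -> bytes:
    # Hypothetical stand-in for HDCPv2's custom key derivation function.
    return hmac.new(k_m, b"kd" + r_tx, hashlib.sha256).digest()


def h_prime(k_d: bytes, r_tx: bytes) -> bytes:
    # Simplified H': HMAC-SHA256 keyed with K_d over R_tx only.
    return hmac.new(k_d, r_tx, hashlib.sha256).digest()


# Transmitter side: both random inputs originate here.
r_tx = secrets.token_bytes(8)   # 64-bit nonce sent with AKE_Init
k_m = secrets.token_bytes(16)   # 128-bit master secret
# (Encrypting k_m under the receiver's RSA public key is omitted.)

# Receiver side, after decrypting k_m: derive K_d and compute H'.
receiver_h = h_prime(kdf(k_m, r_tx), r_tx)

# Transmitter recomputes the value and compares in constant time.
transmitter_h = h_prime(kdf(k_m, r_tx), r_tx)
handshake_ok = hmac.compare_digest(receiver_h, transmitter_h)
print(handshake_ok)
```

Note that in this sketch, as in the description above, the receiver contributes no randomness of its own to K_d.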

Simple, right?

Note that I've ignored one last message in the protocol, which turns out to be very important. Before we go there, let's pause and take stock.

If you're paying close attention, you've noticed a couple of worrying things:
The transmitter doesn't authenticate itself at all. This means anyone can pretend to be a transmitter.
None of the handshake messages (e.g., AKE_Transmitter_Info) appear to be authenticated. An attacker can modify them as they transit the wire.
The session key K_d is based solely on the inputs supplied by the transmitter. The receiver does generate a nonce R_rx, but this is only used in the localization protocol.
None of these things by themselves are a problem, but they make me suspicious.

Phase (2): Pairing

Public-key operations are expensive. And you only really need to do them once. The designers recognized this, and added a feature called 'pairing' to cache the derived K_m for use in later sessions. This is quite a bit like what TLS does for session resumption.

However, there's one catch and it's where things get complicated: some receivers don't have a secure non-volatile storage area for caching keys.

This didn't faze the designers, who came up with a 'clever' workaround for the problem: the receiver can simply ask the transmitter to store K_m for it.

To do this, the receiver encrypts K_m under a fixed internal AES key K_h (which is derived by hashing the receiver's RSA private key). In the last message of the AKE protocol the receiver now sends this ciphertext back to the transmitter for storage. This appears in the protocol diagram as the ciphertext E(K_h, K_m).

The obvious intuition here is that K_m is securely encrypted. What could possibly go wrong? The answer is to ask how K_m is encrypted. And that's where things get worrying.

According to the spec, K_m is encrypted using AES in what amounts to CTR mode, where the 'counter' value is defined as some value m. On closer inspection, m turns out to be just the transmitter nonce R_tx padded with 0 bits. So that's simple. Here's what it looks like:

Encryption of the master key K_m with the receiver key K_h. The value m is equal to (R_tx || 0x0000000000000000), i.e., the 64-bit nonce followed by 64 zero bits.
Now, CTR is a perfectly lovely encryption mode provided that you obey one unbreakable rule: the counter value must never be re-used. Is that satisfied here? Recall that the counter m is actually chosen by another party -- the transmitter. This is worrying. If the transmitter wants, it could certainly ask the receiver to encrypt anything it wants under the same counter.

Of course, an honest transmitter won't do this. But what about a dishonest transmitter? Remember that the transmitter is not authenticated by HDCP. The upshot is that an attacker can spoof it, and submit its own values to be encrypted by the receiver under K_h.

Even this might be survivable, if it weren't for one last fact: in CTR mode, encryption and decryption are the same operation.

Which leads to the following attack:
1. Observe a legitimate communication between a transmitter and receiver. Capture the values R_tx and E(K_h, K_m) as they go over the wire.
2. Now pretend to be a transmitter and initiate your own session with the receiver.
3. Replay the captured R_tx as your initial transmitter nonce. When you reach the point where you pick the master secret, don't pick a random number. Instead, use the ciphertext E(K_h, K_m). Expressed more concretely, this ciphertext has the form:

   AES(K_h, R_tx || 000...) ⊕ K_m

   Encrypt this value under the receiver's public key and send it along.
4. Sooner or later the receiver will encrypt the 'master secret' you chose above under the key K_h. This result can also be expanded, and has this form:

   AES(K_h, R_tx || 000...) ⊕ AES(K_h, R_tx || 000...) ⊕ K_m

   Thanks to the beauty of XOR, the first two terms of this ciphertext simply cancel out. The result is the original K_m from the first session! Yikes!
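
The cancellation is easy to check with a toy model. The sketch below is an assumption-laden illustration, not HDCPv2 code: `block_prf` uses truncated HMAC-SHA256 as a stand-in for a single AES block encryption (the Python standard library has no AES), the RSA step is left out, and `receiver_wrap` is a hypothetical name for whatever the receiver does to produce E(K_h, K_m). The attack only relies on the same keystream being reused, so the choice of block function doesn't matter here.

```python
import hashlib
import hmac
import secrets


def block_prf(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES(key, block): HMAC-SHA256 truncated to 128 bits.
    return hmac.new(key, block, hashlib.sha256).digest()[:16]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def receiver_wrap(k_h: bytes, r_tx: bytes, k_m: bytes) -> bytes:
    # E(K_h, K_m) = F(K_h, R_tx || 0...0) XOR K_m, with R_tx padded to 128 bits.
    m = r_tx + bytes(8)
    return xor(block_prf(k_h, m), k_m)


k_h = secrets.token_bytes(16)   # receiver's fixed internal key

# Honest session: the attacker records R_tx and E(K_h, K_m) off the wire.
r_tx = secrets.token_bytes(8)
k_m = secrets.token_bytes(16)
captured = receiver_wrap(k_h, r_tx, k_m)

# Attacker's session: replay R_tx and submit the captured ciphertext
# as the new 'master secret'. The receiver dutifully wraps it under K_h.
recovered = receiver_wrap(k_h, r_tx, captured)

print(recovered == k_m)
```

The attacker never learns K_h. The receiver simply applies the same keystream twice, which undoes its own encryption.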

If this works, it's a huge problem. First, K_m is used to derive the session keys used to encrypt HDCP content, which means that you can now decrypt any past HDCP content traces. And even worse, thanks to the 'pairing' process, you may be able to use this captured K_m to initiate or respond to further sessions involving this transmitter.

Did I mention that protocols are hard?

I want to stress that this attack is based on my interpretation of the protocol, which may be wrong. But if it works it's very much a deal-killer. In other words, we could stop our analysis now. Let's not.

Phase (3): The Locality Check

So far what we've learned is that encryption is dangerous. In the next section, we're going to learn that protocols are dangerous too, especially when you have more than one of them.

At its heart, the locality check is a pretty simple thing. The transmitter and receiver are both trusted, and have successfully exchanged a session key by running the AKE protocol above. The locality check is designed to ensure that the receiver is nearby -- specifically, that it can provide a cryptographic response to a challenge, and can do it in < 7 milliseconds. This is a short enough time that it should prevent people from piping HDCP over a WAN connection.

(Why anyone would want to do this is a mystery to me. But I digress.)

In principle the locality check should be simple. In practice, it's complicated. That's because there are actually two separate protocols for it, depending on what kind of device you're using.

Here's the first one:

Simple version of the locality check. K_d is a shared key and R_rx is a receiver nonce.
This is the simplest challenge-response protocol you can imagine. The transmitter generates a random nonce R_n and sends it to the receiver. The receiver now has 7 milliseconds to kick back a response, which is computed as HMAC-SHA256 of {the session key K_d, challenge nonce R_n, and a 'receiver nonce' R_rx}. You may recall that the receiver nonce was chosen during the AKE.
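
Here's a minimal sketch of that exchange from the transmitter's point of view. The exact way the spec feeds K_d, R_n and R_rx into the HMAC is not reproduced here: keying with K_d over the message R_n || R_rx is an assumption, as are the byte lengths and the names `locality_response` and `transmitter_check`. The clock is injectable so that the timing rule can be exercised without real delays.

```python
import hashlib
import hmac
import secrets
import time

RTT_LIMIT_SECONDS = 0.007  # the 7 millisecond budget


def locality_response(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
    # Assumed layout: HMAC-SHA256 keyed with K_d over R_n || R_rx.
    return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()


def transmitter_check(k_d, r_rx, ask_receiver, clock=time.perf_counter):
    r_n = secrets.token_bytes(8)          # fresh challenge nonce (assumed size)
    start = clock()
    answer = ask_receiver(r_n)            # round trip plus HMAC time
    elapsed = clock() - start
    expected = locality_response(k_d, r_n, r_rx)
    # Succeed only if the answer is right AND it arrived in time.
    return hmac.compare_digest(answer, expected) and elapsed < RTT_LIMIT_SECONDS


k_d = secrets.token_bytes(32)
r_rx = secrets.token_bytes(8)


def honest_receiver(r_n: bytes) -> bytes:
    return locality_response(k_d, r_n, r_rx)


# Simulated clocks: a 1 ms round trip passes, a 500 ms one fails.
lan = iter([0.0, 0.001])
wan = iter([0.0, 0.500])
print(transmitter_check(k_d, r_rx, honest_receiver, clock=lambda: next(lan)))
print(transmitter_check(k_d, r_rx, honest_receiver, clock=lambda: next(wan)))
```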

So far so good. This is a nice simple protocol, and pretty hard to beat.

But here's a wrinkle: some devices are slow. Remember that the 7 milliseconds has to include the round-trip communication time, as well as the time required to compute the HMAC. There is a very real possibility that slow devices might not be able to handle this.

Will HDCP provide a second, optional protocol to deal with those devices? You bet it will.

The second protocol allows the receiver to pre-compute the HMAC response before the timer starts ticking. Here's what it looks like:


'Precomputed' version of the protocol.


This is almost exactly the same protocol, with one small difference: the transmitter gives the receiver all the time it wants to compute the HMAC.

The locality check is now kicked off when the receiver says it's ready. Only at this point does the transmitter start its clock. Of course, there has to be something keeping the RTT under 7ms. In this case it's that the receiver won't speak until it receives the least significant 128 bits of the expected HMAC result from the transmitter. Only when it receives these bits will it kick back its own response, which is the most significant 128 bits of the same value.
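
A sketch of the receiver's side of this variant might look like the following. As before, the HMAC input layout is assumed rather than taken from the spec, `PrecomputingReceiver` is a hypothetical name, and treating the first 16 bytes of the digest as the 'most significant' half is a guess at the byte ordering.

```python
import hashlib
import hmac
import secrets


def locality_mac(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
    # Assumed layout: HMAC-SHA256 keyed with K_d over R_n || R_rx.
    return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()


class PrecomputingReceiver:
    def __init__(self, k_d: bytes, r_rx: bytes):
        self.k_d, self.r_rx = k_d, r_rx
        self.msb = self.lsb = None

    def precompute(self, r_n: bytes) -> str:
        # Untimed phase: take as long as needed, then announce readiness.
        mac = locality_mac(self.k_d, r_n, self.r_rx)
        self.msb, self.lsb = mac[:16], mac[16:]   # assumed byte order
        return "RTT_READY"

    def respond(self, lsb_from_transmitter: bytes):
        # Timed phase: stay silent unless the transmitter's half matches.
        if self.lsb is None:
            return None
        if not hmac.compare_digest(lsb_from_transmitter, self.lsb):
            return None
        return self.msb


k_d = secrets.token_bytes(32)
r_rx, r_n = secrets.token_bytes(8), secrets.token_bytes(8)

receiver = PrecomputingReceiver(k_d, r_rx)
ready = receiver.precompute(r_n)           # transmitter is not timing this
expected = locality_mac(k_d, r_n, r_rx)
reply = receiver.respond(expected[16:])    # the clock runs only around this call
locality_ok = reply is not None and hmac.compare_digest(reply, expected[:16])
print(ready, locality_ok)
```

The only work left inside the timed window is a comparison and a lookup, which is the whole point of the variant.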

So this is more complicated, but on its own, also looks pretty ok by me.

But here's a funny question: what if we're running both protocols at once?

No, I'm not being ridiculous. What if, as a man-in-the-middle attacker, we can convince the transmitter to run the 'pre-computed' protocol and, at the same time, convince the receiver to run the 'simple' one? Remember that none of the protocol flags (transmitted during the AKE) are authenticated. It's possible we could trick both sides into seeing a different view of the other's capabilities.

Now imagine: we have a receiver running in China, and a transmitter located in New York. We're a man-in-the-middle sitting next to the transmitter. We want to convince the transmitter that the receiver is close -- close enough to be on a LAN, for example. Here's what we might do:
1. Modify the message flags so that the transmitter thinks we're running the pre-computed protocol. It will start by handing us R_n and then give us all the time in the world to do our pre-computation.
2. Convince the receiver to run the 'simple' protocol. Send R_n to it, and wait for it to send back the HMAC result (L').
3. Take a long bath, mow the lawn. Watch Season 1 of Game of Thrones.
4. At our leisure, send the RTT_READY message to the transmitter, which has been politely waiting for the receiver to finish pre-computing.
5. The transmitter will now send us some bits. Immediately send it back the most significant bits of the value L', which we got in step (2).
6. Send video to China.
Now I won't claim that the above attack necessarily works. There are subtle details here, which may go beyond the problem of tricking the receiver and sender. Still, this is a great teaching example because it illustrates a key fact in cryptographic protocol design. Namely, that parties may not share the same view of what's going on.
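
For what it's worth, here is the idea as a toy simulation. It bakes in the very assumptions that may not hold in practice: that the flags can be altered undetected, and that both variants compute the same HMAC value over the same inputs (the layout used here is itself assumed). The delays are hypothetical numbers, not measurements.

```python
import hashlib
import hmac
import secrets

RTT_LIMIT_SECONDS = 0.007


def locality_mac(k_d: bytes, r_n: bytes, r_rx: bytes) -> bytes:
    # Assumed layout: HMAC-SHA256 keyed with K_d over R_n || R_rx.
    return hmac.new(k_d, r_n + r_rx, hashlib.sha256).digest()


# Secrets shared by the two honest devices. The attacker never sees them.
k_d = secrets.token_bytes(32)
r_rx = secrets.token_bytes(8)


def far_receiver_simple(r_n: bytes) -> bytes:
    # The distant receiver believes it is running the 'simple' protocol.
    return locality_mac(k_d, r_n, r_rx)


# The transmitter believes it is running the 'pre-computed' protocol:
# it hands out R_n, then waits (untimed) for RTT_READY.
r_n = secrets.token_bytes(8)

# The man-in-the-middle relays R_n over a slow link and collects L'.
slow_relay_seconds = 0.300                 # hypothetical WAN round trip
l_prime = far_receiver_simple(r_n)

# The attacker now says RTT_READY. The transmitter starts its clock and
# sends the low half; the attacker answers at once with the high half.
expected = locality_mac(k_d, r_n, r_rx)
local_rtt_seconds = 0.001                  # hypothetical LAN round trip
attacker_reply = l_prime[:16]

accepted = (hmac.compare_digest(attacker_reply, expected[:16])
            and local_rtt_seconds < RTT_LIMIT_SECONDS)
print(accepted)
```

In this model the slow relay happens entirely before the transmitter starts timing, so the 7 ms limit never applies to it.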

The designer's most important job is to ensure that such disagreements can never happen. The best way to do this is to ensure that there's only one view to be had -- in other words, dispense with all the options and write one clear protocol. But if you must have options, make sure that the protocol only succeeds if both sides agree on what those options are.
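
One generic way to enforce that agreement is to fold each side's view of the negotiated options into the key confirmation MAC. The sketch below is an illustration of that technique, not something taken from the HDCPv2 spec: the `PRECOMPUTE` flag bit, the one-byte flag encoding and `confirmation_tag` are all hypothetical.

```python
import hashlib
import hmac
import secrets

PRECOMPUTE = 0x01  # hypothetical capability flag bit


def confirmation_tag(k_d: bytes, r_tx: bytes, tx_flags: int, rx_flags: int) -> bytes:
    # MAC over the nonce plus BOTH parties' flags, as seen by the caller.
    transcript = r_tx + bytes([tx_flags, rx_flags])
    return hmac.new(k_d, transcript, hashlib.sha256).digest()


k_d = secrets.token_bytes(32)
r_tx = secrets.token_bytes(8)

# Honest run: both sides saw the same flags, so the tags match.
tag_tx = confirmation_tag(k_d, r_tx, PRECOMPUTE, PRECOMPUTE)
tag_rx = confirmation_tag(k_d, r_tx, PRECOMPUTE, PRECOMPUTE)
honest_ok = hmac.compare_digest(tag_tx, tag_rx)

# Tampered run: an attacker rewrote the flags in transit, so the
# transmitter thinks pre-computation is on while the receiver thinks it is off.
tag_tx = confirmation_tag(k_d, r_tx, PRECOMPUTE, PRECOMPUTE)
tag_rx = confirmation_tag(k_d, r_tx, 0, 0)
tampered_ok = hmac.compare_digest(tag_tx, tag_rx)

print(honest_ok, tampered_ok)
```

With the flags inside the MAC, a disagreement about options turns into a failed handshake instead of two parties quietly running different protocols.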

Compared to the importance of learning those lessons, actually breaking localization is pretty trivial. It's a stupid feature anyway.

In Conclusion

This has been an unbelievably long post. To the two or three readers I have left at this point: thanks for sticking it out.

The only remaining thing I'd like to say is that this post is not intended to judge HDCPv2, or to make it look bad. It may or it may not be a good protocol, depending on whether I've understood the specification properly and depending on whether the above flaws make it into real devices. Which, hopefully they won't now.

What I've been trying to do is teach a basic lesson: protocols are hard. They can fail in ruinous, subtle, unexpected, exciting ways. The best cryptographers -- working with BAN logic analyzers and security proofs -- still make mistakes. If you don't have those tools, steer clear.

The best 'fix' for the problem is to recognize how dangerous protocols can be, and to avoid designing your own. If you absolutely must do so, please try to make yours as simple as possible. Too many people fail to grok this lesson, and the result is, well, HDCPv2.