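This article describes how PKCS#5 padding works and how it can be exploited. This padding scheme is often used with block cyphers such as AES or DES. Strictly speaking, PKCS#5 is defined for 8-byte blocks (as in DES); the same scheme applied to other block sizes, such as the 16-byte blocks of AES, is called PKCS#7.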
How it works
With block cyphers, the text must be of a specific size, a multiple of the block length. Often it is not. How to make it work? Add something to the end of the message. But what? How would the recipient know where the message ends and the padding starts? Imagine we have the following end of a message padded with just 0s:
| 45 | 43 | 56 | 44 | 20 | 34 | 0 | 0 | 0 | 0 |
|---|---|---|---|---|---|---|---|---|---|
How does one know these zeroes are not part of the message? Or maybe some of these zeroes are. Consider now the following example.
| 45 | 43 | 56 | 44 | 20 | 34 | 4 | 4 | 4 | 4 |
|---|---|---|---|---|---|---|---|---|---|
So, the recipient sees repeated 4s. How do they know that this is padding and not the actual contents? Easy. How many 4s? Four. Coincidence? No. See another example below.
| 45 | 43 | 56 | 44 | 20 | 5 | 5 | 5 | 5 | 5 |
|---|---|---|---|---|---|---|---|---|---|
Now, the padding value is 5, and the number of 5s is … surprise-surprise … 5!
What if the message length is already a multiple of the block length? We have to add a full block of padding to it, so that the recipient can always treat the last byte as the padding length.
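The padding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function names `pad` and `unpad` and the `block_size` parameter are assumptions made for this example.

```python
def pad(message: bytes, block_size: int = 8) -> bytes:
    """Append n bytes, each of value n, so the length is a multiple of block_size."""
    # n is always in 1..block_size: a message that already fits
    # gets a whole extra block of padding.
    n = block_size - len(message) % block_size
    return message + bytes([n]) * n


def unpad(padded: bytes, block_size: int = 8) -> bytes:
    """Strip the padding; raise ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("length is not a multiple of the block size")
    n = padded[-1]  # the last byte says how many padding bytes there are
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


# The 10-byte example from the tables above: six message bytes, four 04s.
tail = bytes.fromhex("45 43 56 44 20 34")
print(pad(tail, block_size=10).hex(" "))  # 45 43 56 44 20 34 04 04 04 04

# A message that already fills a block gets a full block of padding.
full = pad(b"12345678")
print(len(full), full[-1])  # 16 8
```

Note that `unpad` reports malformed padding with an error. That error is exactly the kind of signal the attack described later in this article relies on.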
How it is exploited
In the case of AES (CBC), one needs a key and an IV to encrypt the message.
Say the attacker can modify the IV that accompanies the cypher text and can send the modified value to the recipient for validation as many times as they need.
Let’s start with an example. Let’s say we want to encrypt the message “explore”. In ASCII, it is 7 bytes long. Written as hex bytes, it is 65 78 70 6C 6F 72 65. To keep the tables short, this example uses 8-byte blocks (real AES blocks are 16 bytes). The message has to be padded to 8 bytes, so exactly one byte is missing, and we add a single 01 (see PKCS#5 padding).
| 65 | 78 | 70 | 6C | 6F | 72 | 65 | 01 |
|---|---|---|---|---|---|---|---|
Now, we pick an array of random numbers of the same length to make an IV (initialisation vector):
| 89 | 03 | 42 | 12 | 01 | 00 | 98 | 54 |
|---|---|---|---|---|---|---|---|
We first XOR IV with the message to get the following result:
| EC | 7B | 32 | 7E | 6E | 72 | FD | 55 |
|---|---|---|---|---|---|---|---|
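The XOR step can be checked with a short Python sketch. The helper name `xor_bytes` is an assumption made for this example; the byte values are the ones from the tables above.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message = bytes.fromhex("65 78 70 6C 6F 72 65 01")  # "explore" plus one padding byte
iv = bytes.fromhex("89 03 42 12 01 00 98 54")

mixed = xor_bytes(iv, message)
print(mixed.hex(" ").upper())  # EC 7B 32 7E 6E 72 FD 55
```

XOR is its own inverse, so XORing the result with the IV again gives back the padded message. The recipient relies on this when decrypting.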
Then we encrypt this newly acquired array with AES (a series of shifting, mixing and XORing). Suppose we get the following result:
| 43 | 2B | 5C | 7D | 32 | 11 | 01 | 3A |
|---|---|---|---|---|---|---|---|
In the end, we have the following output:
89 | 03 | 42 | 12 | 01 | 00 | 98 | 54 | 43 | 2B | 5C | 7D | 32 | 11 | 01 | 3A |
---|
The first 8 bytes are the IV, and the last 8 are the encrypted message. How do we figure out the plaintext when all we can access is the IV and the recipient’s response code? To understand this, we need to see how the recipient decrypts and decodes the message.
Let’s imagine that Alan Turing is writing to Churchill, informing him of his progress on the Enigma. And let’s imagine that Wilhelm Franz Canaris is eavesdropping on this channel.
Turing and Churchill have a key that they use to encrypt their messages. Canaris doesn’t. He only knows the IV and can send a modified message to Churchill to decrypt. Churchill can’t tell whether a letter comes from Turing or not, and if he runs into an error when decrypting, he sends back the following message: “Dear Mr. Turing, it has come to my attention that the cipher employed in your most recent epistle leaves much to be desired in terms of impenetrability. I must kindly request that you endeavor to transmit your message once more, with the appropriate level of cryptographic fortification.” (Thanks to ChatGPT for this passage).
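Churchill’s side of the exchange can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the block-cipher decryption itself is skipped, and `decrypted_block` simply holds the value the cipher would return for the example cypher text (the bytes EC 7B … 55 from the encryption walkthrough above). The function names and the response strings `"ok"` and `"padding error"` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def has_valid_padding(block: bytes) -> bool:
    """True if the block ends with n bytes of value n, for 1 <= n <= len(block)."""
    n = block[-1]
    return 1 <= n <= len(block) and block[-n:] == bytes([n]) * n


def recipient_response(iv: bytes, decrypted_block: bytes) -> str:
    """In CBC mode, the plaintext is the block-cipher output XORed with the IV."""
    plaintext = xor_bytes(iv, decrypted_block)
    # Hypothetical response codes; the real recipient only needs to
    # react differently to valid and invalid padding.
    return "ok" if has_valid_padding(plaintext) else "padding error"


# Assumed output of the block-cipher decryption for the example cypher text.
decrypted_block = bytes.fromhex("EC 7B 32 7E 6E 72 FD 55")
iv = bytes.fromhex("89 03 42 12 01 00 98 54")

print(recipient_response(iv, decrypted_block))  # ok

# The eavesdropper flips bits in the last IV byte and resends.
tampered_iv = iv[:-1] + bytes([iv[-1] ^ 0xFF])
print(recipient_response(tampered_iv, decrypted_block))  # padding error
```

The point of the sketch is that the answer depends on the IV, which the eavesdropper controls, so each response leaks a little information about the decrypted block even without the key.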