One-time Pad - Analyst

crypto

Created: 06.10.2020

One-time pad is like a Viginear cypher but with the key the length of the cypher text.

Algo Intro

The key requirement for this cypher scheme to be perfectly secret is that the key should only be used once. Why? For example, the same key was applied to two cypher texts (c1 and c2), so that c1 = message1 XOR key and c2 = message2 XOR key. What can we do here? We can eliminate the key by XOR-ing c1 and c2: c1 XOR c2 = (message1 XOR key) XOR (message2 XOR key) = message1 XOR message2. This is not the same as having the plain text of either or both message1 or message2, but there is something we can do, using ASCII characteristics (it’s if it were ASCII, of course). Each letter starts with 01 in binary, but the space character starts with 00. Look at the possible combinations:

MESSAGE1	MESSAGE2	M1 XOR M2
`00...` (space character), `0x20`	`00...` (space character), `0x20`	`00...`
`01...` (letter)	`01...` (letter)	`00...`
`00...` (space character), `0x20`	`01...` (letter)	`01...`
`01...` (letter)	`00...` (space character), `0x20`	`01...`

So, if both message1 and message2 have the same character at the same position, then we will have 00... at this position after XORing them. If the values are different, then we will have it starting with 01.... Statistically more likely that we have either two letters or a letter with a space [1]. With this information, we might restore some parts of the message like this.

Say, we have three cyphertext. The first of them has a hex value A8 at the 9th position, the second ED and the third BD. Let’s write them in binary:

A8 - 1010 1000, hence start with 10...

ED - 1110 1101, hence start with 11...

BD - 1011 1101, hence start with 10...

If we XOR A8 with BD (10... with 10...) we will get 00 since the values have the same starting two bits. We can conclude with some amount of certainty that both have a letter as plain text. However, if we XOR either A8 with ED or A8 with BD, we get 01.... Knowing that both A8 and BD are some ASCII letter characters, we can also conclude that ED is a space character. Now, ED is the value after XORing. So, we ED XOR 20 to get the key character for the 9th character in all cypher texts = CD. Now, since the keys are the same for all the three cypher text, lets XOR the other two characters: A8 XOR CD = 65 (e) and BD XOR CD = 70 (p).

When watching the course I could not help but thinking, is it helpful to know the resulting XOR values for each possible pair of letters? Then having the message1 xor message2 we could for each byte have limited options to choose from… How limited?

I’ve first created an array of chars:

array = [chr(char) for char in range(0x0, 0xff) if chr(char).isalpha()]
print(array[0:52])
> ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

I then calculated XOR values for each combination:

# Reset
Color_Off="\033[0m"

# Regular Colors
Black="\033[0;30m"        
Red="\033[0;31m"          
Green="\033[0;32m"        
Yellow="\033[0;33m"       
Blue="\033[0;34m"         
Purple="\033[0;35m"       
Cyan="\033[0;36m"         
White="\033[0;37m" 

array = [char for char in range(0x0, 0xff) if chr(char).isalpha()]
array = array[0:52]
print(array)

for ch1 in array:
    for ch2 in array:
        print(Color_Off + "If" + Red + " ch1" + Color_Off + " is {} and ".format(chr(ch1)) + Red + "ch2" + Color_Off + " is {}".format(chr(ch2) + ", the resulting XOR is:" + Purple +  " {}".format(hex(ch1 ^ ch2)) + Yellow))

I wasn’t thinking about cool algorithms since I only have 52 charactes. The result is the following (I’ve trimmed it a bit):

If ch1 is A and ch2 is A, the resulting XOR is: 0x0
If ch1 is A and ch2 is B, the resulting XOR is: 0x3
If ch1 is A and ch2 is C, the resulting XOR is: 0x2
...
If ch1 is z and ch2 is v, the resulting XOR is: 0xc
If ch1 is z and ch2 is w, the resulting XOR is: 0xd
If ch1 is z and ch2 is x, the resulting XOR is: 0x2
If ch1 is z and ch2 is y, the resulting XOR is: 0x3
If ch1 is z and ch2 is z, the resulting XOR is: 0x0

This result is to long and there are obviously repeated values, for example, a xor z is the same as z xor athe . Besides, I already know that two identical values will deem the same XOR result. So, I’ve optimized the output:

# Reset
Color_Off="\033[0m"

# Regular Colors
Black="\033[0;30m"        
Red="\033[0;31m"          
Green="\033[0;32m"        
Yellow="\033[0;33m"       
Blue="\033[0;34m"         
Purple="\033[0;35m"       
Cyan="\033[0;36m"         
White="\033[0;37m" 

array = [char for char in range(0x0, 0xff) if chr(char).isalpha()]

# strip the non-English characters
array = array[0:52]
result = {}
# print(array)

for ch1 in array:
    for ch2 in array:
        xor_result = hex(ch1^ch2)
        # each pair of the same values will give me 0x0
        if xor_result == '0x0':
            continue
        if xor_result not in result:
            result[xor_result] = "{} xor {}".format(ch1, ch2)
        else:
            if "{} xor {}".format(ch2, ch1) in result[xor_result]:
                continue
            result[xor_result] += "; {} xor {}".format(ch1, ch2)
        # print(Color_Off + "If" + Yellow + " ch1" + Color_Off + " is {} and ".format(chr(ch1)) + Green + "ch2" + Color_Off + " is {}".format(chr(ch2) + ", the resulting XOR is:" + Purple +  " {}".format(xor_result) + Yellow))

print("{" + "\n\n".join("{!r}: {!r},".format(k, v) for k, v in sorted(result.items())) + "}")

Also, if you observe carefully the result of the scipt above, you’ll notice that the xor values of upper and lower case letters are the same. We can then show only either of them. Upper of lowercase are usually insignificant. After sorting, grouping and optimising:

{'0x1': 'B xor C; D xor E; F xor G; H xor I; J xor K; L xor M; N xor O; P xor Q; R xor S; T xor U; V xor W; X xor Y',

'0x10': 'A xor Q; B xor R; C xor S; D xor T; E xor U; F xor V; G xor W; H xor X; I xor Y; J xor Z',

'0x11': 'A xor P; B xor S; C xor R; D xor U; E xor T; F xor W; G xor V; H xor Y; I xor X; K xor Z',

'0x12': 'A xor S; B xor P; C xor Q; D xor V; E xor W; F xor T; G xor U; H xor Z; J xor X; K xor Y',

'0x13': 'A xor R; B xor Q; C xor P; D xor W; E xor V; F xor U; G xor T; I xor Z; J xor Y; K xor X',

'0x14': 'A xor U; B xor V; C xor W; D xor P; E xor Q; F xor R; G xor S; L xor X; M xor Y; N xor Z',

'0x15': 'A xor T; B xor W; C xor V; D xor Q; E xor P; F xor S; G xor R; L xor Y; M xor X; O xor Z',

'0x16': 'A xor W; B xor T; C xor U; D xor R; E xor S; F xor P; G xor Q; L xor Z; N xor X; O xor Y',

'0x17': 'A xor V; B xor U; C xor T; D xor S; E xor R; F xor Q; G xor P; M xor Z; N xor Y; O xor X',

'0x18': 'A xor Y; B xor Z; H xor P; I xor Q; J xor R; K xor S; L xor T; M xor U; N xor V; O xor W',

'0x19': 'A xor X; C xor Z; H xor Q; I xor P; J xor S; K xor R; L xor U; M xor T; N xor W; O xor V',

'0x1a': 'B xor X; C xor Y; H xor R; I xor S; J xor P; K xor Q; L xor V; M xor W; N xor T; O xor U',

'0x1b': 'A xor Z; B xor Y; C xor X; H xor S; I xor R; J xor Q; K xor P; L xor W; M xor V; N xor U; O xor T',

'0x1c': 'D xor X; E xor Y; F xor Z; H xor T; I xor U; J xor V; K xor W; L xor P; M xor Q; N xor R; O xor S',

'0x1d': 'D xor Y; E xor X; G xor Z; H xor U; I xor T; J xor W; K xor V; L xor Q; M xor P; N xor S; O xor R',

'0x1e': 'D xor Z; F xor X; G xor Y; H xor V; I xor W; J xor T; K xor U; L xor R; M xor S; N xor P; O xor Q',

'0x1f': 'E xor Z; F xor Y; G xor X; H xor W; I xor V; J xor U; K xor T; L xor S; M xor R; N xor Q; O xor P',

'0x2': 'A xor C; D xor F; E xor G; H xor J; I xor K; L xor N; M xor O; P xor R; Q xor S; T xor V; U xor W; X xor Z',

'0x3': 'A xor B; D xor G; E xor F; H xor K; I xor J; L xor O; M xor N; P xor S; Q xor R; T xor W; U xor V; Y xor Z',

'0x4': 'A xor E; B xor F; C xor G; H xor L; I xor M; J xor N; K xor O; P xor T; Q xor U; R xor V; S xor W',

'0x5': 'A xor D; B xor G; C xor F; H xor M; I xor L; J xor O; K xor N; P xor U; Q xor T; R xor W; S xor V',

'0x6': 'A xor G; B xor D; C xor E; H xor N; I xor O; J xor L; K xor M; P xor V; Q xor W; R xor T; S xor U',

'0x7': 'A xor F; B xor E; C xor D; H xor O; I xor N; J xor M; K xor L; P xor W; Q xor V; R xor U; S xor T',

'0x8': 'A xor I; B xor J; C xor K; D xor L; E xor M; F xor N; G xor O; P xor X; Q xor Y; R xor Z',

'0x9': 'A xor H; B xor K; C xor J; D xor M; E xor L; F xor O; G xor N; P xor Y; Q xor X; S xor Z',

'0xa': 'A xor K; B xor H; C xor I; D xor N; E xor O; F xor L; G xor M; P xor Z; R xor X; S xor Y',

'0xb': 'A xor J; B xor I; C xor H; D xor O; E xor N; F xor M; G xor L; Q xor Z; R xor Y; S xor X',

'0xc': 'A xor M; B xor N; C xor O; D xor H; E xor I; F xor J; G xor K; T xor X; U xor Y; V xor Z',

'0xd': 'A xor L; B xor O; C xor N; D xor I; E xor H; F xor K; G xor J; T xor Y; U xor X; W xor Z',

'0xe': 'A xor O; B xor L; C xor M; D xor J; E xor K; F xor H; G xor I; T xor Z; V xor X; W xor Y',

'0xf': 'A xor N; B xor M; C xor L; D xor K; E xor J; F xor I; G xor H; U xor Z; V xor Y; W xor X',}

So, we have limited possibilities for each XOR result. Of course, it will require a decent amount of work, but it can be done if the message is big enough. Also, a frequency analysis can also be used, but it the distribution will be much more sparse.

Let’s find 0x15 in the output above (one we got when XORing two cypher text in the example above, those that have letters only). There are the following options: '0x15': 'A xor T; B xor W; C xor V; D xor Q; E xor P; F xor S; G xor R; L xor Y; M xor X; O xor Z'. So, as you may see, one of them is P xor E (or E xor P, or e xor p, or p xor e). This could be a good start in case we had such a problem. In the script above the characters in the array can be changed to any set of values depending on the character set composing the original message.

So, too summarize, if we have two cypher texts encrypted with the same key using one-time pad, here are some tricks to help get the plaintext if not full at least partly. We need to XOR the two messages and observe each byte in the result.

If the byte is 0x0 (binary 0000 0000), then the bytes at this position in both plain texts are the same. If it’s a consecutive series of bytes at the end or at the beginning, might be a name.
If the byte starts with 01.. .... in binary, it’s likely that one of the messages contains a space and the other - a character at this position.
If the byte starts with 00.. .... in binary, it’s likely that both messages have a letter character at this position (in case it’s ASCII) and these letters are different (see the first tip). Also, the script output above can help narrow down the number of options for each byte and if we know the language, we can sometimes deduce the whole word from its part or position in the text.

Example. Coursera

Spoiler alert!

There are 7 messages, each 31 characters long. They are encrypted using a one-time pad cypher with the same key. That allows us to crack the message. An alternative method to do it is to perform the frequency analysis. In fact, this problem (if the key is the same for all 7 messages) comes down to cracking a Viginear cypher with a longer key. Technically this is no longer a one-time pad since the key is 7 times shorter than the message. I say the message because I see all these 7 ones as one message space and its length is 217. However, since the key is the same, the key space, in this case, is 31 characters only.

# cypher texts
t1 = "BB3A65F6F0034FA957F6A767699CE7FABA855AFB4F2B520AEAD612944A801E"
t2 = "BA7F24F2A35357A05CB8A16762C5A6AAAC924AE6447F0608A3D11388569A1E"
t3 = "A67261BBB30651BA5CF6BA297ED0E7B4E9894AA95E300247F0C0028F409A1E"
t4 = "A57261F5F0004BA74CF4AA2979D9A6B7AC854DA95E305203EC8515954C9D0F"
t5 = "BB3A70F3B91D48E84DF0AB702ECFEEB5BC8C5DA94C301E0BECD241954C831E"
t6 = "A6726DE8F01A50E849EDBC6C7C9CF2B2A88E19FD423E0647ECCB04DD4C9D1E"
t7 = "BC7570BBBF1D46E85AF9AA6C7A9CEFA9E9825CFD5E3A0047F7CD009305A71E"

# all end with a 1E (.?) and 0F - ! or ?
# Most likely, the first characters are capital letters
a1 = [int(t1[i:i+2], 16) for i in range(0,len(t1), 2)]
a2 = [int(t2[i:i+2], 16) for i in range(0,len(t2), 2)]
a3 = [int(t3[i:i+2], 16) for i in range(0,len(t3), 2)]
a4 = [int(t4[i:i+2], 16) for i in range(0,len(t4), 2)]
a5 = [int(t5[i:i+2], 16) for i in range(0,len(t5), 2)]
a6 = [int(t6[i:i+2], 16) for i in range(0,len(t6), 2)]
a7 = [int(t7[i:i+2], 16) for i in range(0,len(t7), 2)]

a1xora2 = []
a1xora3 = []
a1xora4 = []
a1xora5 = []
a1xora6 = []
a1xora7 = []

# XOR each message with the first one. I could have used any other message but I've chosen the first one.
for i in range(0, len(a1)):
    a1xora2.append(bin(a1[i] ^ a2[i]).lstrip('0b'))
    a1xora3.append(bin(a1[i] ^ a3[i]).lstrip('0b'))
    a1xora4.append(bin(a1[i] ^ a4[i]).lstrip('0b'))
    a1xora5.append(bin(a1[i] ^ a5[i]).lstrip('0b'))
    a1xora6.append(bin(a1[i] ^ a6[i]).lstrip('0b'))
    a1xora7.append(bin(a1[i] ^ a7[i]).lstrip('0b'))

# normalize the output to a 8-bit by appending 0 at the digit start    
for i in range(0, len(a1xora2)):
    a1xora2[i] =  "0" * (8 - len(a1xora2[i])) + a1xora2[i]
    a1xora3[i] =  "0" * (8 - len(a1xora3[i])) + a1xora3[i]
    a1xora4[i] =  "0" * (8 - len(a1xora4[i])) + a1xora4[i]
    a1xora5[i] =  "0" * (8 - len(a1xora5[i])) + a1xora5[i]
    a1xora6[i] =  "0" * (8 - len(a1xora6[i])) + a1xora6[i]
    a1xora7[i] =  "0" * (8 - len(a1xora7[i])) + a1xora7[i]
    
# Analyze the result of XOR operation. If the binary digit starts with 01 - the most likely case is that either of the two is a space.    
for i in range(0, len(a1xora2)):
    if a1xora2[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher2 is a space".format(i+1, i+1))
    if a1xora3[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher3 is a space".format(i+1, i+1))
    if a1xora4[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher4 is a space".format(i+1, i+1))
    if a1xora5[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher5 is a space".format(i+1, i+1))
    if a1xora6[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher6 is a space".format(i+1, i+1))
    if a1xora7[i][0:2] == "01":
        print(Purple + "Most likely either {}th char of cypher1 or {}th char of cypher7 is a space".format(i+1, i+1))
    if a1xora2[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher2 is a char or both are spaces".format(i+1, i+1))
    if a1xora3[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher3 is a char or both are spaces".format(i+1, i+1))
    if a1xora4[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher4 is a char or both are spaces".format(i+1, i+1))
    if a1xora5[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher5 is a char or both are spaces".format(i+1, i+1))
    if a1xora6[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher6 is a char or both are spaces".format(i+1, i+1))
    if a1xora7[i][0:2] == "00":
        print(Cyan + "Most likely both {}th char of cypher1 and {}th char of cypher7 is a char or both are spaces".format(i+1, i+1))

# 1. All the first characters are no spaces
# 2. The first and the fifth message start with the same character
# 3. There can't be more than one space between letters (assumption)
# 4. If I see "Most likely either 5th char of cypher1 or 5th char of cypherX is a space" for several X, then I should check whether hex chars at this position for all cypher text other than the first one are the same.
# Otherwise, cypher1 has the space!

a2xora3 = []
a2xora4 = []
a2xora5 = []
a2xora6 = []
a2xora7 = []

# key
#        1     2      3    4      5     6      7    8     9     10    11  12  13    14    15    16   17     18    19    20    21    22    23    24    25  26   27   28    29   30  31
key = [0xF2, 0x1A, 0x04, 0x9b, 0xd0, 0x73, 0x23, 0xc8, 0x39, 0x98, 0xce, 0x9, 0xe, 0xbc, 0x86, 0xda, 0xc9, 0xe0, 0x39, 0x89, 0x2a, 0x5f, 0x72, 0x67, 0x83, 0xa5, 0x61, 0xfd, 0x25, 0xee, 0x30]

result1 = ""
result2 = ""
result3 = ""
result4 = ""
result5 = ""
result6 = ""
result7 = ""

for i in range(0, len(key)):
    result1 += chr(a1[i] ^ key[i])
    result2 += chr(a2[i] ^ key[i])
    result3 += chr(a3[i] ^ key[i])
    result4 += chr(a4[i] ^ key[i])
    result5 += chr(a5[i] ^ key[i])
    result6 += chr(a6[i] ^ key[i])
    result7 += chr(a7[i] ^ key[i])

# output the result    
print(result1)
print(result2)
print(result3)
print(result4)
print(result5)
print(result6)
print(result7)

Example. Solid

Another example of why there needs to be a key rota in place. Imagine, we have a key and we use the same key to encrypt user passwords.

m1 XOR key1 = c1
m2 XOR key1 = c2 
m3 XOR key1 = c3

Now, if I create a password and get an encrypted value in return, I can XOR it with the password in plain text to get the key and subsequently use it to decrypt others’ keys.

m1 XOR key1 = c1 
c1 XOR m1 = key1
m2 = c2 XOR key1

How is it different from the previous and much more complex example? In the first example had NO plaintext for the first operation: m1 XOR key1 = c1, only cyphertexts.

References

[1] Coursera, by University of Maryland, College Park