Square CTF 2017: Ciphercel

Ciphercel - Crypto (100 points)
● Crypto inside Google Sheets
● An office mouse scurried through the ceilings of Evil Robot Corp and spotted an unlocked computer. She shared this Google Sheet with us before she had to flee. Can you decipher it?
● Solves: 52
● Download: https://docs.google.com/spreadsheets/d/14Rnn9mAOEGvJSD3_VUR32edmGAT21AVhh8vTqHRADjs/edit#gid=0

I put off doing this one because reverse engineering Excel formulae sounded like a nightmare, but it ended up being totally fun. Also I’m pretty sure I solved it in an insane way – it sounds like the challenge creator(s?) intended this challenge to be solved by hand.

We’re presented with a Google Sheets document (reference copy here in case the official one disappears) containing four cells of note: key input and flag output, and a sample ciphertext and its decryption using the key input:

OK, so we’re meant to somehow figure out the correct key, which will give us both a correct decryption of the sample ciphertext and the level’s flag. Let’s have a look at what’s going on behind the scenes by undoing the white-on-white cell coloring for the guts of the cipher:

We’re working with 6 columns of data here (A through F), plus two special cells (D50 and E50). Most are cut off, but there’s a row for each character of ciphertext. Working through what’s happening in each column:

A is the (1-indexed) index of ciphertext characters.
B is each letter in the alphabet, non-repeating.
C is =FIND(B51,D50): the index (1-indexed) of the current row’s cell in column B within the previous row’s cell in column D.
D is =CONCAT(LEFT(D50,C51), SUBSTITUTE(MID(D50,C51+1, LEN(D50)), B51, "")): a mildly annoying to parse formula that forms the state generation function of the key derivation algorithm.
E and F are, respectively, the ciphertext and plaintext split into individuals chars.
F’s formula is the actual encryption algorithm: =IFERROR(MID($E$50, FIND(E51, $E$50)+20, 1), E51)

D50’s formula is =CONCAT(LEFT(REGEXREPLACE(LOWER(B4), "[^a-z]", ""), 10), "abcdefghijklmnopqrstuvwxyz"): The input key (cell B4), with all alpha chars made lowercase and all non-alpha characters removed, cut to the 10 left-most characters, then with the entire alphabet concatenated.

E50’s formula is D76 (the final state of key derivation) concatenated with itself (duplicated). E50 is the actual key used in encryption.

Here’s what I wrote about the key derivation algorithm (D) during the RE process:

to update the state, we need to generate d_left and d_right.
d_left is the first c chars of the current state
d_right is the remainder of the chars (position c to the end)
with all occurrences of b deleted.

we update the state with the concatenation of d_left and d_right

And here’s what I wrote about the encryption algorithm:

to encrypt, we operate char by char.
we first get the index of the first occurrence of the plaintext char
in the master key, add 20 to that index, and call it findval.
the encrypted char is the findvalth character of e50

That’s all pretty stream-of-consciousness from the midst of flow, so it might be easier to look at the Python in my solution, or to just stare at the Excel formulae yourself.

Using that understanding of key derivation and encryption, I implemented both in Python so I could write a script to search the keyspace for me. The search would involve checking whether a given key decrypts the ciphertext successfully. It was easy to figure out the target plaintext: passing the ciphertext through quipqiup (mostly) solves it – it’s the first sentence of the Recursive Set Wikipedia article.

ta kzxwgpdjtctpf prozif, d nop zs adpgidc agxjoin tn kdccom iokgintho, kzxwgpdjco zi moktmdjco ts proio tn da dcyzitprx ertkr poixtadpon dspoi d statpo dxzgap zs ptxo (ertkr xdf mowoam za pro ythoa agxjoi) dam kziiokpcf moktmon eroproi d ythoa agxjoi joczayn pz pro nop.
In computability theory, a set of natural numbers is called recursive, computable or decidable if there is an algorithm which terminates after a finite amount of time (which may depend on the given number) and correctly decides whether a given number belongs to the set.

Since it seems that the key/message space isn’t 1:1 (that is, a given decrypt can be achieved with multiple different keys), I thought I’d first try a bit of exhaustive search. The key is truncated to 10 chars max, and 26¹⁰ is a bit too big (roughly 2⁴⁸) for a full exhaustive search, but I looped through one through four chars in an hour or so with no luck. At this point I was pretty sure that some manual differential cryptanalysis is the intended solution, but I didn’t have any great ideas about how to go about this, and just messing around by hand wasn’t getting me very far. I think the best I could do was getting “in co” as the first four chars of plaintext before I gave up on that approach.

Since this cipher has very poor avalanche properties, and we have a known-plaintext/ciphertext pair, we can still bruteforce without a full exhaustive search of the keyspace. All we have to do is try all 26 values in each key position sequentially, scoring each trial decrypt, and keeping the highest-scoring (closest to the correct decryption) value. I implemented this, but this search method would immediately get stuck on a decently-scoring (~60% correct) key. I puzzled through how to get around this before deciding on trying something probably technically very stupid. I remembered watching a talk at some point about Quantum Annealing, a QC concept that involves using quantum effects to ‘break through’ local minima in the process of finding a solution. What I ended up doing:
● Loop through the char-by-char scored search method described above
● Keep track of the last 2 attempts’ scores
● If the scores are the same, randomly select one key char and change it to a random character
● Keep looping until we get a perfect score of 270 matching chars

I added some rudimentary progress-bar style score visualizing. Here’s what it looks like!

Here’s my code.

I don’t really get most of this stuff but from some Wikipedia skimming it seems like Simulated Annealing is closer to what I’m actually doing. Anyway, I parallelized this by running the script on as many cores as I had quick access to, and found the key (and flag) within about 15 minutes. Using PyPy gave me a speedup of roughly 3x, by the way, with no additional effort required!

I feel a little guilty for solving this in a ridiculous way, but I mostly just had a lot of fun. I definitely didn’t expect my method to work so I was very excited when it did.

flag-npghqlawyr

tk

security & networking

Square CTF 2017: Ciphercel

Leave a Reply Cancel reply