Tools For Unpacking Malware, Part 2. Weak encryption algorithms
Some days ago I started a series of posts about tools and methods for unpacking malware, here you can find the first part. Each malware/packer is very different, and sometimes there are no generic ways to unpack them. But sometimes we can find characteristics shared by a number of them. For example, packers usually rely on weak encryption algorithms and, sometimes, it is possible to attack them.
In this article I am going to write about some methods based on very simple cryptography to attack these weak encryption algorithms used by packers. In spite of the simplicity of these attacks, they are, still nowadays, very effective against a significant number of malware families and packers, and they can help to unpack automatically some samples.
First, I must say this is not new, it has been being used for long time. Here there are some articles and tools:
I implemented a python tool (revealpe.py) performing these attacks. I tried to put some of these attacks together in a tool, and I tried to do some improvements, specially oriented to malware unpacking:
The tool is able to attack some algorithms based on XOR/ADD/ROL, with keys of 8 or 32 bits (I think it is the most common in malware/packers), with or without key increments each round. In addition, it performs an specific attack against vigenere-like sustitution ciphers that let’s to find some more complex encryptions.
The script applies all the specified attacks on the target file. It could get matches with different algorithms and keys. It will create files for each result, whose names will have this format:
The extension .dec is for the file raw decrypted with the algorithm and key found. The extension .decpe is for the decrypted and extracted valid PE, if found.
For example, for the sample f658526e1227c45415544063997b49c8, we obtain these results:
Match was found at position 0x1f60. The results mean it can be decrypted with xor_byte, starting key 0x88, and increment 0xff. A result is got with xor_dword, starting key 0x85868788, increment 0xfbfbfbfc, but .decpe is not generated, so the alg and key found was valid for the given plaintext but not for the full PE (probably it is a problem of alignment, and with the option –check-unaligned it would get the good key for xor_dword algorithm).
Here you can watch a short video showing the usage of this tool:
Here there are some examples of weak algorithms that it is possible to attack.
XOR or ADD (Constant Key)
P xor K = C -> K = C xor P
P add K = C -> K = C sub P
Initial key: 0x85, no increment.
XOR or ADD (Incremented Key, Constant Increment)
Pi xor Ki = Ci -> Ki = Ci xor Pi -> K(i) – K(i-1) = Kinc
Pi add Ki = Ci -> Ki = Ci sub Pi -> K(i) – K(i-1) = Kinc
Initial key: 0x54, Increment: 0x92.
Ci = SUST_TABLE[ Pi ]
Indexes of positions with same values in plaintext, must be identical to indexes of positions with same values in ciphertext, and we can create signatures based on distance between same values to identify the position in the ciphertext for the given plaintext.
Once a ciphertext for the plaintext is found, it is possible to start to reconstruct a partial sustitution table: (T -> 73), (h -> A2), (i -> A6), (s -> FC), (space -> C5), etc… (In addition to parts of the PE header, revealPE.py tool searchs for other wellknown plaintext to complete a bit more the sustitution table, for example, common API names).
In addition to create a partial sustitution table, revealPE.py bruteforces some common algorithms by checking for each algorithm if it fits with the reconstructed partial sustitution table (revealPE.py bruteforces these algorithms: xor-add-rol , xor-rol-add, add-xor-rol, add-rol-xor, rol-xor-add, rol-add-xor, with all the possible keys).
I did some tests with a collection of exe files and other collection of pdf files:
- 93 exe files with embbeded-encrypted PE found.
- 4453 exe files with no embbeded-encrypted PE found.
- 586 pdf files with encrypted PE found.
- 44536 pdf files with no encrypted PE found.
The files were randomly choosen for the tests. Probably, most of them are not packed files or they don’t contain any embedded/encrypted PE. For example, revealPE didn’t find embedded PEs in 4453 exe files, but it doesn’t mean it failed for this number of files, because probably most of them don’t contain embedded PEs.
For a more accurate test I would need a collection of files that I am sure they contain embedded PE files, to know exactly how many PEs revealPE was able to decrypt.
The list of samples tested can be found here: