RAR archives are notorious for their strong encryption, which has proven resistant to brute-force searches even when the passwords used are quite weak by modern standards. There are three reasons for this:
- RAR relies on well-proven cryptographic algorithms;
- Its initialization step is deliberately made very slow, so that the number of passwords one can test per second is quite low;
- WinRAR and related tools have no known bugs or other security problems.
The last item might seem irrelevant at first, but consider the case of .ZIP files and WinZIP version 7 or earlier: if you get an encrypted archive containing more than four files, programs like ElcomSoft's Advanced Archive Password Recovery can break the password for sure, no matter how strong it is. Therefore, you should never rely on WinZIP for any serious security. This is all because of a flaw in WinZIP's original implementation, which seems to be fixed in recent versions. In the case of WinRAR, nobody has found a bug like this as of the time of this writing.
The second item is helped a lot by the first one. The RAR encryption routine requires a key initialization phase, which is intentionally made complex and requires a lot of number crunching. Because of the first item, it cannot be short-circuited: you cannot avoid doing all of the calculations. Indeed, the algorithms used in RAR (namely: the AES encryption algorithm and the SHA-1 hashing routine) are very well studied. Making some serious progress in item #2 would require a significant cryptographic discovery. Therefore, you can safely assume the security of RAR will not be broken soon.
Let's put in some numbers. Assuming your cracking speed is stuck at, say, 100 passwords per second, even a small dictionary attack takes a long time to run (and a reasonable one can take hours). Testing all 5-character alphanumeric passwords takes about a week and a half, testing all numbers up to a billion requires four months, and the ultimate search - all 8-character passwords using any printable character - would take about 210 million years! (Of course, this doesn't account for the effects of Moore's law. Click here for an amusing applet that takes it into account.)
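The arithmetic behind these estimates is easy to reproduce. Here is a quick sketch; the character-set size is my assumption (lowercase letters plus digits), so the exact figures depend on the alphabet you pick:

```python
RATE = 100  # passwords per second, as assumed above

def days_to_search(keyspace: int) -> float:
    """How many days an exhaustive search of `keyspace` passwords takes."""
    return keyspace / RATE / 86400

# 5-character passwords over a 36-character alphabet (a-z, 0-9)
print(f"{days_to_search(36 ** 5):.1f} days")          # about a week
# all numeric passwords up to a billion
print(f"{days_to_search(10 ** 9) / 30:.1f} months")   # about four months
```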
If you're interested in cracking passwords of a specific length, be aware that the speeds reported on the UnRAR-crack website pertain to 6-character passwords only; you can compute here what speed you should expect for other lengths.
Advanced Encryption Standard (AES)
AES is an encryption algorithm. It needs three things: a 128-bit key, a 128-bit initialization vector (IV), and some data. The data can then be encrypted, producing incomprehensible gibberish of the same size. The encrypted data can later be decrypted using the same key and IV. Therefore, the only things needed to decrypt a RAR file are the key and the IV.
For an algorithm to be considered cryptographically secure, the key and the IV must not be recoverable from the gibberish ciphertext. In the case of AES, this is very well tested and no weaknesses have been found. In fact, the algorithm is so good that it is approved for cryptographic purposes within the US government!
For more information about the AES, please refer to the Wikipedia article.
Secure Hash Algorithm, version 1 (SHA-1)
SHA-1 is a cryptographic hash function. In layman's terms, a hash function takes some data (e.g. the contents of a file), does some number crunching over it, and produces a fixed-length chunk of numbers which can be viewed as the digital fingerprint of the input data. A property of a good hash function (like SHA-1) is that even a minor change in the input data produces a very different fingerprint. This is often used on websites offering very large downloads (like ISO images): a small text file accompanies them, containing the hashes of the large files. You can use programs like md5sum, sha1sum, etc. to calculate the "fingerprints" of the downloaded files and check whether they match the ones in the text file, thus verifying correct transmission. These "fingerprints" are usually called "hash values".
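The fingerprint behavior is easy to see with Python's standard hashlib (the two input strings are just illustrative):

```python
import hashlib

# two inputs that differ in a single letter
a = hashlib.sha1(b"The quick brown fox").hexdigest()
b = hashlib.sha1(b"The quick brown fog").hexdigest()

# the two 160-bit fingerprints bear no obvious resemblance to each other
print(a)
print(b)
```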
In the case of a cryptographic hash function (like SHA-1), another desired property is that it should be extremely difficult to produce two different pieces of data with the same hash (termed "finding a collision"). We'll explain why this matters in a moment.
Another implicit property of SHA-1 is that it takes a lot of number crunching to produce the output. This is dictated by the mathematical complexity needed to satisfy the two properties above. So it is kinda slow - not really slow, but much slower than algorithms used only to verify correct transmission, such as CRC.
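The speed gap is easy to observe; the ratio below is only a rough illustration and varies with the platform and library versions:

```python
import hashlib
import time
import zlib

data = b"\x00" * 10_000_000  # 10 MB of input

t0 = time.perf_counter()
digest = hashlib.sha1(data).digest()   # cryptographic hash
sha_time = time.perf_counter() - t0

t0 = time.perf_counter()
crc = zlib.crc32(data)                 # simple transmission check
crc_time = time.perf_counter() - t0

print(f"SHA-1: {sha_time:.4f}s, CRC-32: {crc_time:.4f}s")
```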
RAR takes advantage of this slowness by creating a huge string of many copies of the password, which is then fed to SHA-1. It takes some time to calculate the hash value of this string (the hash is then used to construct the AES key and IV). This is where the collision resistance of SHA-1 is needed: if it were easy to find a small string with the same hash as the large one, you could hash that instead and it would take less time to find the actual hash values. However, you can't do that with SHA-1, so the only option is to do the full calculation for each password.
Note: Here we assume that the reader is familiar with the layout of strings and numbers in memory in languages like C.
Input: An encrypted RAR file header and a password to be checked
Output: AES key and IV to be used to decrypt the header.
Algorithm:
- Convert the user password to wide-char (16 bits per character), giving ExpandedPassword;
- The RAR header contains a random 7-byte string - the salt. Append this salt to ExpandedPassword giving PasswordAndSalt;
- Append a three-byte "serial number" to PasswordAndSalt giving Packet1. Those three bytes are "01 00 00" for the first packet;
E.g. if the password is "abc" and the salt is "07 cf cc a0 12 4b 85", the first packet will be "61 00 62 00 63 00 07 cf cc a0 12 4b 85 01 00 00";
- Clone Packet1 262144 (2^18) times, changing the serial numbers of the copies so that they read 1, 2, 3, ... in little-endian format. Concatenate all these copies;
- The resulting huge string is fed to SHA-1. The first 128 bits of the hash value are the resulting AES key;
- Take the empty string; then take the string from the first 16384 copies of Packet1; then the string from the first 2*16384 copies; then from 3*16384 copies, etc. so you have 16 different "partial" strings. For each of them, calculate their SHA-1 value and take the last byte of the digest and append it to the Initialization Vector. Thus, 16 strings give you 16 bytes of the resulting IV.
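Putting the steps together, here is a sketch in Python. It follows the description above literally; WinRAR's actual implementation may differ in small details such as the byte order in which digest words are extracted, so treat this as illustrative rather than a drop-in reimplementation:

```python
import hashlib

def rar_key_iv(password: str, salt: bytes):
    """Derive an AES key and IV following the steps described above."""
    base = password.encode("utf-16-le") + salt   # PasswordAndSalt
    sha = hashlib.sha1()
    iv = b""
    for i in range(262144):                      # 2^18 packets
        if i % 16384 == 0:
            # one IV byte per "partial" string (the first partial
            # string is empty), taken as the last byte of its hash
            iv += sha.copy().digest()[-1:]
        # three-byte little-endian serial number: 1, 2, 3, ...
        sha.update(base + (i + 1).to_bytes(3, "little"))
    key = sha.digest()[:16]                      # first 128 bits
    return key, iv

# the example values from the packet description above
key, iv = rar_key_iv("abc", bytes.fromhex("07cfcca0124b85"))
print(key.hex(), iv.hex())
```

Note how the incremental hashlib API lets the 16 partial hashes be taken with `copy()` instead of re-hashing the string from scratch; a real cracker does the same.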
Plugging in some numbers: checking a 6-char password requires hashing of 5.7MB of data, with a subsequent AES decryption of a small encrypted header (this is fast). A CRC check is done on the header, and if it passes - the password is potentially correct.
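The 5.7MB figure follows directly from the packet layout described above:

```python
PASSWORD_CHARS = 6
packet_len = PASSWORD_CHARS * 2 + 7 + 3   # wide chars + salt + serial
total_bytes = packet_len * 262144         # 2^18 copies of the packet
print(total_bytes)                        # 5767168 bytes hashed per password
```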
Let's now dissect a single Packet:
- The password: makes the large string unique, thus a different password will result in a different key/IV pair
- The salt: this makes a single RAR file unique. It is randomly generated when the RAR file is created. If there were no salt, the AES key/IV pair would depend only on the password. In that case, somebody could spend a lot of computational power to generate a list of keys/IVs for, e.g., all 7-character passwords, and publish that list on the internet (this is called a Rainbow table). Using a rainbow table, the attacker can try keys/IVs on the RAR very quickly, since he doesn't have to compute them himself (the aforementioned AES decryption and CRC check are very quick compared to the SHA-1 calculation). However, in the presence of the salt, a rainbow table must be 256^7 times larger, making it impractical.
- The serial number: I'm not sure what the role of this one is; my guess is that the designer of RAR was afraid that someone might someday find a way to compute the hash of a repetitive string more quickly than that of a non-repetitive string of the same length. This is a purely theoretical threat that cannot be exploited with the current cryptographic arsenal.
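The 256^7 blowup mentioned in the salt bullet is easy to quantify:

```python
salt_space = 256 ** 7  # every possible value of a 7-byte salt
print(salt_space)      # 72,057,594,037,927,936 (i.e. 2^56)
# a rainbow table must now store one key/IV entry per (password, salt)
# pair instead of one per password - hopelessly large
```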
It was noted that it is only the SHA-1 hashing that makes RAR difficult to crack - checking whether a particular key/IV yields the correct CRC is quick. So if all we need to decrypt a RAR file is the key/IV, why not bruteforce them instead?
The answer is that we do not know exactly which keys to test, so we must try them all. The keys in RAR are 128-bit, giving 2^128 possible keys. Combine that with another 128 bits for the IV, resulting in 2^256 possible key/IV pairs. Even if we assume we can test a billion keys/IVs per second, it would require much more than 2^200 seconds to bruteforce everything. The 210 million years mentioned earlier are nothing compared to that. So there's no chance of cracking a RAR by bruteforcing its AES key/IV, unless we are incredibly lucky.
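The same back-of-the-envelope reasoning in code:

```python
pairs = 2 ** 256          # possible key/IV combinations
rate = 10 ** 9            # optimistic: a billion tests per second
seconds = pairs // rate

print(seconds > 2 ** 200)                 # still beyond 2^200 seconds
years = seconds // (365 * 24 * 3600)
print(f"roughly 10^{len(str(years)) - 1} years")
```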
There is quite a bit of research around SHA-1 and AES. AES is only susceptible to the so-called side-channel attacks, which require installation of a "spy" program on the same computer where the encryption is done - i.e. the computer that generated the RAR. Since this is usually not the case, even side-channel attacks are not applicable to RAR.
SHA-1 is a bit different. Most cryptographic researchers are interested in producing an efficient algorithm for finding collisions. While a very efficient algorithm would also benefit RAR cracking, the current situation is that anything below the theoretical maximum complexity of 2^80 hash calculations is considered an enormous success. In this sense, there are some cracks - the current "breakthrough" algorithm requires "only" 2^52 operations - but exploiting it in the context of RAR cracking is not possible.
No official information exists on whether anyone has looked into the implementation of WinRAR for potential problems. There are a lot of commercial programs that support cracking RARs, so common sense suggests their programmers have looked into that idea, too.
A potential source of problems may be the salt generation. I haven't yet determined how WinRAR generates the salt, though I suspect that even the simplest scheme - a random generator seeded by the system time - is sufficient to make rainbow tables pointless. However, RAR has been ported to a variety of platforms. It may turn out that some of the implementations (especially for portable devices) suffer from reduced salt randomness, which could make RARs created by such a device more easily crackable.
In spite of all the disappointing facts, there is still one fundamental source of insecurity within RAR's encryption algorithm: it will not stand the test of time. As processing power increases, passwords will become more easily crackable, and even very strong ones will become accessible at some point in the future. Because the algorithm is fixed, there's no way to increase the time it takes to test a single password (the number of repetitions described in section 3 cannot be changed - it is hardcoded). So, even though RAR's encryption is strong, it could have been made stronger by allowing the number of repetitions to be specified somewhere in the RAR file as a parameter. This would allow the user to make the encryption as strong as he wants. However, this remains a somewhat theoretical threat, since RAR will probably be replaced by something better by the time it becomes trivial to crack.
So, as explained above, the only way to bruteforce RAR passwords is the slow way: hashing the huge string described in section 3 and checking whether the AES key/IV decrypt the header correctly. Since the SHA-1 routine takes most of the time (more than 92% according to my tests), one should concentrate on making the SHA-1 hashing as fast as possible.
While my general implementation of SHA-1 in UnRAR-crack is non-scalar, I also looked into the fastest scalar implementation of SHA-1 available (scalar here means that a single routine call hashes a single string; vector routines produce hashes for several different strings at once). After some research, I tried OpenSSL's implementation, and it turned out to be two or three times as fast as the default implementation in the original unrar utility. The latter is implemented in pure C, while OpenSSL's is in pure assembly, with carefully-tuned hand-written code. (Actually, as an item of coding interest, it is not hand-written, but generated by a Perl script. The SHA-1 algorithm consists of a large number of rounds, each doing some number/bit manipulation over the input data and the so-called SHA state. All 80 rounds are very similar, differing only in insignificant details like exactly which bytes of the input to mangle. The author of OpenSSL's SHA-1 optimized a few "abstract" rounds and used the Perl script to clone the rounds' bodies, substituting the correct parameters in place.) As a definitive proof of how well optimized it is, the benchmark shows no performance increase from hyper-threading on the Intel Core i7 processor - one thread fills all the available execution units so well that the other thread has no free resources to utilize!
The vectorized SHA-1 hashing uses the 128-bit SSE registers to do four SHA-1 hashes at once, via 32-bit packed instructions. This works only when testing passwords of equal length (in bruteforce applications, however, this is almost always the case). Because the lengths of the four passwords are equal, the huge strings constructed as per section 3 are also equal in length. This means that the SHA-1 hashing routine must perform exactly the same sequence of operations over each of these strings; only the data differs. So instead of using 32-bit operations (SHA-1 is fully 32-bit), we can pack four 32-bit values from the four strings and implement pretty much the same algorithm as in the scalar version, with only the loading phase being a bit different.
So this is the vector implementation of SHA-1 in UnRAR-crack. One could expect almost a quadruple increase in speed when compared to the scalar version. However, this is not true (as you can see in the Result Tables). There are many reasons for this:
- Some CPUs implement SSE2 using 64-bit arithmetic units. Therefore every SSE instruction actually translates into two "internal" instructions. This nearly halves the performance. Only CPUs with 128-bit SSE (e.g. the Intel Core 2 family) can take full advantage of the vectorization;
- The SSE2 instruction set is not as orthogonal as the i386 (scalar) instruction set. Some very useful scalar instructions have no vector equivalent. A serious omission is the bit rotate - every round of SHA-1 does two or three bit rotates, which have to be simulated inefficiently using 4 instructions in vector mode (copy, shift left, shift right, and add), as opposed to a single instruction in scalar mode (ROR/ROL). Another is the (assembly language hackers' favorite) LEA instruction, which can easily be exploited in integer code like SHA-1 to add two registers plus a constant (and it is actually used in the scalar version), while the vector code has to use two or three instructions to achieve the same, not counting the additional pipeline planning needed;
- The SSE instructions generally have lower throughput because of their higher complexity. Maintaining the four SHA states also involves some housekeeping overhead.
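The missing vector rotate is the clearest example. Here is a Python model of the per-lane operations, mirroring the copy/shift-left/shift-right/combine instruction sequence the SSE code must emit (the instruction names in the comments are the real SSE2 mnemonics; the Python itself is just an illustration):

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Scalar rotate: a single ROL instruction on x86."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def simd_rol32(lanes, n):
    """Vector rotate emulated as in SSE2: copy, shift left,
    shift right, then combine the halves (OR or add - they don't overlap)."""
    left = [(x << n) & MASK32 for x in lanes]    # MOVDQA + PSLLD
    right = [x >> (32 - n) for x in lanes]       # PSRLD
    return [a | b for a, b in zip(left, right)]  # POR

lanes = [0x80000001, 0x12345678, 0xDEADBEEF, 0x00000000]
assert simd_rol32(lanes, 5) == [rol32(x, 5) for x in lanes]
```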
It is the SHA-1 routines where most of the processing time is spent when doing bruteforce password searching. The initialization step (described in section 3) takes more than 98% of the time, with the "sha1_block_data_order" function taking about 90% (this is the function that compresses one or more 64-byte blocks, updating the SHA-1 state - i.e., the core of the SHA-1 algorithm). The rest of the time is spent in housekeeping and in testing whether the AES key/IV decrypt the RAR header correctly.
So it is vital to make the implementation of sha1_block_data_order as fast as possible. The vector routines are written in pure assembly, because SSE intrinsics didn't prove good enough. Some figures: the SSSE3 version of the routine, when implemented in pure x86 assembly, achieves around 130 p/s on a 2.66GHz Core i7 with one thread and 6-character passwords. With the same setup, GCC 4.4 with SSE intrinsics achieves around 80 p/s; similarly for Microsoft C++ 2008. Intel C++ compiler version 11 beats both, with around 90 p/s, but it's still way below hand-crafted code. The assembly version is not trivial to write, because the fully loop-unrolled variant of sha1_block_data_order is 2200+ lines of the aforementioned highly-repetitive SHA-1 "rounds". To alleviate the situation, I chose an approach similar to what OpenSSL does: use a code generator (written in Python) that generates the rounds' bodies. The Python script is not very readable, but it is easy to maintain, and also gives the opportunity to test a variety of optimizations, instruction reorderings, etc.
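To illustrate the generator approach, here is a deliberately simplified, hypothetical sketch. The real script emits assembly rather than the C-like text below, and the register-renaming scheme differs, but the idea is the same: a few abstract round templates cloned 80 times with the right parameters substituted in:

```python
# SHA-1's 80 rounds fall into four phases of 20 rounds, each with its
# own round function and additive constant
PHASES = [
    ("F0", 0x5A827999), ("F1", 0x6ED9EBA1),
    ("F2", 0x8F1BBCDC), ("F3", 0xCA62C1D6),
]

def round_body(i):
    """Emit one unrolled round with its parameters filled in.
    Instead of moving data between registers, the register *names*
    are rotated from round to round (a common unrolling trick)."""
    fn, k = PHASES[i // 20]
    a, b, c, d, e = [f"r{(j - i) % 5}" for j in range(5)]
    return (f"{e} = ROL({a},5) + {fn}({b},{c},{d}) + {e} "
            f"+ 0x{k:08X} + W[{i}]; {b} = ROL({b},30);")

body = "\n".join(round_body(i) for i in range(80))
print(body.splitlines()[0])  # first of 80 generated round bodies
```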
Since the benchmark's hot spot is implemented in assembly, which is identical across all ports of UnRAR-crack, the differences in performance between the various OSes should be negligible (no more than 1-2%). Therefore, the benchmark should provide consistent results no matter which OS you use, or which compiler was used to build it.
Future plans include creating a version of sha1_block_data_order that runs on the GPU (using either CUDA or OpenCL), so that every available computing resource of the PC is fully utilized.