Compression and Encryption: Order Matters

By | January 15, 2016

Compression then encryption, or the other way around?

Sometimes we need to compress a file to make it easier to transmit. Sometimes we need to encrypt a file's contents to protect that information from prying eyes. And sometimes we need to do both. At first glance, because these operations are independent of one another, it may seem to make no difference in which order they are applied.

However, the order makes a huge difference. Encryption turns data into high-entropy data, where entropy is a measure of the unpredictability of information content. Encrypted data therefore looks like a random array of bytes, with few patterns left in it. Compression algorithms work best when the data contains patterns, because those patterns can be represented with fewer bytes. For example, the LZW (Lempel-Ziv-Welch) compression algorithm builds a dictionary as it compresses, so when a string of data repeats, it is replaced with an index into the dictionary. When repetition is rare, there are few opportunities for a beneficial substitution. In essence, the higher the entropy of the data, the less compressible it is.
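To make the entropy argument concrete, here is a small Python sketch (Python rather than the post's .NET, and not the author's code). It computes the empirical Shannon entropy in bits per byte and counts how many codes a textbook LZW encoder emits, once for a patterned buffer and once for the same number of random bytes from `os.urandom`. The names `shannon_entropy` and `lzw_encode` are illustrative.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Empirical Shannon entropy in bits per byte (0.0 to 8.0)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit the dictionary index of the longest known prefix,
    then add that prefix plus the next byte as a new dictionary entry."""
    dictionary = {bytes([i]): i for i in range(256)}  # start with all single bytes
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # learn the new string
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


patterned = b"abcdef" * 2000
random_bytes = os.urandom(len(patterned))
for name, sample in (("patterned", patterned), ("random", random_bytes)):
    print(name, round(shannon_entropy(sample), 2), len(lzw_encode(sample)))
```

The patterned buffer scores log2(6), about 2.58 bits per byte, and LZW covers it with a small number of ever-longer dictionary phrases. The random buffer sits close to 8 bits per byte and needs thousands of codes, each of which takes more than 8 bits to store.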

Not Recommended: Encryption then Compression

If you compress data first, the compression algorithm has a much better chance of finding patterns it can use to shrink the data. Encryption, in turn, produces output of roughly the same length as its input (perhaps plus a few bytes of padding), so it gives back very little of the space that compression saved.
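The "few extra padding bytes" can be worked out exactly for one common setup. Assuming a 128-bit block cipher such as AES in CBC mode with PKCS#7 padding (the post does not say which cipher or mode its sample uses), the ciphertext length is the plaintext length rounded up to the next whole block, and an input that already fills its last block gains one full block of padding. A minimal sketch; the function name is hypothetical:

```python
BLOCK_SIZE = 16  # bytes; AES uses a 128-bit block


def cbc_pkcs7_ciphertext_length(plaintext_length: int, iv_prepended: bool = False) -> int:
    """Ciphertext size for a block cipher in CBC mode with PKCS#7 padding.

    PKCS#7 always adds 1..BLOCK_SIZE bytes, so an input that is already a
    multiple of the block size gains one full block of padding.
    """
    padded = (plaintext_length // BLOCK_SIZE + 1) * BLOCK_SIZE
    # Optionally account for a random IV stored in front of the ciphertext.
    return padded + (BLOCK_SIZE if iv_prepended else 0)


for n in (0, 15, 16, 190000):
    print(n, cbc_pkcs7_ciphertext_length(n))
```

The 190000-byte source file used later is an exact multiple of 16, so under these assumptions it would grow by 16 bytes, or by 32 if a 16-byte IV were stored with it.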

Recommended: Compression then Encryption

Example in .NET

Encrypt Then Compress
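A minimal Python sketch of this (not recommended) pipeline; it is not the post's .NET code. Because Python's standard library has no AES, the hypothetical `toy_keystream_xor` function stands in for a real cipher by XORing the data with a SHA-256 counter-mode keystream. That is enough to produce random-looking output, but it is not a vetted cipher. `zlib` plays the compressor, and the input mirrors the post's test file.

```python
import hashlib
import os
import zlib


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher: XOR data with SHA-256(key || nonce || counter).

    Applying it twice with the same key and nonce restores the input.
    Illustration only; real code should use a vetted cipher such as AES.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):  # one 32-byte keystream block per step
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_then_compress(plaintext: bytes, key: bytes) -> bytes:
    """Not recommended: the compressor only ever sees random-looking bytes."""
    nonce = os.urandom(16)  # fresh nonce per message, stored in front of the ciphertext
    ciphertext = nonce + toy_keystream_xor(key, nonce, plaintext)
    return zlib.compress(ciphertext, 9)


key = os.urandom(32)
data = b"abcdefabcdefabcdefabcdefabcdefabcdef\r\n" * 5000
packed = encrypt_then_compress(data, key)
print(len(data), len(packed))  # the "compressed" output is larger than the input
```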

Compress Then Encrypt
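The recommended order can be sketched the same way in Python. Again this is not the post's .NET code: `zlib` is the compressor, and the hypothetical `toy_keystream_xor` (a SHA-256 counter-mode keystream) stands in for a real cipher such as AES, which Python's standard library does not provide.

```python
import hashlib
import os
import zlib


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hypothetical stand-in cipher: XOR data with SHA-256(key || nonce || counter).

    Illustration only; real code should use a vetted cipher such as AES.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def compress_then_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Recommended: compress while the patterns are still visible, then encrypt."""
    compressed = zlib.compress(plaintext, 9)
    nonce = os.urandom(16)  # fresh nonce per message, stored in front of the ciphertext
    return nonce + toy_keystream_xor(key, nonce, compressed)


key = os.urandom(32)
data = b"abcdefabcdefabcdefabcdefabcdefabcdef\r\n" * 5000
sealed = compress_then_encrypt(data, key)
print(len(data), len(sealed))
```

To reverse it, decrypt first and then decompress, undoing the steps in the opposite order.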

Here is a sample application which uses the methods above.

Here, CreateFile is a helper method that generates a file containing 5000 lines of "abcdefabcdefabcdefabcdefabcdefabcdef", and PrintFileInfo is a helper method that prints a file's name, size, and SHA256 hash.
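Python equivalents of the two helpers might look like the sketch below; `create_file` and `print_file_info` are hypothetical counterparts, not the post's code. One detail explains the 190000-byte source size reported in the output: each 36-character line plus a two-byte CRLF line ending is 38 bytes, and 5000 × 38 = 190000. The sketch therefore writes CRLF endings, which is an assumption about how the original file was produced.

```python
import hashlib
import os
import tempfile

LINE = "abcdefabcdefabcdefabcdefabcdefabcdef"  # the 36-character line from the post


def create_file(path: str, lines: int = 5000) -> None:
    # newline="\r\n" translates each "\n" into CRLF, as a Windows WriteLine would
    with open(path, "w", encoding="ascii", newline="\r\n") as f:
        for _ in range(lines):
            f.write(LINE + "\n")


def print_file_info(path: str) -> None:
    # Print the file name, size in bytes, and SHA-256 hash of its contents
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    print(f"{os.path.basename(path)}  {os.path.getsize(path)} bytes  SHA256={digest}")


path = os.path.join(tempfile.gettempdir(), "source.txt")
create_file(path)
print_file_info(path)
```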

Below is a screenshot of the output.

[Screenshot: compression and encryption sample output]

The source file is 190000 bytes. After compression then encryption, the resulting file is 608 bytes. After encryption then compression, the resulting file is 190094 bytes. Not only is the latter file larger than the former, it is actually larger than the original source file! Because the encrypted bytes look random, the compressor cannot shrink them, so all it can do is add its own headers and framing on top of the encrypted data (and any padding the cipher added). I hope this example makes clear how encryption makes data less compressible. Encryption worth its salt should transform the source data into something indistinguishable from random data, and a compression algorithm cannot be effective on data that appears random.
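The same effect is easy to reproduce without any cipher at all. In this sketch, `os.urandom` output stands in for well-encrypted data (an assumption that holds for any cipher worth its salt), and Python's `zlib`, not the .NET compressor the post used, tries to compress 190000 such bytes:

```python
import os
import zlib

# Random bytes stand in for ciphertext: to a compressor, both look the same.
ciphertext_like = os.urandom(190000)
compressed = zlib.compress(ciphertext_like, 9)
print(len(ciphertext_like), len(compressed))  # the second number is the larger one
```

zlib finds nothing to shrink, so it falls back to stored (uncompressed) blocks; the block headers plus zlib's own 2-byte header and 4-byte Adler-32 checksum leave the output slightly larger than the input.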

Source code can be downloaded here.
