Compression then encryption or the other way around?
Sometimes we need to compress a file to make it easier to transmit. Sometimes we need to encrypt the contents of a file to protect that information from prying eyes. And sometimes we need to apply both compression and encryption to a file. At first, one might think that because these operations are independent of one another, the order in which they are applied makes no difference.
However, the order in which these operations are applied makes a huge difference. Encryption turns data into high-entropy data, where entropy is a measure of the unpredictability of information content. Encrypted data therefore looks like a random array of bytes, and patterns are far less likely. Compression algorithms work best when there are patterns in the data, because those patterns can be represented with fewer bytes. For example, the LZW (Lempel-Ziv-Welch) compression algorithm builds a dictionary as it compresses data, so when strings of data are repeated they are replaced with an index into the dictionary. When repetition is rare, there are few opportunities for a beneficial substitution. In essence, the higher the entropy of the data, the less compressible it is.
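To make the link between entropy and compressibility concrete, here is a minimal sketch that measures the Shannon entropy of a byte array and its size after GZip compression. The class and method names (EntropyDemo, ShannonEntropy, GZipLength) are made up for this illustration, and cryptographically random bytes stand in for encrypted data.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class EntropyDemo
{
    // Shannon entropy in bits per byte: 0 for constant data,
    // approaching 8 for uniformly random bytes.
    public static double ShannonEntropy(byte[] data)
    {
        if (data.Length == 0) return 0.0;

        var counts = new int[256];
        foreach (byte b in data) counts[b]++;

        double entropy = 0.0;
        foreach (int count in counts)
        {
            if (count == 0) continue;
            double p = (double)count / data.Length;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // Size of the data after GZip compression, computed entirely in memory.
    public static long GZipLength(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen keeps the MemoryStream usable after the GZipStream is closed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.Length;
        }
    }

    public static void Main()
    {
        // Repetitive data: the pattern "abcdef" repeated 10000 times.
        var repetitive = new byte[60000];
        for (int i = 0; i < repetitive.Length; i++) repetitive[i] = (byte)('a' + i % 6);

        // Random data as a stand-in for ciphertext (an assumption of this sketch).
        var random = new byte[60000];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(random);

        Console.WriteLine($"repetitive: {ShannonEntropy(repetitive):F3} bits/byte, gzip {GZipLength(repetitive)} bytes");
        Console.WriteLine($"random:     {ShannonEntropy(random):F3} bits/byte, gzip {GZipLength(random)} bytes");
    }
}
```

The repetitive buffer uses six equally frequent symbols, so its entropy is log2(6), about 2.58 bits per byte, while the random buffer should come out close to 8. The exact compressed sizes depend on the runtime's Deflate implementation, so none are quoted here.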
Not Recommended: Encryption then Compression
If you compress data first, your compression algorithm has a much better chance of encountering patterns it can use to shrink the data. The encryption step then costs almost nothing in size: encryption algorithms produce output of about the same length as their input (plus, at most, a few extra padding bytes).
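The "few extra padding bytes" can be checked directly. The sketch below is only an illustration (PaddingDemo and EncryptedLength are hypothetical names). It assumes AES with the .NET defaults of CBC mode and PKCS7 padding, where the ciphertext is the plaintext length rounded up to the next multiple of the 16-byte block size, with a full extra block added when the input is already aligned.

```csharp
using System;
using System.Security.Cryptography;

public static class PaddingDemo
{
    // Encrypts a zero-filled buffer of the given length with a freshly generated
    // AES key and IV, and returns the ciphertext length.
    public static int EncryptedLength(int plaintextLength)
    {
        using (var aes = Aes.Create())               // defaults: CBC mode, PKCS7 padding
        using (var encryptor = aes.CreateEncryptor())
        {
            var plaintext = new byte[plaintextLength];
            return encryptor.TransformFinalBlock(plaintext, 0, plaintext.Length).Length;
        }
    }

    public static void Main()
    {
        foreach (int n in new[] { 0, 1, 15, 16, 17, 1000 })
        {
            Console.WriteLine($"{n} plaintext bytes -> {EncryptedLength(n)} ciphertext bytes");
        }
    }
}
```

With these settings the overhead is between 1 and 16 bytes regardless of input size, so 1000 bytes of plaintext become 1008 bytes of ciphertext. Other modes or padding settings would give different overheads.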
Recommended: Compression then Encryption
Example in .NET
Encrypt Then Compress
    // Namespaces needed: System.IO, System.IO.Compression, System.Security.Cryptography

    private static void EncryptThenCompress(string inputFileName, string outputFileName, ICryptoTransform encryptor)
    {
        using (var inputFileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
        using (var outputFileStream = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
        using (var gZipStream = new GZipStream(outputFileStream, CompressionMode.Compress))
        using (var cryptoStream = new CryptoStream(gZipStream, encryptor, CryptoStreamMode.Write))
        {
            // Bytes written to cryptoStream are encrypted first, then compressed, then written to the file.
            inputFileStream.CopyTo(cryptoStream);
        }
    }

    private static void DecompressThenDecrypt(string inputFileName, string outputFileName, ICryptoTransform decryptor)
    {
        using (var inputFileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
        using (var gZipStream = new GZipStream(inputFileStream, CompressionMode.Decompress))
        using (var cryptoStream = new CryptoStream(gZipStream, decryptor, CryptoStreamMode.Read))
        using (var outputFileStream = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
        {
            // Bytes read from cryptoStream are decompressed first, then decrypted.
            cryptoStream.CopyTo(outputFileStream);
        }
    }
Compress Then Encrypt
    private static void CompressThenEncrypt(string inputFileName, string outputFileName, ICryptoTransform encryptor)
    {
        using (var inputFileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
        using (var outputFileStream = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
        using (var cryptoStream = new CryptoStream(outputFileStream, encryptor, CryptoStreamMode.Write))
        using (var gZipStream = new GZipStream(cryptoStream, CompressionMode.Compress))
        {
            // Bytes written to gZipStream are compressed first, then encrypted, then written to the file.
            inputFileStream.CopyTo(gZipStream);
        }
    }

    private static void DecryptThenDecompress(string inputFileName, string outputFileName, ICryptoTransform decryptor)
    {
        using (var inputFileStream = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
        using (var cryptoStream = new CryptoStream(inputFileStream, decryptor, CryptoStreamMode.Read))
        using (var gZipStream = new GZipStream(cryptoStream, CompressionMode.Decompress))
        using (var outputFileStream = new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
        {
            // Bytes read from gZipStream are decrypted first, then decompressed.
            gZipStream.CopyTo(outputFileStream);
        }
    }
Here is a sample application which uses the methods above.
    private const string OriginalFileName = "Original.txt";
    private const string CompressThenEncryptFileName = "CompressThenEncrypt.txt";
    private const string EncryptThenCompressFileName = "EncryptThenCompress.txt";
    private const string DecompressThenDecryptFileName = "DecompressThenDecrypt.txt";
    private const string DecryptThenDecompressFileName = "DecryptThenDecompress.txt";

    static void Main(string[] args)
    {
        Console.Title = "Compression-Encryption Sample";

        CreateFile(OriginalFileName);
        PrintFileInfo(OriginalFileName);

        using (var aes = new AesCryptoServiceProvider())
        {
            ICryptoTransform encryptor = aes.CreateEncryptor();
            ICryptoTransform decryptor = aes.CreateDecryptor();
            ICryptoTransform decryptor2 = aes.CreateDecryptor();

            // Compress and Encrypt
            CompressThenEncrypt(OriginalFileName, CompressThenEncryptFileName, encryptor);
            PrintFileInfo(CompressThenEncryptFileName);
            EncryptThenCompress(OriginalFileName, EncryptThenCompressFileName, encryptor);
            PrintFileInfo(EncryptThenCompressFileName);

            // Decrypt and Decompress
            DecompressThenDecrypt(EncryptThenCompressFileName, DecompressThenDecryptFileName, decryptor);
            PrintFileInfo(DecompressThenDecryptFileName);
            DecryptThenDecompress(CompressThenEncryptFileName, DecryptThenDecompressFileName, decryptor2);
            PrintFileInfo(DecryptThenDecompressFileName);
        }

        Console.WriteLine("Press any key to continue...");
        Console.ReadKey();
    }
CreateFile is a helper method which generates a file containing 5000 lines of "abcdefabcdefabcdefabcdefabcdefabcdef", and PrintFileInfo is a helper method which prints the file name, file size and a SHA256 hash of the file.
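The two helpers are not listed in this post, so the following is only a guess at what they might look like, not the original code. It assumes ASCII text and Windows-style "\r\n" line endings, which would account for the 190000-byte file size (36 characters plus 2 line-ending bytes, times 5000 lines). The FileHelpers class name and the output format are invented for the sketch.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class FileHelpers
{
    // Writes 5000 identical lines of a repeating six-letter pattern.
    // Assumption: ASCII encoding and "\r\n" line endings (38 bytes per line).
    public static void CreateFile(string fileName)
    {
        using (var writer = new StreamWriter(fileName, false, Encoding.ASCII))
        {
            writer.NewLine = "\r\n";
            for (int i = 0; i < 5000; i++)
            {
                writer.WriteLine("abcdefabcdefabcdefabcdefabcdefabcdef");
            }
        }
    }

    // Prints the file name, its size in bytes and the SHA256 hash of its contents.
    public static void PrintFileInfo(string fileName)
    {
        var info = new FileInfo(fileName);
        using (var sha256 = SHA256.Create())
        using (var stream = File.OpenRead(fileName))
        {
            string hash = BitConverter.ToString(sha256.ComputeHash(stream)).Replace("-", "");
            Console.WriteLine($"{info.Name}: {info.Length} bytes, SHA256 {hash}");
        }
    }
}
```

Comparing the hashes printed for the original file and the two round-tripped files is a quick way to confirm that both orderings restore the input exactly.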
Below is a screenshot of the output.
The source file is 190000 bytes. After applying compression then encryption, the resulting file is 608 bytes. After applying encryption then compression, the resulting file is 190094 bytes. Not only is the latter file far larger than the former, it is actually larger than the original source file! I hope this example makes clear how encrypting data makes it less compressible. Encryption worth its salt should transform the source data into something indistinguishable from random data, and random-looking data gives a compression algorithm nothing to work with.
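The same comparison can be reproduced without touching the file system. The sketch below is an in-memory variant written for illustration (OrderComparison and its methods are hypothetical names, and Aes.Create() is used in place of AesCryptoServiceProvider). It is not the code that produced the numbers above, so the sizes it prints may differ slightly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public static class OrderComparison
{
    public static byte[] CompressThenEncrypt(byte[] data, SymmetricAlgorithm algorithm)
    {
        using (var output = new MemoryStream())
        {
            using (var encryptor = algorithm.CreateEncryptor())
            using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
            using (var gzip = new GZipStream(crypto, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            // Read the result only after the wrapping streams are disposed,
            // so the last compressed bytes and the final cipher block are flushed.
            return output.ToArray();
        }
    }

    public static byte[] EncryptThenCompress(byte[] data, SymmetricAlgorithm algorithm)
    {
        using (var output = new MemoryStream())
        {
            using (var encryptor = algorithm.CreateEncryptor())
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            using (var crypto = new CryptoStream(gzip, encryptor, CryptoStreamMode.Write))
            {
                crypto.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // 190000 bytes of the repeating pattern "abcdef" (line endings omitted for brevity).
        var data = new byte[190000];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)('a' + i % 6);

        using (var aes = Aes.Create())
        {
            Console.WriteLine($"original:              {data.Length} bytes");
            Console.WriteLine($"compress then encrypt: {CompressThenEncrypt(data, aes).Length} bytes");
            Console.WriteLine($"encrypt then compress: {EncryptThenCompress(data, aes).Length} bytes");
        }
    }
}
```

Note that ToArray is called only after the CryptoStream and GZipStream have been disposed. Reading the MemoryStream earlier can return incomplete data, because the final block has not been written yet.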
Source code can be downloaded here.
Hi there. Nice one. But I have experienced a strange thing: if I use MemoryStream instead of FileStream to get the content I need to compress and encrypt, the result is very different: 50+ KB instead of the 18 KB with FileStream. Why?