andy granger February 2016

Compressing groups of incompressible yet very similar files: possible?

I have a number of files 15 GB+ in size; none of them can be compressed, as the content is an encrypted container.

I have many of these files where only slight differences exist between them, so 90%+ of the data is common.

Using WinRAR I can set a dictionary size of 1 GB, but I believe this means only 1 GB of the 15 GB that's common to each file will be efficiently compressed. So two files come to 29 GB in the best case.
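The dictionary-size concern can be checked on a small scale. The sketch below is a stand-in, not WinRAR: it assumes Python's zlib module, whose deflate window is only 32 KB, as a scaled-down "dictionary", and uses random bytes in place of an encrypted container. Two identical chunks placed back to back only shrink when the second copy starts within the window; `two_copies_compressed` is a hypothetical helper name.

```python
import os
import zlib

# zlib (deflate) can only reference data about 32 KB back, which plays the
# role of a (very small) WinRAR dictionary in this sketch.

def two_copies_compressed(chunk_size: int) -> int:
    """Return the compressed size of two identical random chunks stored back to back."""
    chunk = os.urandom(chunk_size)  # random bytes stand in for encrypted data
    return len(zlib.compress(chunk + chunk, 9))

for size in (16 * 1024, 100 * 1024):
    print(f"2 x {size} bytes -> {two_copies_compressed(size)} bytes compressed")
```

With 16 KB chunks the second copy costs almost nothing; with 100 KB chunks the output is roughly the full 200 KB, because the first copy is already out of reach when the second one starts.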

Does any software exist to compress multiple large, similar files?


zaph February 2016

If the files are correctly encrypted there will be no similarity in the encrypted data. A properly encrypted file is indistinguishable from random data.

If there is any similarity between the files, even in small sections, the encryption is incorrect and compromised, for example when ECB mode is used, or CTR mode with the same key and nonce.

Note: Repeats in the encrypted data leak information about the underlying data, which is in general a security problem.
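A minimal sketch of why similar ciphertexts point to a broken setup. Assumptions: no real cipher is used; a random keystream from `os.urandom` stands in for the output of a stream mode such as AES-CTR, and `xor_bytes` is a hypothetical helper. zlib is used only as a crude detector of repeated data.

```python
import os
import zlib

def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    """Stream-cipher style encryption: XOR the data with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream))

SIZE = 20_000
plain_a = os.urandom(SIZE)
plain_b = plain_a[:18_000] + os.urandom(2_000)  # first 90% identical to plain_a

# Correct use: an independent keystream (fresh nonce) for every file.
fresh = xor_bytes(plain_a, os.urandom(SIZE)) + xor_bytes(plain_b, os.urandom(SIZE))

# Broken use: one keystream (same key and nonce) reused for both files.
shared_keystream = os.urandom(SIZE)
reused = xor_bytes(plain_a, shared_keystream) + xor_bytes(plain_b, shared_keystream)

print("fresh keystreams:", len(fresh), "->", len(zlib.compress(fresh, 9)))
print("reused keystream:", len(reused), "->", len(zlib.compress(reused, 9)))
```

Only the reused-keystream pair compresses, because identical plaintext regions turn into identical ciphertext regions, which is exactly the leak described above.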

Mofi February 2016

WinRAR's solid compression works differently than you think. The dictionary size is just the amount of memory allocated for finding repeated data; it is used dynamically across the files being compressed, which is what makes it effective for many similar small files.

For example, I have a folder with 366 files. 30 of them are text files smaller than 12 KB; the others are binary files between 40 KB and 450 KB. The total size of all files is 48 MB. Solid compression with a dictionary size of just 4 MB and a correctly configured RarFiles.lst for those files results in a RAR archive of just 205 KB using the RAR4 format. Most files are stored with less than 500 bytes in the archive, including the file header, as can be seen by opening the RAR archive in WinRAR. So although the total number of bytes is more than 10 times the dictionary size, the solid compression is nevertheless impressive. The RAR archive can be made even smaller by using the RAR5 format with a dictionary size of 64 MB, resulting in a solid RAR archive of 163 KB.
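The solid versus non-solid difference can be sketched with zlib standing in for RAR (an assumption; this is not WinRAR's algorithm). The 1000 small "files" are hypothetical generated text, and their total size is several times zlib's 32 KB window, similar in spirit to the example above where the data is far larger than the dictionary.

```python
import zlib

# Hypothetical stand-ins for many small, similar files (~300 bytes each).
files = [
    (f"name=item{i}\nversion=1.{i}\nenabled=true\npath=/usr/share/app/data\n" * 5).encode()
    for i in range(1000)
]
total_input = sum(len(f) for f in files)

# Non-solid: each file is compressed on its own, with no shared history.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# Solid: all files pass through one stream, so each file can reference the previous ones.
solid = len(zlib.compress(b"".join(files), 9))

print("input:", total_input, "per file:", per_file, "solid:", solid)
```

The solid stream wins because each file only needs to reach back to its neighbours, which always lie inside the small window.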

But WinRAR's solid compression is not designed for compressing very large similar files.

The technique to best compress such files is to first put all of them into a single archive file using Store as the compression method, i.e. producing one huge file with uncompressed data. Then this huge archive file is compressed using Normal, Good or even Best compression without creating a solid archive.
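The store-then-compress approach can be sketched with Python's standard library. Assumptions: an uncompressed tar archive stands in for a RAR archive created with Store, a single xz (LZMA2) stream stands in for the second compression pass, and `store_only_tar` / `xz_with_dict` are hypothetical helper names. This is not how WinRAR implements it; the sketch compresses the same stored archive with two different LZMA2 dictionary sizes.

```python
import io
import lzma
import os
import tarfile

def store_only_tar(named_files: dict[str, bytes]) -> bytes:
    """Pack files into one uncompressed tar archive (the 'Store' step)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" means no compression
        for name, data in named_files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def xz_with_dict(data: bytes, dict_size: int) -> bytes:
    """Compress the stored archive as a single xz stream with a given LZMA2 dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)

base = os.urandom(512 * 1024)  # random bytes stand in for an encrypted container
files = {
    "a.bin": base,
    "b.bin": base[:450_000] + os.urandom(len(base) - 450_000),  # mostly identical copy
}
stored = store_only_tar(files)
small_dict = xz_with_dict(stored, 64 * 1024)        # smaller than the distance between copies
large_dict = xz_with_dict(stored, 2 * 1024 * 1024)  # reaches from b.bin back to a.bin

print("stored:", len(stored), "64 KiB dict:", len(small_dict), "2 MiB dict:", len(large_dict))
```

In this sketch the shared data only shrinks when the dictionary reaches from the second copy back to the first; with the 64 KiB dictionary the output stays about as large as the stored archive.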

Note: By default WinRAR only stores, without compressing, files that usually contain already compressed data. So after selecting the huge RAR archive containing the stored data of all files and clicking the Add button, it is necessary to remove the file name pattern *.rar on the Files tab from the list of files to store without compression to get the selected archive actually compressed.

