l33t7332273 5 days ago

One thing you can do, as the other commenter pointed out, is consider entropy of the file.

However, this restriction is too much for the purposes of this challenge. We don’t actually need a file with low entropy, in fact I claim that a weak file exists for files with entropy 8 (the maximum entropy value) - epsilon for each epsilon > 0.

What we actually need is a sufficiently large chunk in a file to have low entropy. The largeness is in absolute terms, not relative terms.

A very simple file would be taking a very large file with maximum entropy and adding 200 0’s to the end. This would not decrease the entropy of the file much, but it gives way to a compression algorithm that should be able to save ~100 bytes

1
kevinventullo 4 days ago

Note that if this large chunk occurs in the middle of the file, then you will need extra space to encode that position. For example, a random bit string of length 2^n is decently likely to have a run of n zeroes. But this doesn’t help you because you need n bits just to encode where that run happens.

l33t7332273 4 days ago

But storing an index for a file of length 2^n takes only n bits, so you need that run of 0’s to be of length n+1 to win