ball_of_lint 4 days ago

This strategy or something like it legitimately wins the challenge.

The challenge (based on expected value) is not to find some compression scheme that will compress 100% or even 50% of files to less than their original size. Instead it's to find any (set of) compression schemes that will compress more than 1/50 = 2% of arbitrary random files to less than their original size. You can construct such a program essentially as they described by asking for a large file size and then searching for a valid decompression program within that file. With standard tools you can make some valid decompression program quite short, which will allow you a reasonable probability that it will appear within a large randomly-generated file.

The cost that you pay for using a compression scheme like this is that where the compression scheme works you'll win some bytes, but in the more common case where it doesn't you'll lose some, even for the simple `cat $1 > original` default decompressor.

Now, a clever contest creator might arbitrarily choose files that win against such decompressors that they can think of, but Mike appears to have not have done so.

3
LegionMammal978 3 days ago

The problem is what you do outside the file: even if it contains a working decompressor, you have to actually run that decompressor, which most likely requires specifying the byte offset and doing some other work to extract it. And then the decompressor needs to shorten the file even past the length of that byte offset and extractor, which can't be done with reasonable probability.

There are some formats which search for a valid header anywhere in the file, but then they'd get tripped up by all the mostly-valid headers that don't actually yield a decompressed file.

hinkley 3 days ago

No. You sound as if you’re familiar with the culture but it’s clear that you’re not.

The problem with comp.compression was always that its Eternal September is new people showing up every week claiming they’ve found a universal compression algorithm that compresses everything - the perpetual motion machine of information theory. Having gotten tired of explaining to people who think they’re fighting “dogma” not the laws of the universe, people start trying to find other ways to make them put up or shut up, so they could have some peace and quiet.

Like sending them a file full of RNG output and wait for them to realize this was their teachable moment.

Winning the challenge - without engaging in out of band bullshit (compressor + output should be the giveaway) doesn’t prove you have found a universal compression algorithm. It only proves the RNG is flawed. Which would be very interesting but not break Shannon.

The problem is compressor + file means “no out of band data” to a reasonable person and we have already established we are dealing with unreasonable people.

Not

> My main motivation was to "out-trick the tricker". I thought the chances of me making any money were very remote.

ball_of_lint 2 days ago

I'm not familiar with the culture.

Nothing in the challenge says anything about doing universal compression. In fact the key point I'm trying to make is that you _don't_ have to make a universal compressor or break entropy to win, instead just find some scheme that makes 2% of random files shorter than the original, even if it makes 98% of files much much longer. The overall entropy is increased. I wouldn't use this for anything else. But it does beat the challenge as stated.

I'd agree that Patrick didn't follow the "spirit" of the challenge or do anything interesting with compression. But in doing this I think he made a good point, which is roughly that you need to be very careful and explicit when posing challenges like this or people are going to use your sloppy wording against you.

hinkley 1 day ago

> Nothing in the challenge says anything about doing universal compression

Because it’s out of context. It also makes the winner seem more sympathetic, which is why it’s not accident it’s not provided. It’s the Missing, Missing Reason.

ttshaw1 3 days ago

If they're trying to dissuade "universal compressors" then Mike needed to ask for the algorithm first, and then generate his file. If you tell me "I bet you can't compress this file!" then I can do whatever I want to write some stupid one-off compressor to shave a byte off and take your money.

Dylan16807 3 days ago

It's a limited risk. Even if the file is compressible by one byte, it's very unlikely you can figure out how to get a decompressor functioning without plenty of bytes of overhead. And even if that problem disappears, he'd still win 99.6% of the time.

And you can get rid of that risk by requiring 100 bytes of shrink. Just measure the size right.

hinkley 1 day ago

As someone else said, if the reward was a million dollars then this becomes a game theory problem.

If you have $100 in disposable income it might be worth the lark if the officiant is uncareful with their chosen text. Though odds are good he pulled it from random.org. That’s where we used to send people.

ball_of_lint 2 days ago

I think it would be better to require a meaningful percentage (say 1%) of compression rather than an exact count of bytes. Especially while people can ask for arbitrarily large files.

Dylan16807 2 days ago

The chance that a randomly generated 1KB file can shrink by 100 bytes is the same as the chance that a randomly generated 100MB file can shrink by 100 bytes.

And that chance is too low to distinguish from zero before the universe dies.

hinkley 3 days ago

I'm not saying 'Mike' had a high-effort ask, because clearly it wasn't particularly well thought out. Just that other people were glad for any Mike at all.

GuB-42 3 days ago

For a truly random file, your 1/50 compression scheme that will make it smaller than the original will only make it smaller by 5 or 6 bits, maybe more if you are lucky, but it is an exponential, so realistically, you won't be able to save more than a byte or two, and you can't write that decompressor in two bytes.

The only way to win is to reverse the random number generator used to create the contest file, or to exploit some bias if there is one. But if the file is generated properly, most likely with some CSPRNG, good luck going that route.

The contest creator has to use a random file, any attempt to be clever will actually decrease entropy and may result in the contestant winning.

teo_zero 3 days ago

> The only way to win is to reverse the random number generator used to create the contest file

In that case Mike, the contest creator, may declare the result invalid because you haven't satisfied the intent of the challenge.

fragmede 3 days ago

He may, but everyone else recognizes that he underspecified the rules of the game and in doing so is a sore loser, and to never conduct business with him because he's not a man of his word and only does things when it's convenient for him.