Somewhere (discussed on HN) someone devised a "better-than-perfect" compressor. Most inputs get compressed (smaller than input), except for one input that does not. That one input is cryptographically impossible to find - or something along those lines.
Unfortunately I can't find the article I'm describing here, maybe someone else can? It was a long time ago so I might be misrepresenting it slightly.
It can't exist.
Presume this compressor guarantees the output will always be at least one byte smaller than the input (with the exception of the impossible-to-find input).
So just keep running your data in a loop through the compressor. If you start with a 1MB file, it will take at most about a million iterations for the output to shrink down to zero bytes, which is the smallest possible file. Record how many iterations it took.
You can now extract your file by feeding a zero-byte file into the decompressor and running the same number of iterations. That means you can store every 1MB (or smaller) file in the world in just 20 bits... but that would mean there are only about a million possible 1MB files?
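As a minimal sketch (Python, with `compress`/`decompress` standing in for the hypothetical functions from the claim, nothing real):

    from typing import Callable

    def shrink_to_nothing(data: bytes, compress: Callable[[bytes], bytes]) -> int:
        """Repeatedly apply the (hypothetical) compressor; return how many passes it took."""
        iterations = 0
        while data:
            smaller = compress(data)
            assert len(smaller) < len(data), "the claimed guarantee: always >= 1 byte smaller"
            data = smaller
            iterations += 1
        return iterations  # at most the original length in bytes, ~2**20 for a 1MB file

    def restore(iterations: int, decompress: Callable[[bytes], bytes]) -> bytes:
        """Invert it: start from the empty file and decompress the same number of times."""
        data = b""
        for _ in range(iterations):
            data = decompress(data)
        return data

The iteration count is the only thing stored, so it alone would have to identify the original file.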
Even if you put some minimum output size limitation on the compressor, say it can't produce any file less than 512 bits, the same argument applies. It's just that the numbers get bigger.
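Quick back-of-the-envelope (my own rough counting, assuming a 64-byte floor and that you stop once the file is down to 64 bytes or fewer):

    floor_bytes = 64                # the 512-bit floor
    start_bytes = 1 << 20           # 1MB input

    max_iterations = start_bytes - floor_bytes                 # shrink by >= 1 byte per pass
    residues = sum(256 ** k for k in range(floor_bytes + 1))   # every possible file of <= 64 bytes

    pairs = residues * (max_iterations + 1)   # distinct (final file, iteration count) pairs
    print(pairs.bit_length())                 # ~533 bits of information
    print(8 * start_bytes)                    # 8,388,608 bits needed to name an arbitrary 1MB file

So even with the floor, you'd be claiming ~533 bits can distinguish every possible 1MB file.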
That's how all compressors work, in that likely files (eg. ASCII, obvious patterns, etc) become smaller and unlikely files become bigger.
> likely files (eg. ASCII, obvious patterns, etc) become smaller
Likely files for a real human workload are like that, but if "most inputs" is talking about the set of all possible files, then that's a whole different ball game and "most inputs" will compress very badly.
> unlikely files become bigger
Yes, but when compressors can't find useful patterns they generally only increase size by a small fraction of a percent. There aren't files that get significantly bigger.
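For example, zlib from Python's standard library on incompressible input (just an illustration; the exact overhead varies by format):

    import os, zlib

    data = os.urandom(1 << 20)          # 1MB of random bytes: no patterns to find
    out = zlib.compress(data)

    overhead = len(out) - len(data)
    print(overhead, f"{100 * overhead / len(data):.4f}%")
    # typically a few hundred bytes on 1MB, i.e. a few hundredths of a percent,
    # because deflate falls back to "stored" blocks when it can't compress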
In some cases it can be certain: ASCII encoded in the usual 8 bits per character has fat to trim even if the content is random within that space.
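A minimal sketch of that trimming (just repacking 7-bit ASCII so the always-zero high bit isn't stored; not how real compressors do it, but it shows the guaranteed slack):

    def pack_ascii(text: bytes) -> bytes:
        """Pack 7-bit ASCII into 7/8 of the space by dropping each byte's unused high bit."""
        assert all(b < 128 for b in text), "only valid for 7-bit ASCII"
        bits, nbits, out = 0, 0, bytearray()
        for b in text:
            bits = (bits << 7) | b      # append the 7 meaningful bits
            nbits += 7
            while nbits >= 8:
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)   # emit a full byte
        if nbits:
            out.append((bits << (8 - nbits)) & 0xFF) # pad the final partial byte
        return bytes(out)

    text = b"even statistically random ASCII still shrinks"
    print(len(text), len(pack_ascii(text)))   # packed size is ~7/8 of the original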
Right, but the point was, the case where it became bigger was ~impossible to find.
Yeah, good point, I kinda glossed over that part of the original post. I don't think that's possible, fwiw.
IMO, the fun part of compression algorithms is that the set of files that become smaller is necessarily narrow while the set of files that become bigger is necessarily huge, so _most_ files don't compress well! The trick is to get the set of files that get smaller to be exactly the useful files and nothing else.
You could design the opposite, a system where ultra-rare files get compressed a lot but most don't compress.
But as stated, what you said is not possible. To compress by even a single byte, only about 1 in 256 files can be compressible; otherwise you'd have multiple different inputs compressing down to the same output.
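The count, spelled out (tiny sizes, since the ratio doesn't depend on scale):

    # to shrink an n-byte file by at least one byte, its output must be one of the
    # strings shorter than n bytes, and distinct inputs need distinct outputs
    n = 3
    inputs = 256 ** n                                   # all n-byte files
    shorter_outputs = sum(256 ** k for k in range(n))   # all files of fewer than n bytes
    print(shorter_outputs / inputs)                     # ~0.0039, i.e. roughly 1 in 256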