mhandley 4 days ago

You could do the same spitting trick but only split at progressively increasing file lengths at the character '5'. The "compression" would be worse, so you'd need a larger starting file, but you could still satisfy the requirements this way and be independent of the filenames. The decompressor would just sort the files by increasing length before merging.

2
gus_massa 3 days ago

Nice idea, but doesn't this require a linear increase of the length of the partial files and a quadratic size of the original file?

If the length of a file is X, then in the next file you must skip the first X characters and look for a "5" that in average is in the X+128 position. So the average length of the Nth file is 128*N and if you want to reduce C bytes the size of the original file should be ~128C^2/2 (instead of the linear 128*C in the article).

Dylan16807 3 days ago

It does, but that's fine. He only needs to save 150-200 bytes and the file is 3MB. 128*200²/2 is 2.5MB Though I think it would be 256* here? Still, it can be made to work.

mhandley 3 days ago

Yes, I think it is quadratic. I don't claim it's practical (the original isn't practical either though), but just that the dependency on filenames isn't fundamental.

gus_massa 3 days ago

> I don't claim it's practical

Don"t interpret my comment as a complain, I was just triying to understand.

This is reposted from time to time, but I don't remember someome proposing this trick before. It's a nice idea.

anamexis 4 days ago

That's a neat idea.