Item 42232488

mhandley • 4 days ago

You could do the same spitting trick but only split at progressively increasing file lengths at the character '5'. The "compression" would be worse, so you'd need a larger starting file, but you could still satisfy the requirements this way and be independent of the filenames. The decompressor would just sort the files by increasing length before merging.

gus_massa • 3 days ago

Nice idea, but doesn't this require a linear increase of the length of the partial files and a quadratic size of the original file?

If the length of a file is X, then in the next file you must skip the first X characters and look for a "5" that in average is in the X+128 position. So the average length of the Nth file is 128*N and if you want to reduce C bytes the size of the original file should be ~128C^2/2 (instead of the linear 128*C in the article).

2 replies

Dylan16807 • 3 days ago

It does, but that's fine. He only needs to save 150-200 bytes and the file is 3MB. 128*200²/2 is 2.5MB Though I think it would be 256* here? Still, it can be made to work.

mhandley • 3 days ago

Yes, I think it is quadratic. I don't claim it's practical (the original isn't practical either though), but just that the dependency on filenames isn't fundamental.

1 reply

gus_massa • 3 days ago

> I don't claim it's practical

Don"t interpret my comment as a complain, I was just triying to understand.

This is reposted from time to time, but I don't remember someome proposing this trick before. It's a nice idea.

anamexis • 4 days ago

That's a neat idea.