hgomersall 3 days ago

Not necessarily. Consider a big file of random uniformly distributed bytes. It's easy to show that in practice some bytes are more common than others (because random), and that necessarily therefore the expected spacing between those specific bytes is less than 256, which gives you a small fraction of a bit you can save in a recoding of those specific bytes (distance from last byte of a specific value).

With a big enough file those fractions of a bit add up to a non trivial number of bits. You can be cunning about how you encode your deltas too (next delta makes use of remaining unused bits from previous delta).

I haven't worked through all the details, so it might be in the end result everything rebalances to say no, but I'd like to withhold judgement for the moment.

1
l33t7332273 3 days ago

>It's easy to show that in practice some bytes are more common than others (because random)

I don’t follow. Wouldn’t that be (because not random)

hgomersall 3 days ago

If you generate a billion bytes using a random byte generator, and bin the resultant array into 256 bins, it will not be perfectly flat. You can use that non-flatness to encode your bits more efficiently. I suspect just using codes to do it won't work well because the bin values are so close so you'll struggle to get codes that are efficient enough, but I suspect you can use the second order difference-between-specific byte as the encoded value. That has a much more pronounced distribution heavily weighted to small values.