> This isn't true! If you scrambled the filenames
I said "could" because you'd have to either do a limited scramble or hotwire ls to use the right order despite the scrambling. Or sort by date or inode, probably.
> The key thing here is that the trick works by storing the information of how the blobs are ordered out-of-band.
Yes. That is the key, not the filenames.
> In the OP, that out-of-band place to store the blob order is filename.
It is, but the actual use of filenames is not a shenanigan, and the blob order could be easily accomplished without any particular filenames.
> In your JS example of `[...s].join('5')`, where does the order of [...s] come from? It's not something you can hand-wave away, it's the key thing that makes the trick work.
It comes from the process of loading the blobs onto the computer. I'm not trying to hand-wave it, I'm saying it doesn't need filenames or anything resembling filenames. Maybe it came from a tar. Maybe I sent each file in a separate email. All that matters is having an order, and having an order happens by default when you have multiple files. As long as you don't go out of your way to reorder things, the trick works.
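For concreteness, here is a minimal sketch of the kind of trick being discussed. It is not the OP's actual code: the input string and the variable names are made up for illustration, and the blobs are held in an in-memory array rather than in files.

```javascript
// Hypothetical illustration; the data and names are not taken from the OP's code.
const original = "3141592653589793238462643383279502884197";

// "Compress": drop every '5' and keep the pieces that sat between them.
const blobs = original.split("5");

// "Decompress": rejoin the pieces in the order they arrived.
const restored = blobs.join("5");

// The same pieces in a different order (here, reversed).
const reordered = [...blobs].reverse().join("5");

console.log(restored === original);  // round trip with the arrival order
console.log(reordered === original); // round trip with a changed order
```

The blobs together are shorter than the input by one character per removed '5', and the round trip only succeeds while the arrival order is preserved.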
> It comes from the process of loading the blobs onto the computer. I'm not trying to hand-wave it, I'm saying it doesn't need filenames. Maybe it came from a tar. Maybe I sent each file in a separate email. All that matters is having an order, and having an order happens by default when you have multiple files. As long as you don't go out of your way to reorder things, the trick works.
I guess that's where we disagree. I think you don't have an order by default, you need to explicitly define it, and transmit it, and store it somehow. Which is after all, why it's not true compression. When you account for that metadata, the "compressed" data is not smaller than the original.
In the OP, the cheat was using filenames to store that data. In a tar file, it's using the tar file metadata to store it. In your email, you're using the email metadata to keep that ordering. In all cases, order is a key thing that you need to explicitly define, transmit, and store. And in all cases, this metadata takes up more space than is saved by the whole scheme.
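A rough back-of-the-envelope for that last claim, as a sketch only. It assumes the input bytes are independent and uniformly random, so any particular byte value appears with probability 1/256, and it counts the cost of recording where the cuts are (the names below are made up for illustration).

```javascript
// Assumption: bytes are independent and uniform, so the chosen byte
// value appears at any given position with probability p = 1/256.
const p = 1 / 256;

// Entropy, in bits, of one "is this position a cut?" decision.
const binaryEntropy = -(p * Math.log2(p)) - (1 - p) * Math.log2(1 - p);

// A cut occurs once every 1/p bytes on average, so this is the minimum
// average number of bits needed to record the position of each cut.
const bitsPerCut = binaryEntropy / p;

// Each cut removes exactly one byte from the payload.
const bitsSavedPerCut = 8;

console.log(bitsPerCut > bitsSavedPerCut);
```

Under those assumptions, remembering where each cut goes costs more on average than the single byte the cut removes, wherever that information ends up being kept.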
> I guess that's where we disagree. I think you don't have an order by default, you need to explicitly define it, and transmit it, and store it somehow.
It's files on a computer. Those always have an order. Acting like there isn't an order takes active work.
> Which is after all, why it's not true compression. When you account for that metadata, the "compressed" data is not smaller than the original.
Yeah sure, I have never disagreed on this.
> In the OP, the cheat was using filenames to store that data.
Let me make my argument extra clear, and you can tell me if you disagree with either point, and exactly how you disagree with that point:
A) OP did not do anything untoward with filenames. They did a simple loop and even threw away the actual contents of the filename. Even code that wasn't cheating would have a similar use of filenames.
B) OP's trick is not "based on" filenames, it's based on having an order. There are many ways to have an order, and their choice of using filenames is very shallowly integrated into their code.
Sure, I disagree with this:
> B) OP's trick is not "based on" filenames, it's based on having an order. There are many ways to have an order, and their choice of using filenames is very shallowly integrated into their code.
I think this is a distinction without a difference. OP's trick is based on having an order via filenames. They could have used a different trick that used something besides filenames for ordering, but they didn't.
If instead of asking if he could use multiple files, he asked if he could use a tar file, or email each file separately and specified that they should be fed to the decompressor in order, he would likely have been declined outright because the cheat is more obvious.
Okay, well I can respect that interpretation enough. I don't quite see it that way but I don't think I'm going to convince you.
Specifically I don't think it rises to the level of violating rules on filenames. That's why I think the distinction can matter.
The rules should bar contestants from storing entropy outside the payload of the file. Whether that's in a file name, some file system data structure, or some timing side channel is immaterial.
And once you ban that, it's impossible from an information-theoretic standpoint to win the challenge.
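The counting argument behind that can be sketched in a few lines. This is the standard pigeonhole observation rather than anything specific to the challenge's rules, and `countStrings` is a made-up helper name.

```javascript
// Count the bit strings of exactly n bits, and all strictly shorter ones.
function countStrings(n) {
  const exact = 2n ** BigInt(n);
  // 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1 (this includes the empty string).
  const shorter = exact - 1n;
  return { exact, shorter };
}

const { exact, shorter } = countStrings(16);

// There are fewer shorter outputs than inputs, so no lossless scheme
// can map every n-bit input to a distinct shorter output.
console.log(shorter < exact);
```

Since the shorter strings always number one fewer than the inputs, at least one input cannot shrink unless some information is kept somewhere outside the payload.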