I didn't quite get the method used to "compress" the data from the article; maybe this rephrasing helps someone:
You basically split the file every time you encounter a specific character, and your decompressor just joins all the files it finds, re-inserting the character you split by. If you split at every 'X', which might occur 1000 times in the file, the decompressor only needs a small script that joins all the files and puts an 'X' between them, which is far less than 1000 bytes. The "trick" is that the locations of the removed Xs are stored in the sizes of the individual files.
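A minimal sketch of the splitting step (my own illustration in shell/awk, not the article's actual script; the character 'X', the file name original.dat, and the segment.N naming are assumptions, and binary-safety is ignored):

    # "Compress": cut original.dat at every 'X' and write each piece to its own
    # numbered file. The Xs themselves are discarded; their positions survive
    # only as the lengths of the segment files.
    awk 'BEGIN { RS = "X" } {
        f = "segment." NR
        printf "%s", $0 > f
        close(f)
    }' original.dat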
Almost.
The trick is that the missing character is static, hard-coded in the 'decompressor'. It is inserted between every pair of segments; each segment was trimmed to end where that character occurred in the original.dat file. The number at the end of each segment's file name is the segment number, which the Bourne shell increments as it walks through the files. The script does not handle errors or validate the output.
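Something like this is all the 'decompressor' needs (again only a sketch under the same assumptions: a hard-coded 'X' and segment.1, segment.2, ... file names):

    # "Decompress": walk the segment numbers with a shell counter, emitting the
    # hard-coded 'X' between consecutive segments.
    n=1
    while [ -f "segment.$n" ]; do
        [ "$n" -gt 1 ] && printf 'X'
        cat "segment.$n"
        n=$((n + 1))
    done > reconstructed.dat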
It does fulfill the terms of the challenge as discussed, which make no reference to any external set of rules.
My take was that the information is stored in the ordering of the files. The decompressor doesn't care about the size of each file, right?
Both are needed. If you want to transmit this solution from one computer to another, you need to store the size of each file (or insert some delimiter, which takes even more space).
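For example, a hypothetical way to bundle the segments into a single stream would have to record each segment's length, which is exactly the information the 'compression' pushed into the filesystem (same assumed file names as above):

    # Ship the segments as one stream: each piece needs a size header
    # (or a delimiter), so the "saved" bytes reappear here.
    n=1
    while [ -f "segment.$n" ]; do
        wc -c < "segment.$n"    # size header, one line per segment
        cat "segment.$n"        # segment bytes
        n=$((n + 1))
    done > bundle.dat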