Neat idea but chances are the length of the seed is equal to the length of the original file.
There is a polynomial expression that will match the file. I wonder if expressing it would be compressible to a lower size than original file? [edit] I’m wrong - not all sequences can be expressed with a polynomial.
This reminds me of a data compression scheme I came up with once:
Treat an n bit file as a polynomial over the finite field with characteristic 2. Now, there are some irreducible polynomials in this field, but many polynomials have factors of x and of (x+1). Factor the polynomial into P(x)x^n (x+1)^m. and just collect these terms, storing only P, n, and m.
That's not how seeds work. Seeds are tiny.
Actually this would work perfectly if you knew it was generated in a single pass by a known random number generator and you had tons of time to brute force it.
If the file were generated by a natural source of entropy then forget it.
Or even if modified in a trivial way like adding 1 to every byte.
What is with the many downvotes but no comments? Everything I said is factual.
Seeds are something like 32 bits, though it depends on the exact implementation. But not the length of files.
When implementing a PRNG, you can make its seed as big as you want. There is no mathematical law that dictates or limits the size of a seed.
But I assume the GGP assumes that the author is lazy and used a public available PRNG instead of a custom made. (A long time ago someone broke the login security check in HN using a trick like that. Obviously, it's already fixed.)
I mean sure you could in theory, but in practice that's not how common built-in random number generators work.
I was responding to:
> chances are the length of the seed is equal to the length of the original file
And why would the chances be that? You'd really have to go out of your way for that. I don't even know if there are libraries that can handle a seed and state length on the scale of megabytes. No, chances are 99.99+% it used a seed of a few bytes, because that's how common random number generators designed for efficiency work.
A PRNG with a 32 bit seed can only generate up to 2^32 different sequences of 32bit numbers, which are guaranteed to repeat after 2^32 generated numbers. Without doing the math, Even for a 3mb file chances are very slim there'll be a seed that generates the entire file. The number of unique 3mb files is astronomically larger than the number of sequences you can generate, even if you'd try every possible substring of all the sequences. So you need a PRNG that can generate more sequences, which you can only achieve by making the seed larger. What the person you were replying to was implying is that the size of the seed might approach the size of the sequence you're trying to generate.