The original function probably compiles to just 3 instructions (xor, test, jne), and only one of them depends on a previous instruction. The "fast" version from the article has 4 instructions, each depending on the one before it. I'm not surprised it lost in the benchmark.
That said, a branch that is taken 3/4 of the time will not perform well if the inputs are random, because the predictor will miss fairly often.
Whether that matters comes down to how this function integrates into the rest of the program.
I don't think the years tested in practice will be random. More likely the function will see long runs of the same value, which a branch predictor handles well.
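
To make the branch argument concrete, here is a rough sketch of the kind of branchy check I have in mind. This is my guess at the shape of the original, not the article's actual code; the names `is_leap_branchy` and `count_leap` are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical branchy leap-year check (an assumption, not the article's code). */
bool is_leap_branchy(unsigned year)
{
    /* Early exit: true for 3 out of every 4 years. On random input this
       branch is the one that is hard to predict. */
    if ((year & 3) != 0)
        return false;
    if (year % 100 != 0)
        return true;
    return year % 400 == 0;
}

/* Hypothetical caller: counts leap years in an array. If the array holds
   long runs of the same year, the early-exit branch above goes the same
   way on every iteration of the run. */
size_t count_leap(const unsigned *years, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_leap_branchy(years[i]))
            count++;
    }
    return count;
}
```

Feeding `count_leap` shuffled years versus a long run of one year would be a simple way to see how much the input distribution affects the benchmark, though the compiler may inline or restructure this, so check the generated assembly first.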