This impl is mentioned in TFA. It's much slower and includes branches.
I'd expect that, even with optimizations off, there wouldn't be branches in the compiler output for that code.
There are, even with optimizations on. You could have checked: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...
I didn't find any way to get a compiler to generate a branchless version. I tried clang and GCC, both targeting amd64, with -O0, -O5, and -Os, plus -Oz for clang.