andrepd 1 day ago

This impl is mentioned in TFA. It's much slower and includes branches.

hoten 1 day ago

I'd expect that, even without optimizations on, there wouldn't be branches in the output for that code.

kragen 1 day ago

There are, even with optimizations on. You could have checked: https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename...

I didn't find any way to get a compiler to generate a branchless version. I tried clang and GCC, both for amd64, with -O0, -O5, -Os, and for clang, -Oz.

mmozeiko 1 day ago

If you change the logical and/or (&&, ||) to bitwise and/or (&, |), then it'll be branchless.
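
To illustrate the suggestion, here is a minimal sketch in C. The predicate and the function names (is_leap_logical, is_leap_bitwise) are hypothetical and are not taken from the linked Godbolt session. The logical operators short-circuit, so a compiler may emit jumps to skip the later comparisons. The bitwise operators evaluate every comparison and combine the 0/1 results, which gives the compiler less reason to branch. Whether the output is actually branchless still depends on the compiler, target, and flags, so it's worth checking the assembly.

```c
#include <stdbool.h>

/* Hypothetical example predicate, used only for illustration.
   && and || short-circuit: the right-hand side is skipped when the
   left-hand side already decides the result, which compilers often
   implement with conditional jumps. */
bool is_leap_logical(unsigned y) {
    return (y % 4 == 0) && ((y % 100 != 0) || (y % 400 == 0));
}

/* Same predicate with bitwise operators. Each comparison yields 0 or 1,
   all three are always evaluated, and the results are combined with
   plain AND/OR instructions. */
bool is_leap_bitwise(unsigned y) {
    return (y % 4 == 0) & ((y % 100 != 0) | (y % 400 == 0));
}
```

The two functions return the same value for every input, because none of the comparisons has side effects. That's also the condition under which this rewrite is safe: if the right-hand operand could trap or have side effects (say, a null-pointer dereference guarded by the left-hand side), swapping && for & changes behavior.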