Very interesting project!
I wonder if there's a way to make this set of techniques less brittle and more applicable to any language. I guess you're looking at a new backend or some enhancements to one of the parser generator tools.
I have applied a subset of these techniques in a C++ tokenizer for a language syntactically similar to Swift: no inline assembly, no intrinsics, and no SWAR, but reduced branching, cache optimization, and SIMD parsing with explicit vectorization.
I get:
- ~4 MLOC/sec/core on a laptop
- ~8-9 MLOC/sec/core on a modern server-grade AMD CPU with AVX-512.
So yes, it is definitely possible.
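
To give an idea of the "reduced branching" part only, here is a minimal sketch of table-driven character classification in plain C++. It is not the tokenizer described above: the character classes, the `kTable` lookup table, and the `count_identifiers` function are all hypothetical names made up for illustration, and the identifier rule (letter or underscore, then letters, digits, or underscores) is an assumed simplification. It does not show SIMD or cache work, and whether a compiler vectorizes this loop is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical character classes, for illustration only.
enum CharClass : std::uint8_t { kOther = 0, kIdent = 1, kDigit = 2 };

// Build a 256-entry table once so that classifying a byte is a single load
// instead of a chain of range comparisons.
static std::array<std::uint8_t, 256> make_class_table() {
    std::array<std::uint8_t, 256> t{};  // zero-initialized: everything is kOther
    for (int c = 'a'; c <= 'z'; ++c) t[c] = kIdent;
    for (int c = 'A'; c <= 'Z'; ++c) t[c] = kIdent;
    for (int c = '0'; c <= '9'; ++c) t[c] = kDigit;
    t['_'] = kIdent;
    return t;
}

static const std::array<std::uint8_t, 256> kTable = make_class_table();

// Count identifier tokens without an if/else in the loop body: the state
// update and the counter increment are written as plain arithmetic.
std::size_t count_identifiers(const char* data, std::size_t n) {
    std::size_t count = 0;
    std::uint8_t prev_in_ident = 0;  // 1 if the previous byte was inside an identifier
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t cls = kTable[static_cast<unsigned char>(data[i])];
        const std::uint8_t is_start_char = static_cast<std::uint8_t>(cls == kIdent);
        const std::uint8_t is_digit = static_cast<std::uint8_t>(cls == kDigit);
        // A byte is inside an identifier if it is a letter/underscore, or a
        // digit that continues an identifier already in progress.
        const std::uint8_t in_ident =
            static_cast<std::uint8_t>(is_start_char | (is_digit & prev_in_ident));
        // Add 1 exactly on a 0 -> 1 transition, i.e. at the start of a token.
        count += static_cast<std::size_t>(in_ident & (prev_in_ident ^ 1u));
        prev_in_ident = in_ident;
    }
    return count;
}

// Convenience overload for std::string input.
std::size_t count_identifiers(const std::string& s) {
    return count_identifiers(s.data(), s.size());
}
```

For example, `count_identifiers(std::string("let x1 = 42"))` counts `let` and `x1` but not the number `42`. The carried `prev_in_ident` state is what a real SIMD version would have to handle across vector lanes, which this sketch does not attempt.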