kookamamie 2 days ago

> 1. Compose the program into several threads of execution, traditionally scheduled and ran by the operating system

Step 0 is missing:

Compose the program into several lanes of execution, traditionally executed via SIMD.

Assuming that threading is the first manifestation of concurrency leaves a massive amount of performance on the table on modern computer architectures.
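To make "lanes" concrete, here is a minimal C++ sketch of a loop shaped so that each iteration handles a block of independent elements. The names `saxpy_lanes` and `kLanes` are made up for illustration, and the lane width of 8 is an assumption (8 floats fit a 256-bit register). Whether the compiler actually emits SIMD instructions for it depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane width: 8 floats fit one 256-bit register.
constexpr std::size_t kLanes = 8;

// Computes out[i] = a * x[i] + y[i].
// The inner loop has a fixed trip count and no dependency between
// iterations, which is the shape auto-vectorizers tend to handle well.
std::vector<float> saxpy_lanes(float a, const std::vector<float>& x,
                               const std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::vector<float> out(n);
    std::size_t i = 0;
    // Main loop: one block of kLanes independent elements per iteration.
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            out[i + lane] = a * x[i + lane] + y[i + lane];
        }
    }
    // Scalar tail for the elements that do not fill a whole block.
    for (; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}
```

The same function can then be called from several threads on disjoint slices, so lanes and threads stack rather than compete.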

jayd16 2 days ago

SIMD has been somewhat of a massive failure in this regard. Unlike threads, most languages seem to ignore its existence and leave its use to a sufficiently smart compiler.

I wish there were better author-time feedback to the developer on where they are actually getting a SIMD perf boost. As far as I'm aware there's no popular linting or blue squiggle to guide you in the right direction.

In games it seems like the popular pattern is to rewrite everything entirely in an entity component system framework.

kookamamie 2 days ago

Agreed completely. Most auto-vectorization approaches are hit-or-miss, and you still cannot trivially build fat binaries where the instruction set is chosen dynamically at runtime.

ISPC comes close, but does come with a learning curve.
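For anyone wondering what "instruction set decided dynamically" looks like by hand, here is a rough C++ sketch. It assumes GCC or Clang on x86-64 and uses their `target` function attribute and `__builtin_cpu_supports`; on any other toolchain it falls back to the scalar path. The names `scale`, `select_scale`, `scale_scalar` and `scale_avx2` are hypothetical. This is not how ISPC or Highway do it internally, just the basic idea.

```cpp
#include <cassert>
#include <cstddef>

// Portable baseline, compiled for the default target.
static void scale_scalar(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

#if defined(__GNUC__) && defined(__x86_64__)
// Same source, but the compiler is allowed to use AVX2 in this function only.
__attribute__((target("avx2")))
static void scale_avx2(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}
#endif

using ScaleFn = void (*)(float*, std::size_t, float);

// Picks an implementation based on what the running CPU reports.
ScaleFn select_scale() {
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
#endif
    return scale_scalar;
}

// Public entry point: resolves the implementation once, then reuses it.
void scale(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    static const ScaleFn impl = select_scale();
    impl(data, n, factor);
}
```

Every kernel needs one variant per target plus the selection code, which is why doing this by hand does not scale and why it is not "trivial".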

SleepyMyroslav 2 days ago

I would say that Highway [1] comes close. I can't say anything about ISPC: in multi-platform gamedev work it never even came into consideration.

1. https://google.github.io/highway/en/master/