When I read these articles, I always ask myself if this is more of a joint OS-ISA issue than just an ISA problem.
I wonder whether a well-designed OS, with strict enforcement of memory boundaries at both the OS level and the application level, where the application sits in a well-defined deterministic execution model, would mitigate some of these unpredictable state transitions.
If one considers a minimalist OS, a microkernel for example, with a reduced attack surface, could it not explicitly prevent access to certain microarchitectural state (e.g., by disallowing instructions like clflush, or speculative paths)? This could be accomplished with strict memory management enforced jointly by the OS layer and the binary structure of the application… one where the binary has a well-defined memory boundary, and the OS just ensures it is kept within those limits.
> well-defined deterministic execution model would mitigate some of these unpredictable state transitions.
The problem here is that giving a program access to high-resolution (non-virtualized) timers violates deterministic execution. And even without an explicit high-resolution timer, the non-determinism inherent in shared-memory parallelism can be exploited to build one: in short, a counter thread can serve as a very high-precision timer.
With high-resolution timers, the timing domain becomes both a potential covert channel and a universal side-channel to spy on other concurrent computations.
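For concreteness, here is a minimal sketch of the counter-thread trick. All names (`counter_thread`, `some_operation`) and loop bounds are illustrative, not taken from any specific exploit; the point is only that a relaxed spinning counter gives a timing reference the OS never handed out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint_fast64_t counter = 0;
static atomic_bool running = true;

/* Spins as fast as possible; the counter value becomes a proxy for elapsed time. */
static void *counter_thread(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&running, memory_order_relaxed))
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Placeholder "victim" operation whose duration we want to measure. */
static void some_operation(void) {
    volatile int x = 0;
    for (int i = 0; i < 100000; i++) x += i;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, counter_thread, NULL);

    /* Wait until the counter thread is actually running. */
    while (atomic_load(&counter) == 0) { }

    uint_fast64_t start = atomic_load(&counter);
    some_operation();
    uint_fast64_t end = atomic_load(&counter);

    printf("operation took ~%llu counter ticks\n",
           (unsigned long long)(end - start));

    atomic_store(&running, false);
    pthread_join(t, NULL);
    return 0;
}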
Good point, but still, you are leaving the user with too much leverage over the underlying architecture, again from the OS's perspective.
The way I'm considering this is: one could provide virtual time sources, removing the high-resolution timers, so the OS exposes only a coarse-grained timer. I'm not sure of the implications, but if needed, one could add jitter or randomness (noise) to the virtual timer values…
This would further help prevent a thread from running out of sync with the rest of the threads.
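Something like the sketch below is what I have in mind, assuming the OS (or a vDSO-like shim) is the only party that can read real hardware time. The granularity and noise parameters are made-up values, just to show the coarsening plus jitter idea:

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define GRANULARITY_NS 1000000   /* round real time down to 1 ms buckets */
#define MAX_JITTER_NS   500000   /* add up to 0.5 ms of random noise     */

/* What the OS would hand back to applications instead of a raw TSC read. */
uint64_t virtual_time_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t real_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;

    /* Coarsen: drop all sub-millisecond information. */
    uint64_t coarse_ns = (real_ns / GRANULARITY_NS) * GRANULARITY_NS;

    /* Add noise so even the bucket boundaries leak less. rand() is only a
     * placeholder; a real implementation would need a proper entropy source. */
    uint64_t jitter_ns = (uint64_t)rand() % MAX_JITTER_NS;

    return coarse_ns + jitter_ns;
}
```

Of course, as the parent comment points out, coarsening and jitter on the OS-provided clock don't stop an attacker who builds their own timer out of a counter thread.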
Further, one could also add a stack-based shared-memory model; LIFO would provide highly predictable behavior from an application perspective. If you make it per-process, the shared stack would then be confined to the application. Not sure if this is possible (I haven't given it deep thought), but the stacks could be confined to specific cache lines, removing the timing differences caused by cache contention…
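To make the cache-line part concrete, here is a rough layout sketch of a per-process LIFO where each slot is padded to its own cache line, so no two entries share a line. Names, sizes, and the 64-byte line assumption are all mine; it also does no synchronization, so it only illustrates the memory layout, not a full shared-memory design:

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64
#define STACK_SLOTS 128

/* Each slot occupies exactly one cache line. */
typedef struct {
    alignas(CACHE_LINE) uint64_t value;
    char pad[CACHE_LINE - sizeof(uint64_t)];
} padded_slot;

typedef struct {
    alignas(CACHE_LINE) size_t top;   /* index of next free slot */
    padded_slot slots[STACK_SLOTS];
} process_stack;

bool stack_push(process_stack *s, uint64_t v) {
    if (s->top >= STACK_SLOTS) return false;   /* full */
    s->slots[s->top++].value = v;
    return true;
}

bool stack_pop(process_stack *s, uint64_t *out) {
    if (s->top == 0) return false;             /* empty */
    *out = s->slots[--s->top].value;
    return true;
}
```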