I think you’re fixating on the very specific example. Imagine if instead of 2 + 2 it was multiplying large matrices. The compiler or runtime would be smart enough to figure out whether it’s worth dispatching the parallelism for you. Basically auto-vectorisation, but for parallelism.
Notably, in most cases there is no way the compiler can know which of these scenarios is going to happen at compile time.
At runtime, the CPU can figure it out though, eh?
I mean, theoretically it's possible. A super basic example: if the data size is known at compile time, the loop could be auto-parallelized, e.g.
int buf_size = 10000000;
auto vec = make_large_array(buf_size);
for (const auto& val : vec)
{
    do_expensive_thing(val);
}
this could clearly be parallelised. In a C++ world that doesn't exist, we can see that it's valid. If I replace it with

int buf_size = 10000000;
cin >> buf_size;
auto vec = make_large_array(buf_size);
for (const auto& val : vec)
{
    do_expensive_thing(val);
}
the compiler could generate some code that looks like: if buf_size >= SOME_LARGE_THRESHOLD { DO_IN_PARALLEL } else { DO_SERIAL }
With some background logic for managing threads, etc. In a C++-style world where "control" is important it likely wouldn't fly, but if this was python...
arr_size = 10000000
buf = [None] * arr_size
for x in buf:
    do_expensive_thing(x)
could be parallelised at compile time. But no one really writes code like that (data is generally provided at runtime), which is why ‘super smart’ compilers kinda went nowhere, eh?
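For what it's worth, the runtime-threshold dispatch described above can be done by hand today. Here's a minimal Python sketch, where `do_expensive_thing` is a stand-in for real per-element work and `THRESHOLD` is a made-up cutoff that a real runtime would have to tune:

```python
from multiprocessing import Pool

THRESHOLD = 100_000  # hypothetical cutoff; a real system would tune this

def do_expensive_thing(x):
    # stand-in for some genuinely expensive per-element computation
    return x * x

def process(buf):
    # Dispatch in parallel only when the input is large enough to
    # amortise the cost of spinning up worker processes; otherwise
    # run the plain serial loop.
    if len(buf) >= THRESHOLD:
        with Pool() as pool:
            return pool.map(do_expensive_thing, buf)
    return [do_expensive_thing(x) for x in buf]
```

Processes rather than threads because, for CPU-bound work in CPython, the GIL means threads wouldn't buy you any speedup. The point of the thread stands, though: it's the human, not the compiler, picking the threshold here.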