First I would like to thank you for bein patient with me. After some contemplation, I think I've identified what aspect of my argument hasn't been properly justified, which causes this kind of discussion.
Let's first define C as the set of all algorithms that are computable by any Turing complete system.
The main attractive feature of Turing Completeness is specifically this universality. You can take an algorithm running on one Turing Complete system and port it to another, with some amount of translation work (often just a compiler)
Now let's define the subset of all algorithms C that we are not to properly parallelize, let's label it U. (U is a subset of C).
The complementary subset of C, that CAN be parallelized properly we label P (P is also a subset of C).
Now define algorithms that require a lot of compute (>= 10^20 steps or so) as L. L is also a subset of C.
The complementary ("small" computations) can be labelled S (< 10^20 steps, though the exact cutoff is a bit arbitrary).
Now we define the intersections of S, L, U, P:
(Edit: changed union to intersect)
S_P (intersect of S and P)
L_P (intersect of L and P)
S_U (intersect of S and U)
L_U (intersect of L and U)
For S_P and S_U, the advantages of Turing Completeness remains. L_U is going to be hard to compute on any Turing Complete system.
(Edit: The mistake in my earlier argument was to focus on L_U. L_U is irrelevant for the relevance of the universality of Turing Complete systems, since no such system can run such calculations in a reasonable amount of time, anyway. To run algorithms in the L_U domain would require either some fundamental breakthrough in "single threaded" performance, Quantum Computing or some kind of magic/soul, etc)
This leaves us with L_P. There are computations/algorithms that CAN be parallelized, at least in principle. I will only discuss these from here on.
My fundamental claim is as follows:
While algorithms/computations that belong to the L_P set ARE in theory computable on any Turing Complete system, the variation in how long it takes to compute them can vary so much between different Turing Complete systems that this "universality" stops having practical relevance
For instance, let's say two computers K1 and K2 can run one such algorithm (lp_0, for instance) at the same speed. But on other algorithms (lp_1 and lp_2) the difference between how fast those system can run the computation can vary by a large factor (for instance 10^6), often in both directions.
Let's say lp_1 is 10^6 times faster on K1 than on K2, while lp_2 is 10^6 faster on K2 than on K1.
(Edit: note that lp_2 will take (10^6)^2 = 10^12 times longer than lp_1 on K1)
While both these algorithms are in theory computable on both K1 and K2, this is now of very little practical importance. You always want to run lp_1 on K1 and lp_2 on K2.
Note that I never say that either K1 or K2 are (Edit) not Turing complete. But the main attraction of Turing Completeness is now of no practical utility, namely the ability to move an algorithm from one Turing Complete system to another.
Which also means that what you really care about is not Turing Completeness at all. You care about the ability to calculate lp_1 and lp_2 within a reasonable timeframe, days to years, not decades to beyond eons.
And this is why I'm claiming that this is a paradigm shift. The Turing Completeness ideas were never wrong, they just stopped being useful in the kind of computations that are getting most of the attention now.
Instead, we're moving into a reality where computers are to an ever greater degree specialized for a single purpose, while the role of general purpose computers is fading.
And THIS is where I think my criticism of Deutsch is still accurate. Or rather, if we assume that the human brain belongs to the L_P set and strongly depend on it's hardware for doing what it is doing, this creates a strong constraint on the types of hardware that the human mind can conceivably be uploaded to.
And vice versa. While Deutsch tends to claim that humans will be able to run any computation that ASI will run in the future, I would claim that to the extent a human is able to act as Turing Complete, such computations may take more than a lifetime and possibly longer than the time until the heat death of the universe.
And I think where Deutsch goes wrong, is that he thinks that our conscious thoughts are where our real cognition is going on. My intuition is that while our internal monologue operates at around 1-100 operations per second, our subconscious is requiring in the range of gigaflops to exaflops for our minds to be able to operate in real time.
So, the reason I think the argument on Turing completeness matters here, is that if we accept that an LLM and a brain are both Turing complete, then while you're right that there can be Turing complete systems that are so different in performance characteristics that some are entirely impractical (Turings original UTM is an example of a Turing complete system that is too slow for practical use), if they are both Turing complete the brain is then both an existence-proof for the ability of Turing machines to be made to reason and gives us an upper limit in terms of volume and power needed to achieve human-level reasoning.
It may take us a long time to get there (it's possible we never will), and it may take significant architectural improvements (so it's not a given current LLM architectures can compete on performance), but if both are Turing complete (and not more) then dismissing the human ability to do so is
It means those who dismiss LLM's as "just" next token predictors assuming that this says something about the possibility of reasoning don't have a leg to stand on. And this is why the Turing completeness argument matters to me. I regularly get in heated arguments with people who get very upset at the notion that LLMs can possibly ever reason - this isn't a hypothetical.
> And vice versa. While Deutsch tends to claim that humans will be able to run any computation that ASI will run in the future, I would claim that to the extent a human is able to act as Turing Complete, such computations may take more than a lifetime and possibly longer than the time until the heat death of the universe.
If you mean "act as" in the sense of following operations of a Turing-style UTM with tape, then sure, that will be impractically slow for pretty much everything. Our ability to do so (and the total lack of evidence that we can do anything which exceeds Turing completeness) just serves as a simple way of proving we are Turing complete. In practice, we do most things in far faster ways than simulating a UTM. But I agree with you that I don't think humans can compete with computers in the long run irrespective of the type of problem.
Ok, so it looks like you think you've been arguing against someone who doubt that LLM's (and similar NN's) cannot match the capabilities of humans. In fact, I'm probably on the other side from you compared to them.
Now let's first look at how LLM's operate in practice:
Current LLM's will generally run on some compute cluster, often with some VM layer (and sometimes maybe barebone), followed by an OS on each node, and then Torch/TensorFlow etc to synchronize them.
It doesn't affect the general argument if we treat the whole inference system (the training system is similar) as one large Turing Complete system.
Since the LLM's have from billions to trillions of weights, I'm going to assume that for each token produced by the LLM it will perform 10^12 FP calculations.
Now, let's assume we want to run the LLM itself as a Turing Machine. Kind of like a virtual machine INSIDE the compute cluster. A single floating point multiplication may require in the order of 10^3 tokens.
In other words, by putting 10^15 floating point operations in, we can get 1 floating point operation out.
Now this LLM COULD run any other LLM inside it (if we chose to ignore memory limitations). But it would take at minimum in the order of 10^15 times longer to run than the first LLM.
My model of the brain is similar. We have a substrate (the physical brain) that runs a lot of computation, one tiny part of that is the ability that trained adults can get to perform any calculation (making us Turing Complete).
But compared to the number of raw calculations required by the substrate, our capability to perform universal computation is maybe 1 : 10^15, like the LLM above.
Now, I COULD be wrong in this. Maybe there is some way for LLM's to achieve full access to the underlying hardware for generic computation (if only the kinds of computations other computers can perform). But it doesn't seem that way for me, neither for current generation LLM's nor human brains.
Also, I don't think it matters. Why would we build an LLM to do the calculations when it's much more efficient to build hardware specifically to perform such computations, without the hassle of running it inside an LLM?
The exact computer that we run the LLM (above) on would be able to load other LLM's directly instead of using an intermediary LLM as a VM, right?
It's still not clear to me where this is not obvious....
My speculation, though, is that there is an element of sunk cost fallacy involved. Specifically for people my (and I believe) your age that had a lot of our ideas about these topics formed in the 90s and maybe 80s/70s.
Go back 25+ years, and I would agree to almost everything you write. At the time computers mostly did single threaded processing, and naïve extrapolation might indicate that the computers of 2030-2040 would reach human level computation ability in a single thread.
In such a paradigm, every computer of approximately comparable total power would be able to run the same algorithms.
But that stopped being the case around 10 years ago, and the trend seems to continue to be in the direction of purpose-specific hardware taking over from general purpose machines.
Edit: To be specific, the sunk cost fallacy enters here because people have been having a lot of clever ideas that depend on the principle of Turing Completeness, like the ability to easily upload minds to computers, or to think of our mind as a barebone computer (not like an LLM, but more like KVM or a Blank Slate), where we can plug in any kind of culture, memes, etc.