> Every LLM we have today is Turing complete if you put a loop around it that uses context as a means to continue the state transitions, so they haven't made that sacrifice, is the point.
I don't think you understood what I was writing. I wasn't saying that either the LLM (the finished product) or the machines used to train it were not Turing Complete. I said it was irrelevant.
> It only means it needs the theoretical ability.
This was absolutely incorporated in my previous post, which is why I wrote:
>> Specifically, the problem with Turing Completeness is that it implies the ability to create global branches/dependencies in the code based on the output of any previous computation step.
> It only means it needs the theoretical ability. They can take any set of shortcuts you want.
I'm not talking about shortcuts. When I talk about sacrificing, I'm talking about algorithms that you can run on any Turing Complete machine that are (to our knowledge) fundamentally impossible to distribute properly, regardless of shortcuts.
Only by staying within the subset of all possible algorithms that CAN be properly parallelized (and by having the hardware to run them) can you perform the number of calculations needed to train something like an LLM.
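To make that concrete, here's a toy Python illustration of my own (not anyone's actual training code): iterated hashing is inherently sequential, because step n's input is step n-1's output, while hashing a batch of independent inputs distributes trivially.

```python
import hashlib
from multiprocessing import Pool

def iterated_hash(seed: bytes, steps: int) -> bytes:
    """Inherently sequential: step n's input is step n-1's output,
    so no amount of extra hardware lets the steps run in parallel."""
    digest = seed
    for _ in range(steps):
        digest = hashlib.sha256(digest).digest()
    return digest

def hash_one(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def hash_many(inputs: list[bytes]) -> list[bytes]:
    """Embarrassingly parallel: every element is independent, so the
    work spreads across as many workers as you can buy."""
    with Pool() as pool:
        return pool.map(hash_one, inputs)

if __name__ == "__main__":
    print(iterated_hash(b"seed", 100_000).hex()[:16])
    print(len(hash_many([bytes([i]) for i in range(256)])))
```

Both run on a Turing Complete machine, but only the second kind of workload survives on a giant cluster, and that is the subset I'm talking about.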
> Every LLM we have today is Turing complete if you put a loop around it that uses context as a means to continue the state transitions, so they haven't made that sacrifice,
Which, to the degree that it's true, is irrelevant, and that is exactly why I'm saying Turing Completeness is a distraction. You're not likely to run algorithms that require 10^20 to 10^25 steps within the context window of an LLM.
On the other hand, if you build a cluster for training LLMs that is explicitly NOT Turing Complete (it could be designed to refuse to run code that is not fully parallel, for instance, to avoid costs in the millions just to keep a single CUDA core busy), it can still be just as good at its dedicated task (training LLMs).
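As a sketch of what such a refusal policy could look like (the function names and thresholds are hypothetical; the only real thing here is Amdahl's law):

```python
def amdahl_speedup(parallel_fraction: float, n_devices: int) -> float:
    """Amdahl's law: the speedup on n devices when only
    `parallel_fraction` of the work can be distributed."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_devices)

def admit(parallel_fraction: float, n_devices: int,
          min_utilization: float = 0.5) -> bool:
    """Toy admission policy: refuse any job that would leave most of
    the cluster idle (utilization = speedup / device count)."""
    speedup = amdahl_speedup(parallel_fraction, n_devices)
    return speedup / n_devices >= min_utilization

# A "99% parallel" job on 10,000 GPUs only speeds up ~100x, so it
# would keep ~1% of the cluster busy and gets refused.
print(admit(0.99, 10_000))      # False
print(admit(0.999999, 10_000))  # True
```

A scheduler like that can't run arbitrary programs, so the system as a whole isn't Turing Complete, yet it loses nothing for its one job.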
Another example would be the brain of a newborn baby. I'm pretty sure such a brain is NOT Turing Complete in any way. It has a very short list of training algorithms that are constantly running as it develops.
But it can't even run Hello World.
For an LLM to really be Turing Complete, it needs to be able to follow instructions accurately (no hallucinations, etc.) and to have access to unbounded storage/tape (otherwise it is a Finite State Machine). Again, it still doesn't matter whether it's Turing Complete in this context.
> I don't think you understood what I was writing. I wasn't saying that either the LLM (the finished product) or the machines used to train it were not Turing Complete. I said it was irrelevant.
Why do you think it is irrelevant? It is what allows us to say with near certainty that dismissing the potential of LLMs to be made to reason is unscientific and irrational.
> I'm not talking about shortcuts. When I talk about sacrificing, I'm talking about algorithms that you can run on any Turing Complete machine that are (to our knowledge) fundamentally impossible to distribute properly, regardless of shortcuts.
But again, we've not sacrificed the ability to run those.
> Which, to the degree that it's true, is irrelevant, and that is exactly why I'm saying Turing Completeness is a distraction. You're not likely to run algorithms that require 10^20 to 10^25 steps within the context window of an LLM.
Maybe, maybe not. Inference is expensive today, but we are already running plenty of algorithms that require many runs, and that number is steadily increasing as inference speed improves relative to network size.
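To be concrete about the outer-loop construction I keep referring to, here's a minimal sketch; `llm_step` is a toy deterministic stand-in for a real model call, not any actual API:

```python
def llm_step(context: str) -> str:
    """Toy stand-in for one model call (hypothetical, not a real API):
    decrements a counter written at the end of the context. A real LLM
    would be prompted to carry out one transition of some procedure."""
    n = int(context.rsplit(" ", 1)[-1])
    return f" {n - 1}" if n > 1 else " HALT"

def run(context: str, max_steps: int = 1_000) -> str:
    """The outer loop: the context plays the role of the tape/state,
    and each call is one state transition, repeated until a halting
    marker appears in the output."""
    for _ in range(max_steps):
        output = llm_step(context)
        context += output  # the output becomes part of the state
        if "HALT" in output:
            break
    return context

print(run("count down from: 5"))  # count down from: 5 4 3 2 1 HALT
```

Nothing about the model architecture forbids this loop; the per-step cost is the only obstacle, and that keeps falling.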
> On the other hand, if you build a cluster for training LLMs that is explicitly NOT Turing Complete (it could be designed to refuse to run code that is not fully parallel, for instance, to avoid costs in the millions just to keep a single CUDA core busy), it can still be just as good at its dedicated task (training LLMs).
And? The specific code used to run training has no relevance to the limitations of the model architecture.
See my other response above; I think I've identified which part of my argument was unclear.
The update may still have claims in it that you disagree with, but those are specific and (at some point in the future) probably testable.