Hedepig 5 days ago

If we do figure out how to vet these thoughts, would you call it reasoning?

mdp2021 5 days ago

> vet these thoughts, would you call it reasoning

Probably: other details may be missing, but checking one's ideas is a requirement. The sought engine must have critical thinking.

I have said this many times in the past two years, sometimes at length, always rephrasing it on the spot: the intelligent entity refines a world model iteratively by assessing its contents.

Hedepig 5 days ago

I do see your point, and it is a good point.

My observation is that the models are better at evaluating than they are generating, and this is the technique used in the o1 models. They use unaligned hidden tokens as "thinking" steps that include evaluation of previous attempts.

I thought that was a good approach to vetting bad ideas.

mdp2021 4 days ago

> My observation is that the [o1-like] models are better at evaluating than they are generating

This is very good (it is a very good thing that you see the out-loud reasoning working well as judgement),

but at this stage we face an architectural problem. The "model, exemplary" entities will iteratively judge and, in doing so, both (a) bring the world model progressively closer to truthfulness and completeness, and (b) refine their judgement abilities and general intellectual proficiency in the process. That (in a way) requires that the main body of knowledge (including "functioning", proficiency over the better processes) is updated. The current architectures I know are static... Instead, we want them to learn: to understand (not memorize), e.g., that Copernicus is better than Ptolemy, and to use the gained intellectual keys in subsequent relevant processes.

The main body of knowledge - notions, judgements and abilities - should be affected in a permanent way, to make it grow (like natural minds can).

Hedepig 4 days ago

The static nature of LLMs is a compelling argument against their reasoning ability.

But they can learn, albeit in a limited way, using the context, though to my knowledge that doesn't scale well.