Such as what? What can GOFAI do well that LLMs still cannot?
I think logical reasoning - so reasoning about logical problems, especially those with transitive relations like two way implication. A way round this is to get them to write prolog relations and then reason over them... with prolog. This isn't a fail - it's what things like prolog do, and not what things like nns do. If I was asked to solve these problems I would write prolog too.
I think quite a lot of planning.
I think scheduling - I tried something recently and GPT4 wrote python code which worked for very naive cases but then failed at any scale.
Basically though - trusted reasoning. Where you need a precise and correct answer LLM's aren't any good. They fail in the limit. But where you need a generally decent answer they are amazing. You just can't rely on it.
Whereas GOFAI you can, because if you couldn't the community thew it out and said it was impossible!
This has always been the case with ML.
ML is good at fuzzy stuff, where you don't have a clear definition of a problem (what is spam? what is porn?), "I know it when I see it", or when you don't have a clear mathematical algorithm to solve the problem (think "distinguishing dogs and cats").
When you have both (think sorting arrays, adding numbers), traditional programming (and that includes Prolog and the symbolic AI stuff) is much better.
LLMs will always be much worse than traditional computer programs at adding large numbers, just as traditional computer programs will always be worse at telling whether the person in the image is wearing proper safety equipment.
For best results, you need to combine both. Use LLMs for the "fuzzy stuff", converting imprecise English or pictures into JSON, Python, Wolfram, Prolog or some other representation that a computer can understand and work with, and then use the computer to do the rest.
Let's say you're trying to calculate how much proteins there are per 100 grams of a product, you have a picture of the label, but the label only gives you proteins per serving and the serving size in imperial units. The naive way most programmers try is to ask an LLM to give them proteins per 100g, which is obviously going to fail in some cases. The correct way is to ask the LLM for whatever unit it likes, and then do the conversion on the backend.
I guess that's a fine distinction I don't make. If the problem requires the AI to write a prolog program to solve, and it is capable of writing the necessary prolog code, then I don't see the practical or philosophical difference from taking the transitive step and saying the AI solved it. If I asked you to solve an air traffic control problem and you did so by writing prolog, no one would try to claim you weren't capable of solving it.
Agentic LLMs can solve complicated reasoning and scheduling problems, by writing special-purpose solutions (which might resemble the things we call GOFAI). It's the nature of AGI--which LLMs assuredly are--that they can solve problems by inventing specialized tools, just as we do.
Can you show us a log from when you gave an LLM a scheduling problem or something and it decided to solve it with Prolog or Z3 or something?
On mobile so I’m not sure how to export a chat log, but the following prompts worked with ChatGPT:
1: I need to schedule scientific operations for a space probe, given a lot of hard instrument and schedule constraints. Please write a program to do this. Use the best tool for the job, no matter how obscure.
2: This is a high-value NASA space mission and so we only get one shot at it. We need to make absolutely sure that the solution is correct and optimal, ideally with proofs.
3: Please code me up a full example, making up appropriate input data for the purpose of illustration
I got an implementation that at first glance looks correct using the MiniZinc constraint solver. I’m sure people could quibble, but I was not trying to lead the model in any way. The second prompt was because the first generated a simple python program, and I think it was because I didn’t specify that it was a high value project that needed mission assurance at the start. A better initial prompt would’ve gotten the desired result on the first try.
"Tried to address" is not the same as "can do well."
I was responding to PP, but some other (maybe obvious?) examples are logical reasoning and explainability.
As PP suggests, some of the classical symbolic ideas may be applicable or complementary to current approaches.
See also (Lenat's final paper):
D. Lenat, G. Marcus, "Getting from Generative AI to Trustworthy AI: What LLMs might learn from Cyc", https://arxiv.org/abs/2308.04445
SAT solving, verification and model checking, automated theorem proving, planning and scheculing, knowledge representation and reasoning. Those are fields of AI research where LLMs have nothing to offer.
I can ask Claude 3.7 to write me a program that does SAT solving, theorem proving, or scheduling, and it generally gets it right on the first try.
You could ask me, and I'll copy'n'paste for you Z3 solver, for example, stripping copyrights & rearranging code. Without any understanding how this thing work.
It will be impressive, if Claude was trained on scientific literature about SAT solvers and tutorials about programming language in question, without access to any real SAT solver code. But it is not the case.
Why do you need LLM-generated code when you can take original, which was consumed by LLM?
Or, I could ask another question: Could Claude give you SAT solver which will be 1% more effective than state-of-art in the area? We don't need another mediocre SAT solver.
I have asked Claude to solve scientific problems which I absolutely know are not in its training data, and it has done so successfully. I am not sure why people think it is only regurgitating training data. Don't be lazy and learn how it works--LLMs do generate generalized models, and employ them to solve previously unseen problems.
Demonstrate.
It would take you all of 5 seconds to try in Claude yourself. I do this work on a daily basis; I know its value.
Do you mean you create SAT solvers with Claude on a daily basis? What is the use case for that?
I ask Claude to solve problems of similar complexity on a daily basis. A SAT solver specifically is maybe a once a week thing.
Use cases are anything, really. Determine resource allocation for a large project, or do Monte Carlo simulation of various financial and risk models. Looking at a problem that has a bunch of solutions with various trade-offs, pick the best strategy given various input constraints.
There are specialized tools out there that you can pay an arm and a leg for a license to do this, or you can have Claude one-off a project that gets the same result for $0.50 of AI credits. We live in an age of unprecedented intelligence abundance, and people are not used to this. I can have Claude implement something that would take a team of engineers months or years to do, and use it once then throw it away.
I say Claude specifically because in my experience none of the other models are really able to handle tasks like this.
Edit: an example prompt I put here: https://news.ycombinator.com/item?id=43639320
Talk is cheap. The bottom line is that I don't see any SAT solvers that you generated with Claude.