Pretty much the same here. I work on some fairly specific document retrieval and labeling problems. After some initial excitement, I’ve landed on using LLMs to help train smaller, more focused models for specific tasks.
Translation is a task I’ve had good results with, particularly with Mistral models. Which makes sense, as it’s basically just “repeat this series of tokens with modifications”.
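The workflow above can be sketched as pseudo-labeling: the large model's outputs become silver-standard training pairs for a small, task-specific model. This is a minimal illustration with a stubbed-out teacher; `teacher_translate` is a hypothetical stand-in for a real LLM call, not anyone's actual pipeline.

```python
# Sketch of using a large model's outputs to train a smaller,
# task-specific model (pseudo-labeling / distillation at the data level).
def teacher_translate(sentence):
    # Stub standing in for an LLM API call. A real pipeline would
    # query a model such as Mistral here.
    lookup = {"bonjour": "hello", "merci": "thank you"}
    return " ".join(lookup.get(word, word) for word in sentence.split())

def build_training_pairs(corpus):
    # The teacher's outputs become (source, target) labels that a
    # small seq2seq model is then fine-tuned on (fine-tuning not shown).
    return [(src, teacher_translate(src)) for src in corpus]

pairs = build_training_pairs(["bonjour", "merci"])
```

The small model never needs the teacher at inference time, which is the whole point: you pay the LLM cost once, during data generation.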
The closed models are practically useless from an empirical standpoint: you have no idea whether the model you use on Monday is the same as the one on Tuesday. “Open” models at least avoid this issue.
Likewise, I’ve found LLM code to be of poor quality. I suspect that’s because I’m a very experienced and skilled programmer. What the LLMs produce is, at best, at the level of the top answer on Stack Overflow. And the top answers on Stack Overflow are typically not optimal solutions; they are solutions upvoted by novices.
I find LLM code is not only bad, but when I point this out the LLM “apologizes” and gives better code. My worry is that inexperienced people can’t even spot the problems in the first place, so they never get that better answer.
In fact, try this: ask an LLM to generate some code, then reply with “isn’t there a simpler, more maintainable, and straightforward way to do this?”
There have even been times when an LLM will spit out _the exact same code_, and you have to hand it the answer, or at least a hint, before it does better.
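A hypothetical example of the pattern being described (not from any specific LLM transcript): a verbose first attempt, and the simpler rewrite you often only get after pushing back with "isn't there a simpler way?".

```python
from collections import Counter

# Typical verbose "first attempt" style: manual dict bookkeeping.
def word_counts_verbose(text):
    counts = {}
    for word in text.lower().split():
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

# The simpler, more maintainable version that often only appears
# after you explicitly ask for one.
def word_counts_simple(text):
    return Counter(text.lower().split())
```

Both produce the same counts; the second is shorter, idiomatic, and harder to get wrong, which is exactly the kind of improvement the nudge tends to surface.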
Yeah. I had the same experience doing code reviews at work. Sometimes people just get stuck on a problem and can't think of alternative approaches until you give them a good hint.
> I’ve found LLM code to be of poor quality
Yes. That was my experience with most human-produced code I ran into professionally, too.
> In fact try this - ask an LLM to generate some code then reply with “isn’t there a simpler, more maintainable, and straightforward way to do this?”
Yes, that sometimes works with humans as well. Although you usually need to provide more specific feedback to nudge them onto the right track. It gets tiring after a while, doesn't it?
What is the point of your argument?
I keep seeing people say “yeah well I’ve seen humans that can’t do that either.”
What’s the point you’re trying to make?
The point is that the person I responded to criticized LLMs for making the exact sort of mistakes that professional programmers make all the time:
> I’ve found LLM code to be of poor quality. I think that has to do with being a very experienced and skilled programmer. What the LLM produce is at best the top answer in stack overflow-level skill. The top answers on stack overflow are typically not optimal solutions
Most professional developers are unable to produce code up to the standard of "the top answer in stack overflow" that the commenter was complaining about. And there's an additional twist: most developers' breadth of knowledge is limited to a fairly narrow range of APIs/platforms/etc., whereas these LLMs are comparable to decent programmers across just about any API/language/platform, all at once.
I've written code for thirty years and I wish I had the breadth and depth of knowledge of the free version of ChatGPT, even if I can outsmart it in narrow domains. It is already very decent and I haven't even tried more advanced models like o1-preview.
Is it perfect? No. But it is arguably better than most programmers in at least some respects. Not every programmer out there is Fabrice Bellard.
But LLMs aren’t people. And people do more than just generate code.
The comparison is weird and dehumanizing.
I, personally, have never worked with someone who consistently puts out code that is as bad as LLM generated code either.
> Most professional developers are unable to produce code up to the standard of "the top answer in stack overflow"
How could you possibly know that?
All these types of arguments come from a belief that your fellow human is effectively useless.
It’s sad and weird.
>> > Most professional developers are unable to produce code up to the standard of "the top answer in stack overflow"
> How could you possibly know that?
I worked at four multinationals and saw a bunch of their code. Most of it wasn't "the top answer in stack overflow". Was some of the code written by some of the people better than that? Sure. And a lot of it wasn't, in my opinion.
> All these types of arguments come from a belief that your fellow human is effectively useless.
Not at all. I think the top answers in stack overflow were written by humans, after all.
> It’s sad and weird.
You are entitled to your own opinion, no doubt about it.
> In fact try this - ask an LLM to generate some code then reply with “isn’t there a simpler, more maintainable, and straightforward way to do this?”
These are called "code reviews" and we do that amongst human coders too, although they tend to be less Socratic in nature.
I think it has been clear from day one that LLMs don't display superhuman capabilities, and a human expert will always outdo one in tasks related to their particular field. But the breadth of their knowledge is unparalleled. They're the ultimate jacks-of-all-trades, and the astonishing thing is that they're even "average Joe" good at a vast number of tasks, never mind "fresh college graduate" good.
The real question has been: what happens when you scale them up? As of now it appears that they scale decidedly sublinearly, but it was not clear at all two or three years ago, and it was definitely worth a try.