MattJ100 8 days ago

I agree, but only for situations where the probabilistic nature is acceptable. It would be the same if you had a large team of humans doing the same work. Inevitably misclassifications would occur on an ongoing basis.

Compare this to the situation where you have a team develop schemas for your datasets which can be tested and verified, and fixed in the event of errors. You can't really "fix" an LLM or human agent in that way.
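The point about a schema being testable and fixable can be made concrete with a minimal sketch: a hand-written validator is deterministic, so a bad classification is a reproducible bug you can patch and pin with a regression test. The field names and types here are illustrative assumptions, not anyone's real schema.

```python
# Hypothetical schema: required fields and their expected types.
REQUIRED = {"id": int, "timestamp": str, "value": float}

def validate(record: dict) -> list:
    """Return a list of error messages; an empty list means the record conforms."""
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

# Deterministic: the same record always gets the same verdict, so a
# misclassification can be reproduced, fixed, and covered by a test.
assert validate({"id": 1, "timestamp": "2024-01-01", "value": 3.5}) == []
assert validate({"id": "1", "value": 3.5}) == [
    "id: expected int, got str",
    "missing field: timestamp",
]
```

An LLM classifier gives you no equivalent of those asserts: the same input can flip verdicts, so there is nothing to pin.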

So I feel like traditional computing excelled at many tasks that humans couldn't do - computers are crazy fast and don't make mistakes, as a rule. LLMs give up that speed and accuracy, becoming something more like scalable humans (their "intelligence" is debatable, but possibly a moving target - I've yet to see an LLM that I would trust more than a very junior developer). LLMs (and ML generally) will always have higher error margins; it's how they can do what they do.

mnky9800n 8 days ago

Yes, but I see it as multiple steps. Perhaps the LLM solution has some probabilistic issues that only get you 80% of the way there, but that has probably already given you some ideas about how to better solve the problem. And in this case the problem is somewhat intractable because of the size and complexity of the way the data is stored. So in my example the first step is LLMs, but the second step would be to use what they produce as the structure for building a deterministic pipeline. This is because the problem isn't that there are ten thousand different metadata fields, but that the structure of those metadata is diffuse. The LLM pass will first help identify the main points of what needs to be conformed to the monolithic schema; then I will build more production-ready, deterministic pipelines. At least that is the plan. I'll write a Substack post about it eventually if this plan works haha.