I like the idea of trying multiple solutions.
Does it decide, based on the data, whether it should build its own ML model or fine-tune a relevant one?
Also, does it detect issues with the training data? When I was building NLP models before LLMs, the tasks that took all my time were data cleaning, not training or choosing the right approach.
Currently it decides whether to build its own model or fine-tune a relevant one based primarily on the problem description. The agent's ability to analyse the data when making decisions is pretty limited right now, and it's something we're actively working on (e.g. letting the agent look at the data whenever that's relevant).
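To make that concrete, the current routing step is roughly equivalent to the sketch below (the names and heuristics are hypothetical, not our actual code): the choice is made from the task description alone, before any data has been inspected.

```python
# Hypothetical sketch of description-only routing: decide how to model
# a task without ever looking at the data. Illustrative only.
from dataclasses import dataclass


@dataclass
class Plan:
    strategy: str   # "fine_tune" or "train_from_scratch"
    rationale: str


def choose_strategy(problem_description: str) -> Plan:
    """Pick a modelling strategy from the task description alone."""
    text = problem_description.lower()
    # Domains with strong pretrained baselines are fine-tuning candidates.
    pretrained_domains = ("text", "image", "speech", "sentiment", "translation")
    if any(domain in text for domain in pretrained_domains):
        return Plan("fine_tune", "task matches a domain with strong pretrained models")
    return Plan("train_from_scratch", "no obvious pretrained model to start from")


print(choose_strategy("classify the sentiment of customer reviews"))
# Plan(strategy='fine_tune', rationale='task matches a domain with strong pretrained models')
```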
I guess that kind of answers your second question, too: it does not currently detect issues with the training data. But it will after the next few pull requests we have lined up!
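For a sense of what that could cover, here's a rough sketch of the kinds of checks a data-review step might run before any training happens; the helper name, columns, and thresholds are all made up for illustration.

```python
# Hypothetical sketch of pre-training data checks (illustrative only).
import pandas as pd


def audit_training_data(df: pd.DataFrame, label_col: str) -> list[str]:
    """Flag common data-quality issues before any training happens."""
    issues = []
    # Missing values per column.
    for col, frac in df.isna().mean().items():
        if frac > 0.05:
            issues.append(f"{col}: {frac:.0%} missing values")
    # Exact duplicate rows often indicate leakage or a bad join.
    dup_frac = df.duplicated().mean()
    if dup_frac > 0.01:
        issues.append(f"{dup_frac:.0%} duplicate rows")
    # Severe label imbalance.
    counts = df[label_col].value_counts(normalize=True)
    if counts.min() < 0.05:
        issues.append(f"label imbalance: rarest class is {counts.min():.0%}")
    return issues


df = pd.DataFrame({"text": ["good", "bad", "bad", None], "label": [1, 0, 0, 0]})
print(audit_training_data(df, "label"))
# ['text: 25% missing values', '25% duplicate rows']
```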
And yes, completely agree about data cleaning vs. model building. We started with model building as that's the "easier" problem, but our aim is to add more agents to the system that also handle reviewing the data, reasoning about it, creating feature engineering jobs, etc.