solaire_oa 2 days ago

IIUC one reason is that prompts and other data sent to 3rd-party LLM hosts may get funneled to 4th-party RLHF platforms, e.g. SageMaker or Mechanical Turk. So a random gig worker could be reading a .env file the intern uploaded.

YetAnotherNick 2 days ago

What do you mean by "chance"? It's clear that if users have not opted out of model training, their data will be used. If they have opted out, it won't be used. And most users are in the first bucket.

Just because training on data is opt-out doesn't mean businesses can't trust it. Not the best for users' privacy, though.