...yet
I'm not sure how it'll ever make sense unless you need a lot of customizations or care a lot about data leaks.
For small teams and everyone else, it'll probably be cost-neutral at best to keep paying OpenAI, Google, etc. directly rather than paying some cloud provider to host an at-best on-par model at equivalent prices.
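A quick back-of-envelope sketch of that cost-neutrality argument. Every number below is a purely hypothetical assumption for illustration, not a real quote from any provider:

```python
# Hypothetical break-even: at what monthly token volume does a dedicated
# GPU server cost the same as paying a hosted API per token?
# All prices are illustrative assumptions, not real provider pricing.

def breakeven_tokens_per_month(gpu_cost_per_month: float,
                               api_price_per_mtok: float) -> float:
    """Monthly token volume at which self-hosting matches API spend."""
    return gpu_cost_per_month / api_price_per_mtok * 1_000_000

# Assumed: $1,500/month for a GPU server, $10 per million API tokens.
tokens = breakeven_tokens_per_month(1500.0, 10.0)
print(f"{tokens:,.0f} tokens/month")  # 150,000,000 tokens/month
```

Below that volume the API is cheaper; above it, self-hosting starts to pay off, ignoring ops time and the quality gap between hosted and self-hostable models.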
> unless you need a lot of customizations or care a lot about data leaks
And both of those needs are very normal. "Customization" in this case can just mean fine-tuning the LLM on local material for domain-specific responses.
I've tried self hosting. It is quite difficult: either you are limited to small models, or you need a very expensive setup. I couldn't run this model on my gaming computer.
If I try other models, I basically end up with a very bad version of AI. Even as someone who uses Anthropic APIs a lot, it's absolutely not worth it to try to self host. The APIs are much better and much cheaper.
Self hosting for AI might be useful for 0.001% of people honestly.