Trivial after a substantial hardware investment, plus installation, configuration, testing, benchmarking, tweaking, hardening, and benchmarking again; then new models come out, so it's more tweaking and benchmarking and tweaking again, all while slamming your head against the wall over the mediocre documentation for every hardware and software component you're trying to deploy.
Yup. Trivial.
Even my 4-year-old M1 Pro can run a quantized DeepSeek R1 pretty well. Sure, productizing these models at full scale is hard work (and the average "just-make-shovels" startup is failing hard at this), but we'll 100% get there in the next 1-2 years.
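For the curious, the "run it locally" part really is that small. A minimal sketch using the llama-cpp-python bindings; the GGUF filename is a placeholder for whichever quantized model you actually download (on a 16-32 GB Mac, presumably one of the R1 distills):

    # Minimal local-inference sketch with llama-cpp-python.
    # The model path is hypothetical; point it at your downloaded GGUF file.
    from llama_cpp import Llama

    llm = Llama(
        model_path="deepseek-r1-distill-qwen-7b-q4_k_m.gguf",  # placeholder filename
        n_ctx=4096,        # context window
        n_gpu_layers=-1,   # offload all layers to Metal on Apple Silicon
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a haiku about quantization."}]
    )
    print(out["choices"][0]["message"]["content"])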
Those small models suck. You need the big guns to get those "amazing" coding agents.
Local for emotional therapy. Big guns to generate code. Local to edit generated code once it is degooped and worth something.
I put LM Studio on an old gaming rig with a 3060 Ti; it took about 10 minutes to start using it, and most of that time was downloading a model.
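And once LM Studio's local server is on (it exposes an OpenAI-compatible endpoint, by default at http://localhost:1234/v1), hitting it from code is a few lines. A sketch; the model name is a placeholder, LM Studio serves whatever you loaded:

    # Talking to LM Studio's local server via the standard OpenAI client.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:1234/v1",
        api_key="lm-studio",  # the local server ignores the key, but the client requires one
    )
    resp = client.chat.completions.create(
        model="local-model",  # placeholder name; the loaded model handles the request
        messages=[{"role": "user", "content": "Hello from the 3060 Ti box"}],
    )
    print(resp.choices[0].message.content)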
If you're dealing with ITAR compliance, you should already have experience hosting things on-premises.
Yes. The past two companies I've been at have self-hosted enterprise LLMs running on their own servers and connected to internal documentation. There's also Azure Government and other similar privacy-first ways of doing this.
But also, running LLMs locally is easy. I don't know what goes into hosting them as a service for your org, but just getting an LLM running locally is a straightforward 30-minute task, something like the sketch below.
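Roughly, the 30 minutes is: install Ollama, pull a model (e.g. `ollama pull llama3`), and you have an HTTP API on localhost:11434. A sketch, assuming you've pulled llama3:

    # Querying a locally running Ollama instance over its HTTP API.
    import requests

    r = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",                       # whichever model you pulled
            "prompt": "Summarize our VPN policy.",   # example prompt
            "stream": False,                         # return one JSON blob, not a stream
        },
    )
    print(r.json()["response"])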
I'm for hire; I'll do all that for any company that needs it. Email in profile. Contract or employee, makes no difference to me.
This hasn't been my experience. Pretty easy with AWS Bedrock.
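For what it's worth, the Bedrock path from code is short too. A hedged sketch using boto3's Converse API; the region and model ID are assumptions, use whatever your org has enabled:

    # Calling a Bedrock-hosted model via boto3's Converse API.
    import boto3

    client = boto3.client("bedrock-runtime", region_name="us-east-1")  # assumed region
    resp = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        messages=[{"role": "user", "content": [{"text": "Hello from Bedrock"}]}],
    )
    print(resp["output"]["message"]["content"][0]["text"])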
Ah yes, "self host" by using a fully Amazon-managed service on Amazon's servers. How would a US court ever access those logs?
Run a vLLM Docker container. Yeah, the assumption is you already know what hardware you need, or you already have it on-prem. Assuming this is ITAR stuff, you must be self-hosting everything.
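The client side of the vLLM route is equally small, since the container serves an OpenAI-compatible API. A sketch, with the model name and port as assumptions:

    # Assumed server start (vLLM's official image, OpenAI-compatible API on port 8000):
    #   docker run --gpus all -p 8000:8000 vllm/vllm-openai:latest \
    #       --model meta-llama/Llama-3.1-8B-Instruct
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # no auth by default
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # must match the --model the server was started with
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp.choices[0].message.content)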