celnardur 2 days ago

There have been a lot of opinion pieces popping up on HN recently that describe the benefits the authors see from LLMs and rebut the drawbacks critics raise. While they do bring up interesting points, NONE of them have even mentioned the privacy aspect.

This is the main reason I can’t use any LLM agents or post any portion of my code into a prompt window at work. We have NDAs and government regulations (like ITAR) we’d be breaking if any code left our servers.

This just proves the point. Until these tools are local, privacy will be an Achilles' heel for LLMs.

garyfirestorm 2 days ago

You can always self host an LLM which is completely controlled on your own server. This is trivial to do.
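For a single user, a minimal sketch of what that can look like, assuming you've already downloaded a GGUF checkpoint and installed the llama-cpp-python package (the model path is just a placeholder):

    # Minimal local-inference sketch with llama-cpp-python; the GGUF path
    # is a placeholder for whatever model you've downloaded.
    from llama_cpp import Llama

    llm = Llama(model_path="models/your-model.gguf", n_ctx=4096)

    resp = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain GGUF in one sentence."}]
    )
    print(resp["choices"][0]["message"]["content"])

Nothing in that flow touches an external service, which is the whole point for the NDA/ITAR case above.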

redundantly 2 days ago

Trivial after a substantial hardware investment and installation, configuration, testing, benchmarking, tweaking, hardening, benchmarking again; then new models come out, so more tweaking and benchmarking and tweaking again, all while slamming your head against the wall dealing with the mediocre documentation surrounding every hardware and software component you're trying to deploy.

Yup. Trivial.

dvt 2 days ago

Even my 4-year-old M1 Pro can run a quantized DeepSeek R1 pretty well. Sure, productizing these models at full scale is hard work (and the average "just-make-shovels" startups are failing hard at it), but we'll 100% get there in the next 1-2 years.

whatevaa 2 days ago

Those small models suck. You need the big guns to get those "amazing" coding agents.

bravesoul2 2 days ago

Local for emotional therapy. Big guns to generate code. Local to edit generated code once it is degooped and worth something.

benoau 2 days ago

I put LM Studio on an old gaming rig with a 3060 Ti; it took about 10 minutes to start using it, and most of that time was downloading a model.
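For what it's worth, LM Studio can also expose a local OpenAI-compatible server (http://localhost:1234/v1 by default, though that may vary by version), so a rough sketch of pointing existing tooling at it looks like this:

    # Hedged sketch: point the standard OpenAI client at LM Studio's local server.
    # The base_url/port are LM Studio's defaults as I understand them; adjust to taste.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

    resp = client.chat.completions.create(
        model="local-model",  # whatever model you've loaded in LM Studio
        messages=[{"role": "user", "content": "Hello from my own GPU."}],
    )
    print(resp.choices[0].message.content)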

jjmarr 2 days ago

If you're dealing with ITAR compliance you should have experience with hosting things on-premises.

dlivingston 1 day ago

Yes. The past two companies I've been at have self-hosted enterprise LLMs running on their own servers and connected to internal documentation. There is also Azure Government and other similar privacy-first ways of doing this.

But also, running LLMs locally is easy. I don't know what goes into hosting them as a service for your org, but just getting an LLM running locally is a straightforward 30-minute task.

genewitch 2 days ago

I'm for hire, I'll do all that for any company that needs it. Email in profile. Contract or employee, makes no difference to me.

blastro 2 days ago

This hasn't been my experience. Pretty easy with AWS Bedrock

paxys 2 days ago

Ah yes, "self host" by using a fully Amazon-managed service on Amazon's servers. How would a US court ever access those logs?

garyfirestorm 2 days ago

Run a vLLM Docker container. Yeah, the assumption is that you already know what hardware you need, or you already have it on prem. Assuming this is ITAR stuff, you must be self-hosting everything anyway.
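The usual vLLM container serves an OpenAI-compatible API; if you'd rather skip the container, here's a sketch of the same engine's offline Python API (the model name is just vLLM's tiny quickstart example, not a recommendation):

    # Sketch of vLLM's offline Python API, as an alternative to the container route.
    # facebook/opt-125m is just a tiny example checkpoint; swap in whatever you host.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.2, max_tokens=128)

    outputs = llm.generate(["# A function that checks a file for ITAR markings\n"], params)
    print(outputs[0].outputs[0].text)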

celnardur 2 days ago

Yes, but which of the state-of-the-art models that offer the best results are you allowed to do this with? As far as I've seen, the models you can host locally are not the ones being praised left and right in these articles. My company actually allows people to use a hosted version of Microsoft Copilot, but most people don't, because it's still not much of a productivity boost (if any).

genewitch 2 days ago

DeepSeek isn't good enough? You need a beefy GPU cluster, but I bet it would be fine until the large Llama is better at coding, and I'm certain there will be other large open models. Now if there's some new technology around the corner, someone might be able to build a moat, but in a surprising twist, Facebook did us all a favor by releasing their weights back when; there's no moat possible, in my estimation, with LLMs as they stand today. Not even "multi-model" implementations, which I have at home, too.

Say OpenAI implements something that makes their service 2x better. Just using it for a while should give people who live and breathe this stuff enough information to tease out how to implement something like it, and eventually it'll make it into local-only applications and models.

anonymousDan 2 days ago

Are there any resources on how much it costs to run the full DeepSeek? And how to do it?

genewitch 2 days ago

I can fill in anything missing. I would like to go to bed, but I didn't want to leave anyone hanging; I had to come edit a comment I made from my phone, and my phone also doesn't show me replies (I use Materialistic, is there a better app?).

https://getdeploying.com/guides/run-deepseek-r1 this is the "how to do it"

Posted here: https://news.ycombinator.com/item?id=42897205, a link to how to set it up on an AMD Epyc machine for ~$2000. IIRC a few of the comments discuss how many GPUs you'd need (a lot of the 80GB GPUs, 12-16 I think), plus the mainboards and PSUs and things. However, to just run the largest DeepSeek you merely need enough memory to hold the model and the context, plus ~10%; I forget why +10%, but that's my hedge to be more accurate.
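To make "memory to hold the model plus ~10%" concrete, a rough back-of-envelope sketch; the 671B figure is DeepSeek R1's published parameter count, and the quantization levels and headroom factor are my assumptions:

    # Back-of-envelope RAM sizing for a ~671B-parameter model at a few precisions.
    params = 671e9
    bytes_per_weight = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

    for name, b in bytes_per_weight.items():
        weights_gb = params * b / 1e9
        total_gb = weights_gb * 1.10  # ~10% headroom for context/KV cache, per the hedge above
        print(f"{name}: ~{weights_gb:,.0f} GB weights, ~{total_gb:,.0f} GB with headroom")

Which is why the Epyc route leans on hundreds of gigabytes of plain DDR RAM rather than VRAM.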

Note: I have not checked whether LM Studio can run the large DeepSeek model; I can't fathom a reason it couldn't, at least on the Epyc CPU-only build.

Note too: I just asked in their Discord and it appears "any GGUF model will load if you have the memory for it". GGUF is the file format the model is stored in: someone takes whatever format Mistral or Facebook or whoever publishes and converts it to GGUF, and from there, someone quantizes the model into smaller files (with less ability), also as GGUF.

bogtog 2 days ago

That's $2000 but for just 3.5-4.25 tokens/s? I'm hesitant to say that 4 tokens/s is useless, but that is a tremendous downgrade (although perhaps some smaller model would be usable).

genewitch 1 day ago

Right, but that is CPU-only; there are no tensor cores in a GPU getting lit up for that 4 t/s. So the minimum to actually run DeepSeek is $2000, and the max is basically whatever you can afford, based on your needs. If you're only running single prompts at any given time, you only need the number of GPUs that will fit the model plus the context (as I mentioned); even at that minimum, your outlay is going to be on the order of $130,000 in just GPUs.
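A hedged sanity check on that order of magnitude; the per-card price is purely my assumption, not a quote:

    import math

    # Roughly: 8-bit weights plus ~10% context headroom, split across 80 GB cards.
    model_gb = 671 * 1.10
    gpu_gb = 80                 # A100/H100-class card
    price_per_gpu = 15_000      # rough assumed street price per 80 GB card

    gpus_needed = math.ceil(model_gb / gpu_gb)
    print(f"~{gpus_needed} GPUs, roughly ${gpus_needed * price_per_gpu:,}")
    # ~10 GPUs and ~$150,000 at these assumptions; 12-16 cards and ~$130k land in the same ballpark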

If I can find it later (I couldn't find it last night when I replied), there is an article that explains how to start adding consumer GPUs, or even 1-2 Nvidia A100 80GB GPUs, to the Epyc build to speed that up. I have a vague recollection that can get you up to 20 t/s or thereabouts, but don't quote me on that; it's been a while.

aydyn 2 days ago

It is not at all trivial for an organization that may be doing everything in the cloud to set up the necessary hardware locally and to ensure proper networking and security for the LLM running on that hardware.

woodrowbarlow 1 day ago

> NONE of them have even mentioned the privacy aspect

because the privacy aspect has nothing to do with LLMs and everything to do with relying on cloud providers. HN users have been vocal about that since long before LLMs existed.