Some established businesses will need to review their contracts, regulations, and risk tolerance.
And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.
I'm not going to look up the comment, but a few months back I called this out and said if you seriously want to use any LLM in a privacy sensitive context you need to self host.
For example, if there are business consequences for leaking customer data, you better run that LLM yourself.
My standard reply to such comments over the past year has been the same: you probably want to use Azure instead. A big part of the business value they provide is ensuring regulatory compliance.
There are multinational corporations with heavy presence in Europe, that run their whole business on Microsoft cloud, including keeping and processing there privacy-sensitive data, business-critical data and medical data, and yes, that includes using some of this data with LLMs - hosted on Azure. Companies of this size cannot ignore regulatory compliance and hope no one notices. This only works because MS figured out how to keep it compliant.
Point being, if there are business consequences, you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this. The only question is, whether you can afford it.
I don't think Azure is the legal panacea you think it is for regulated industries outside of the U.S.
Microsoft v. United States (https://en.wikipedia.org/wiki/Microsoft_Corp._v._United_Stat...) showed the government wants, and was willing to do whatever required, access to data held in the E.U. The passing of the CLOUD Act (https://en.wikipedia.org/wiki/CLOUD_Act) basically codified it in to law.
It might not be ultimately, but it still seems to be seen as such, as best I can tell, based on recent corporate experience and some early but very fresh research and conversations with legal/compliance on the topic of cloud and AI processing of medical data in Europe. Azure seems to be seen as a safe bet.
No, Azure is not gonna save you. The problem is that the US is a country in legal disarray, and they also pretend that their laws should be applied everywhere in the world. I feel that any US company can become a liability anywhere in the world. The Chinese are now feeling this better than anyone else, but the Europeans will also reach the same conclusion.
The US forces their laws everywhere and it needs to end. Everywhere we go, the fintech industry is really fed up with the US AML rules which are just blackmail: if your bank does not comply, America will mess you up financially. Maybe a lot more should just pull out and make people realise others can play this game. But that needs a USD collapse, otherwise it cannot work and I don't see that happening soon.
AML and KYC are good things for almost everyone except criminals and the people who have to implement them.
Agree, and for the people who implement them -- yes, it's hard, it's annoying but presumably a well-paid job. And for the (somewhat established or well-financed) companies it's also a bit of a welcome moat I guess.
Most regulation has the unfortunate side effect of protecting incumbents. I'm pretty sure the solution to this is not removing the regulations!
Regulatory compliance means nothing when the US regulations means they must give access to everything to intelligence services.
The European Court of Justice ruled at least twice that it doesn't matter what kind of contract they give you, and what kind of bilateral agreement there are between the US and the EU, as long as the US have the patriot act and later regulations, using Microsoft means it's violating European privacy laws.
How does that make sense if most EU corporations are using MS/Azure cloud/office/sharepoint solutions for everything? Are they just all in violation or what?
> Are they just all in violation or what?
Yes, and that's why the European Commission keeps being pushed back by the Court of Justice of the EU (the Safe Harbor was ruled out, Privacy Shield as well, and it's likely a matter of time before the CJEU kills the Data Privacy Framework as well), but when it takes 3-4 years to get a ruling and then the Commission can just make a new (illegal) framework that will last for a couple years, the violation can carry on indefinitely.
LoL, every boardroom in Europe is filled with talk of moving out of Microsoft. Not just Azure, Microsoft.
Of course, it could be just all talk, like all general European globalist talks, and Europe will do a 360 once a more friendly party takes over the US.
You probably mean a 180 (or could call it a "365" to make a different kind of joke).
It's a joke. The previous German Foreign Minister Baerbock has used 360° when she meant 180°, which became sort of a meme.
It's been a meme for longer than that. The joke to bait people 20 years ago was "Why do they call it an Xbox 360? Because when you see it you turn 360 degrees and walk away"
The problem is that the EU regulatory environment makes it impossible to build a homegrown competitor. So it will always be talk.
It seems that one side of the EU wants to ensure there is no competitors to US big tech and the other wants to work towards independence from US big tech. Both seem to use the privacy cudgel, require so much regulation that only US tech can hope to comply so nobody else competes with them, alternatively make it so nobody can comply, we just use fax machines again instead of the cloud?
Just hyperbole, but it seems the regulations are designed with the big cloud providers in mind, but then why don't they just ban US big tech and roll out the regulations more slowly? This neoliberalism makes everything so unnecessarily complicated.
It would be interesting to see the hypothetical "return to fax machines" scenario.
If Solows paradox is true and not the result of bad measurement, then one might expect that it could be workable without sacrificing much productivity. Certainly abandoning the cloud would be possible if the regulatory environment allowed for rapid development of alternative non-cloud solutions, as I really don't think the cloud improved productivity (besides for software developers in certain cases) and is more of a rent seeking mechanism (hot take on hacker news I'm sure, but look at any big corpo IT dept outside the tech industry and I think you will see tons of instances where modern tech like the cloud is causing more problems than it's worth productivity-wise).
Computers in general I am much less sure of and lean towards mismeasurement hypothesis. I suspect any "return to 1950" project would render a company economically less competitive (except in certain high end items) and so the EU would really need to lean on Linux hard and invest massively in domestic hardware (not a small task as the US is finding out) in order to escape the clutches of the US and/or China.
I don't think they have the political will to do it, but I would love it if they tried and proved naysayers wrong.
Europe has seen this song and dance before. We’re not so sure there will ever be a more friendly party.
> you'll be better off using Azure-hosted LLMs than running a local model yourself - they're just better than you or me at this.
This is learned helplessness and it’s only true if you don’t put any effort into building that expertise.
You mean become a lawyer specializing in regulations governing data protection, computing systems in AI, both EU-wide and at national level across all Europe, and with good understanding of relevant international treaties?
You're right, I should get right to it. Plenty of time for it after work, especially if I cut down HN time.
Businesses in Trump's America can pinky-swear that they won't peek at your data to maintain "compliance" all they want. The fact is that this promise is not worth the paper it's (not) printed on, at least currently.
Same for America under a democratic presidency. There is really no difference regarding trust in "promises".
I've been poking around the medical / ehr LLM space and gently asking people how they're preserving privacy and everyone appears to be just shipping data to cloud providers based solely on a BAA. Kinda baffling to me, my first step would be to set up local models even if they're not as good, data breaches are expensive.
Same, and I've just sent an email up the chain to our exec saying 'hey remember those trials we're running and the promises the vendors have made? Here is why they basically can't be held to that anymore. This is a risk we highlighted at the start'
Even Ollama + a 2K gaming computer (Nvidia) gets you most of the way there.
Technically you could probably just run it on EC2, but then you’d still need HIPPA compliance
And ironically because OpenAI is actually ClosedAI, the best self-hostable model available currently is a Chinese model.
Mistral AI is French, and it's pretty good.
I use Mistral often. But Deepseek is still a much better model than Mistral's best open source model.
Perhaps except for coding? I find Mistral's codestral running on Ollama to be very good, and more practical for coding that running a distilled Deepseek R1 model.
Oh definitely, Mistral Code beats Deepseek for coding tasks. But for thinking tasks, Deepseek R1 is much better than all the self-hostable Mistral models. I don't bother with distilled - it's mostly useless, ChatGPT 3.5 level, if not worse.
*best with the exception of topics like tiananmen square
As far as I remember the model itself is not censored it’s just on their chat interface. My experience was that it wrote about it but then just before finishing deleted what it wrote
It is somewhat censored, but when you're running models locally and you're in full control of the generation, it's trivial to work around this kind of stuff (just start the response with whatever tokens you want and let it complete; "Yes sir! Right away, sir!" works quite nicely).
Can confirm the model itself has no trouble talking about contentious issues in China.
I haven't tried the full model, but I did try one of the distilled ones on my laptop, and it refused to talk about tiananmen square or other topics the CCP didn't want it to discuss.
What percentage of your LLM use is talking about Tiananmen Square?
Well, for that one, it was a pretty high percentage. I asked it three or four questions like that and then decided I didn't trust it and deleted the model.
Yeah its an awkward position, as self-hosting is going to be insanely expensive unless you have a substantial userbase to amortize the costs over. At least for a model comparable to GPT-4o or deepseek.
But at least if you use an API in the same region as your customers, court order shenanigans won't get you caught between different jurisdictions.
Ideally smaller models will get better.
For most tasks I don't need the best model in existence, I just need good enough. A small law firm using LLMs for summaries can probably do it on prem and hire a smart college student to setup a PC to do it.
The problem is that's still more difficult ( let's say our hypothetical junior IT only makes 60k a year) than just sending all your private business information to some 3rd party API. You can then act shocked and concerned when your 3rd party leaks the data.
In the European privacy framework, and legal framework at large, you can't terms of service away requirements set by the law. If the law requires you to keep the logs, there is nothing you can get the user to sign off on to get you out of it.
OpenAI keeping the logs is the "you have no privacy" part. Anyone who inspects those logs can see what the users were doing. But now everyone knows they're keeping logs and they can't lie their way out of it. So, for your own legal safety, put it in your TOS. Then every user should know they can't use your service if they want privacy.
> Some established businesses will need to review their contracts, regulations, and risk tolerance.
I've reviewed a lot of SaaS contracts over the years.
Nearly all of them have clauses that allow the vendor to do whatever they have to if ordered to by the government. That doesn't make it okay, but it means OpenAI customers probably don't have a legal argument, only a philosophical argument.
Same goes for privacy policies. Nearly every privacy policy has a carve out for things they're ordered to do by the government.
Yeah. You basically need cyberpunk style corporate extraterritoriality to get that particular benefit, of being able to tell governments to go screw themselves.
Just to be pedantic, could the company encrypt the logs with a third-party key in escrow, s.t they would not be able to access that data, but the third party could provide access e.g. for a court.
The problem ultimately isn't a technical one but a political one.
Point 1: Every company has profit incentive to sell the data in the current political climate, all they need is a sneaky way to access it without getting caught. That includes the combo of LLM provider and Escrow non-entity.
Point 2: No company has profit incentive to defend user privacy, or even the privacy of other businesses. So who could run the Escrow service? Another business? Then they have incentive to cheat and help the LLM provider access the data anyway. The government (and which one)? Their intelligence arms want the data just as much as any company does so you're back to square one again.
"Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.
But OpenAi/etc has the logs in the first place, so they can retain them if they wanted anyway. I thought the idea here is b/c they are now required to keep logs its always the case that they will retain them, hence this needs to be made clear i.e. "you will have no privacy"
But, since, I think, there are mechanisms by which they could keep logs, but in a way they cannot access them, they could still claim you will have privacy this way - even though they have the option to keep un-encrypted log, much like they could retain the logs in the first place. So the messaging may remain pretty much the same - from "we promise to delete your logs and keep no other copies, trust us" to "we promise to 3p-encrypt your archived logs and keep no other copies, trust us".
> No company has profit incentive to defend user privacy, or even the privacy of other businesses.
> They have incentive to cheat and help the LLM provider access the data anyway
Why would a company whose role is that of a 3p escrow be incentivised to risk their reputation by doing this? If that's the case every company holding PII has the same problem.
> Their intelligence arms want the data
In the EU at least, GDPR or similar. If you explicit law breaking, that's a more general problem. But what company has a "intelligence arms" in this manner? Are you talking about another big-tech corp?
I'd say this type of cheating it's be a risky proposition from the POV from that 3pe - it'd destroy their business, and they'd be penalised heavily b/c sharing keys is pretty explicitly illegal - any company caught could maybe reduce their own punishment by providing the keys as evidence of the 3pe crime. A viable 3pe business would also need multiple client companies to be viable, so you'd need all of them to play ball - a single whistle-blower in any of them will get you caught, and again, all they need is a single key to prove your guilt.
> "Knowledge is power" combined with "Knowledge can be copied without anyone knowing" means that there aren't any currencies presently powerful enough to convince any other entity to keep your secrets for you.
On that same basis, large banks could cheat the stock market; but there is regulation in place to address that somewhat.
Maybe 3p-escrows should be regulated more, or required to register as a currently-regulated type. That said, if you want to protect data from the government, prism etc, you're SOOL, no one can stop them cheating. let's focus on big-/tech/-startup cheats.
Me> The government (and which one)? Their intelligence arms want the data just as much as any company does[..]
You> But what company has a "intelligence arms" in this manner? Are you talking about another big-tech corp?
"Their" in this circumstance refers to any government that might try to back Escrow.
Sorry, b/c the question mark is outside the parens I read that as the end of the sentence.
Then I refer to my comment on prism: "if you want to protect data from the government, prism etc, you're SOOL, no one can stop them cheating. let's focus on big-/tech/-startup cheats."
Though you talk about "backing" escrow, I mean regulating. The government otherwise controls all business and society. How is it any different to the banks, sec companies etc in that respect.
> And wrapper-around-ChatGPT startups should double-check their privacy policies, that all the "you have no privacy" language is in place.
If a court orders you to preserve user data, could you be held liable for preserving user data? Regardless of your privacy policy.
I don't think the suit would be against you preserving it, it would be against you falsely representing that you aren't preserving it.
A court ordering you to stop selling pigeons doesn't mean you can keep your store for pigeons open and pocket the money without delivering pigeons.
Almost all privacy policies are going to have a call out for legal rulings. For example, here is the Hackernews Legal section in the privacy policy (https://www.ycombinator.com/legal/)
> Legal Requirements: If required to do so by law or in the good faith belief that such action is necessary to (i) comply with a legal obligation, including to meet national security or law enforcement requirements, (ii) protect and defend our rights or property, (iii) prevent fraud, (iv) act in urgent circumstances to protect the personal safety of users of the Services, or the public, or (v) protect against legal liability.
most people aren't sharing internal company data with hacker news or reddit
Sure, but my point is that most services will have something like this, no matter what data they have.
Not a lawyer, but I don't believe there is anything that any person or company can write on a piece of paper that supersedes the law.
The point is not about superseding the law. The point is that if your company privacy policy says "we will not divulge this data to 3rd parties under any circumstance", and later they are served with a warrant to divulge that data to the government, two things are true:
- They are legally obligated to divulge that data to the government
- Once they do so, they are civilly liable for breach of contract, as they have committed to never divulging this data. This may trigger additional breaches of contract, as others may have not had the right to share data with a company that can share it with third parties
Yes. If your agreement with the end user says that you won't collect and store data, you're responsible for it. If you can't provide it (even if due to a court order), you have to adjust your contract.
Your users aren't obligated to know that you're using open ai or other provider.
> If a court orders you to preserve user data, could you be held liable for preserving user data?
No, because you turn up to court and show the court order.
It's possible a subsequent case could get the first order overturned, but you can't be held liable for good faith efforts to comply with court orders.
However, if you're operating internationally, then suddenly it's possible that you may be issued competing court orders both of which are "valid". This is the CLOUD Act problem. In which case the only winning move becomes not to play.
I'm pretty sure even in the USA, you could still be held liable for breach of contract, if you made representations to your customers that you wouldn't share data under any circumstance. The fact that you made a promise you obviously couldn't keep doesn't absolve you from liability for that promise.
Can you find an example of that happening? For any "we promised not to do X but were ordered by a court to do it" event.
No. It’s a legal court order.
This, however, is horrible for AI regardless of whether or not you can sue.
In the US you absolutely can challenge everything up and including the constitutionality of court orders. You may be swiftly dismissed if nobody thinks you have a valid case, but you can try.