JKCalhoun 2 days ago

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to

That is probably the solution right there.

3
blagie 2 days ago

This data cannot be anonymized. That is trivially provable mathematically, and given the type of data, it should also be intuitively obvious to even the most casual observer.

If you're talking to ChatGPT about being hunted by a Mexican cartel, and having escaped to your Uncle's vacation home in Maine -- which is the sort of thing a tiny (but non-zero) minority of people ask LLMs about -- that's 100% identifying.

And if the Mexican cartel finds out, e.g. because NY Times had a digital compromise at their law firm, that means someone is dead.

Legally, I think NY Times is 100% right in this lawsuit holistically, but this is a move which may -- quite literally -- kill people.

zarzavat 2 days ago

It's like anonymizing your diary by erasing your name on the cover.

JKCalhoun 2 days ago

I don't dispute your example, but I suspect there is a non-zero number of cases that would not be so extreme, so obviously identifiable.

So, sure, no panacea, but .. why not for the cases where it would be a barrier?

blagie 22 hours ago

Because such cases don't really exist.

Your text used an unusual double ellipsis (" .. " instead of "... "), uncommon (though not rare) vocabulary ("panacea"), etc. Statistics on those allow for pretty good re-identification.

Ditto for times you do things and work schedule.

Etc.

It's not "obviously identifiable," but then a buffer overflow is not "obviously exploitable" either. Rather, it takes a very, very expert individual to write a script, after which everyone can exploit it.

Ditto here.
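To make the re-identification point concrete, here is a toy stylometric sketch using character trigrams. This is illustrative only: a real attack would use far richer features (function words, punctuation habits, timing), and the names and corpora below are invented.

```python
from collections import Counter

def fingerprint(text, n=3):
    # Character trigram frequencies -- a crude but standard stylometric feature.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def similarity(fp_a, fp_b):
    # Cosine similarity between two sparse trigram-count vectors.
    dot = sum(count * fp_b.get(gram, 0) for gram, count in fp_a.items())
    norm_a = sum(c * c for c in fp_a.values()) ** 0.5
    norm_b = sum(c * c for c in fp_b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def reidentify(anonymous_text, known_authors):
    # Rank known authors' corpora by stylistic similarity to the
    # "anonymized" text; the top match is the re-identification guess.
    fp = fingerprint(anonymous_text)
    return max(known_authors,
               key=lambda a: similarity(fp, fingerprint(known_authors[a])))
```

The point is that quirks like " .. " and word choice survive pseudonymization, so swapping out the username does little against an attacker who holds other writing samples.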

genewitch 2 days ago

AOL found out -- and thus we all found out -- that you can't anonymize certain things, web searches in that case. I used to have bookmarked some literature from maybe ten years ago which argued (proved with math?) that any moderate collection of data from or by individuals that fits certain criteria is de-anonymizable, if not by itself, then with minimal extra data. I want to say it even covered the case where, for instance, instead of changing all occurrences of genewitch to user9843711, every instance of genewitch got a different, unique id.

I apologize for not having cites or a better memory at this time.

genewitch 1 day ago

> The root of this problem is the core problem with k-anonymity: there is no way to mathematically, unambiguously determine whether an attribute is an identifier, a quasi-identifier, or a non-identifying sensitive value. In fact, all values are potentially identifying, depending on their prevalence in the population and on auxiliary data that the attacker may have. Other privacy mechanisms such as differential privacy do not share this problem.

see also: https://en.wikipedia.org/wiki/Differential_privacy which is alleged to solve this; that is, the wiki says the only known attacks are side-channel attacks, such as errors in the implementation of the algorithm.
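For the curious, the standard building block behind differential privacy is the Laplace mechanism: add noise scaled to the query's sensitivity. A minimal stdlib-only Python sketch (function names are mine, not from any particular library):

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) draw using only stdlib.
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(max(1e-300, 1 - 2 * abs(u)))

def dp_count(records, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one person's
    # record changes the true count by at most 1, so Laplace(1/epsilon)
    # noise gives epsilon-differential privacy for this single query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

The guarantee is about the mechanism, not the data: no single record measurably changes the output distribution, which is why it sidesteps the "everything is a quasi-identifier" problem that k-anonymity has.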

catlifeonmars 1 day ago

If you squint a little, this problem is closely related to oblivious transfer as well

paxys 2 days ago

> She suggested that OpenAI could have taken steps to anonymize the chat logs but chose not to, only making an argument for why it "would not" be able to segregate data, rather than explaining why it "can’t."

Sounds like bullshit lawyer speak. What exactly is the difference between the two?

dijksterhuis 2 days ago

Not wanting to do something isn't the same thing as being unable to do something.

!define would

> Used to express desire or intent -- https://www.wordnik.com/words/would

!define cannot

> Can not ( = am/is/are unable to) -- https://www.wordnik.com/words/cannot

paxys 2 days ago

Who said anything about not wanting to?

"I will not be able to do this"

"I cannot do this"

There is no semantic or legal difference between the two, especially coming from a tech company. Stalling and wordplay are very common legal tactics when a side has no other argument.

dijksterhuis 2 days ago

The article is derived from the order, which is itself a short summary of conversations had in court.

https://cdn.arstechnica.net/wp-content/uploads/2025/06/NYT-v...

> I asked:

> > Is there a way to segregate the data for the users that have expressly asked for their chat logs to be deleted, or is there a way to anonymize in such a way that their privacy concerns are addressed... what’s the legal issue here about why you can’t, as opposed to why you would not?

> OpenAI expressed a reluctance for a "carte blanche, preserve everything request," and raised not only user preferences and requests, but also "numerous privacy laws and regulations throughout the country and the world that also contemplate these type of deletion requests or that users have these types of abilities."

A "reluctance to retain data" is not the same as being "technically or physically unable to retain data". The judge decided that OpenAI's not wanting to do it mattered less than the risk of evidence being deleted.

lanyard-textile 2 days ago

Disagree. There’s something about the “able” that implies a hindered routine ability to do something — you can otherwise do this, but something renders you unable.

“I won’t be able to make the 5:00 dinner.” -> You could normally come, but there’s another obligation. There’s an implication that if the circumstances were different, you might be able to come.

“I cannot make the 5:00 dinner.” -> You could not normally come. There’s a rigid reason for the circumstance, and there is no negotiating it.

jjk166 2 days ago

If someone was in an accident that rendered them unable to walk, would you say they can or can not walk?

lanyard-textile 1 day ago

Yes? :) Being unable to walk is typically non-negotiable.