sinuhe69 2 days ago

Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

Billions of people use the internet daily. If any organization suspects that some people use the internet for illicit purposes, possibly against its interests, would the court order the ISP to log all activities of all people? Would Google be ordered to save the searches of all its customers because some might use them for bad things? And once we start, where will we stop? Crimes could have happened in the past or could happen in the future; will the court order the ISP and Google to retain the logs for 10 years, 20 years? Why not 100 years? Who should bear the cost of such outrageous demands?

The consequences of such orders are of an enormity the puny judge cannot even begin to comprehend. The right to privacy is an integral part of the freedom of speech, a core human right. If you don't have private thoughts and private information, anybody can be incriminated using this past information. We will cease to exist as individuals, and I argue we will cease to exist as humans as well.

capnrefsmmat 2 days ago

Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already be dismissed.

The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.

dragonwriter 1 day ago

> Courts have always had the power to compel parties to a current case to preserve evidence.

Not just that, even without a specific court order parties to existing or reasonably anticipated litigation have a legal obligation that attaches immediately to preserve evidence. Courts tend to issue orders when a party presents reason to believe another party is out of compliance with that automatic obligation, or when there is a dispute over the extent of the obligation. (In this case, both factors seem to be in play.)

btown 1 day ago

Lopez v. Apple (2024) seems to be a recent and useful example of this; my lay understanding is that Apple was found to have failed in its duty to switch from auto-deletion (even if that auto-deletion was contractually promised to users) to an evidence-preservation level of retention, immediately when litigation was filed.

https://codiscovr.com/news/fumiko-lopez-et-al-v-apple-inc/

https://app.ediscoveryassistant.com/case_law/58071-lopez-v-a...

Perhaps the larger lesson here is: if you don't want your service provider to end up being required to retain your private queries, there's really no way to guarantee it, and the only real mitigation is to choose a service provider who's less likely to be sued!

(Not a lawyer, this is not legal advice.)

golol 1 day ago

So if Amazon sues Google, claiming that it is being disadvantaged in search rankings, a court should be able to force Google to log all search activity, even when users delete it?

cogman10 1 day ago

Yes. That's how the US court system works.

Google can (and would) file to keep that data private and only the relevant parts would be publicly available.

A core aspect of civil lawsuits is that everyone gets to see everyone else's data. It's that way to ensure everything is on the up and up.

lxgr 1 day ago

A great model – in a world without the Internet and LLMs (or honestly just full text search).

SR2Z 17 hours ago

Maybe you misunderstood. The data is required to be retained, but there is no requirement to make it accessible to the opposition. OpenAI already has this data and presumably mines it themselves.

Courts generally require far more data to be retained than shared, even if this ask is much more lopsided.

dragonwriter 1 day ago

If Amazon sues Google, a legal obligation to preserve all evidence reasonably related to the subject of the suit attaches immediately when Google becomes aware of the suit, and, yes, if there is a dispute about the extent of that obligation and/or Google's actual or planned compliance with it, the court can issue an order relating to it.

monetus 1 day ago

At Google's scale, what would the hosting costs of this be, I wonder? Very expensive after a certain point, I would guess.

nobody9999 1 day ago

>At Google's scale, what would the hosting costs of this be, I wonder? Very expensive after a certain point, I would guess.

Which would be chump change[0] compared to the costs of an actual trial with multiple lawyers/law firms, expert witnesses and the infrastructure to support the legal team before, during and after trial.
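To put extremely rough numbers on that (every figure below is an assumption for illustration, not Google's actual volume or pricing), a back-of-envelope sketch:

    # Back-of-envelope storage cost; all inputs are assumptions, not real figures.
    QUERIES_PER_DAY = 8_500_000_000   # assumed daily search volume
    BYTES_PER_LOG = 2_000             # assumed size of one retained log entry
    COST_PER_GB_MONTH = 0.02          # assumed cold-storage price, USD/GB/month

    gb_per_year = QUERIES_PER_DAY * 365 * BYTES_PER_LOG / 1e9
    cost_per_year = gb_per_year * COST_PER_GB_MONTH * 12
    print(f"~{gb_per_year / 1e6:.1f} PB/year, ~${cost_per_year / 1e6:.1f}M/year")

Even with generous inputs, raw storage lands in the low millions of dollars a year, which is in the same ballpark as (or below) the legal spend on a case like this.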

[0] https://grammarist.com/idiom/chump-change/

saddist0 1 day ago

It can be just anonymised search history in this case.

dragonwriter 1 day ago

> It can be just anonymised search history in this case.

Depending on the exact issues in the case, a court might allow that (more likely, it would allow turning over only anonymized data in discovery, if the issues were such that there was no clear need for more), but generally the obligation to preserve evidence does not include the right to edit evidence or replace it with reduced-information substitutes.

Macha 1 day ago

We found that one was a bad idea back in 2006, when AOL thought "what could the harm be?" about turning over anonymised search queries to researchers.

dogleash 1 day ago

How did you go from a court order to preserve evidence and jump to dumping that data raw into the public record?

Courts have been dealing with discovery including secrets that litigants never want to go public for longer than AOL has existed.

mattnewton 1 day ago

That sounds impossible to do well enough without being accused of tampering with evidence.

Just erasing the userid isn’t enough to actually anonymize the data, and if you scrubbed location data and entities out of the logs you might have violated the court order.

Though it might be in our best interests as a society, we should probably be honest about the risks of this tradeoff; anonymization isn't some magic wand.
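As a minimal sketch of why (the records and fields below are entirely invented), the classic linkage attack joins "anonymized" rows against an outside dataset on a few quasi-identifiers, along the lines of the well-known ZIP + birth date + sex result:

    # Hypothetical data: user IDs removed, but quasi-identifiers remain.
    logs = [
        {"zip": "10027", "dob": "1987-03-14", "sex": "F",
         "query": "divorce lawyer open late"},
    ]
    # Any outside dataset sharing those fields (e.g. a voter roll) will do.
    voter_roll = [
        {"name": "J. Doe", "zip": "10027", "dob": "1987-03-14", "sex": "F"},
    ]

    keys = ("zip", "dob", "sex")
    for row in logs:
        matches = [p for p in voter_roll
                   if all(p[k] == row[k] for k in keys)]
        if len(matches) == 1:  # a unique match re-identifies the user
            print(matches[0]["name"], "->", row["query"])

So a court-ordered "anonymized" retention scheme would either still leak like this, or scrub so aggressively that it invites the tampering accusations described above.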

lcnPylGDnU4H9OF 2 days ago

So then the courts need to find who is setting their chats to be deleted and order them to stop. Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs. OpenAI is doing the responsible thing here.

capnrefsmmat 2 days ago

OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find who it is -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.

happyopossum 1 day ago

So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.

dragonwriter 1 day ago

No, they should not.

However, if the ISP, for instance, is sued, then it (immediately and without a separate court order) becomes illegal for them to knowingly destroy evidence in their custody relevant to the issue for which they are being sued, and if there is a dispute about their handling of particular such evidence, a court can and will order them specifically to preserve relevant evidence as necessary. And, with or without a court order, their destruction of relevant evidence once they know of the suit can be the basis of both punitive sanctions and adverse findings in the case to which the evidence would have been relevant.

lelanthran 1 day ago

> So the courts should start ordering all ISPs, browsers, and OSs to log all browsing and chat activity going forward, so they can find out which people are doing bad things on the internet.

Not "all", just the ones involved in a current suit. They already routinely do this anway (Party A is involved in a suit and is ordered to retain any and all evidence for the duration of the trial, starting from the first knowledge that Party A had of the trial).

You are mischaracterising what happens; you are presenting it as "Any court, at any time, can order any party who is not involved in any suit in that court to forever hold user data".

That is not what is happening.

lovich 1 day ago

If those entities were custodians in charge of the data at hand in the court case, the court would order that.

This post appears to be full of people who aren’t actually angry at the results of this case but angry at how the US legal system has been working for decades, possibly centuries since I don’t know when this precedent was first set

scarab92 1 day ago

Is it not valid to be concerned about overly broad invasions of privacy regardless of how long such orders have been occurring?

Retric 1 day ago

What privacy specifically? The courts have always been able to compel people to recount things they know, which could include a conversation between you and your plumber if it was somehow related to a case.

The company already records and uses this stuff internally; retention is about keeping the information accurate and accessible.

Lawsuits allow, in a limited context, the sharing of non-public information held by the individuals/companies in the lawsuit. But once you submit something to OpenAI, it’s now their information, not just your information.

nickff 1 day ago

I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.

lelanthran 1 day ago

> I think that some of the people here dislike (or are alarmed by) the way that the court can compel parties to retain data which would otherwise have vanished into the ether.

Maybe so, but this has been the case for hundreds of years.

After all, how on earth do you propose getting a fair hearing if the other party is allowed to destroy the evidence you asked for in your papers?

Because this is what would happen:

You: Your Honour, please ask the other party to turn over all their invoices for the period in question

Other Party: We will turn over only those invoices we have

*Other party goes back to the office and deletes everything.*

The thing is, once a party in a suit asks for a certain piece of evidence, the other party can't turn around and say "Our policy is to delete everything, and our policy trumps the orders of this court".

nickff 17 hours ago

I think your points are all valid, but… On the other hand, this sort of preservation does substantially reduce user privacy, disclosing personal information to unauthorized parties, with no guarantees of security, no audits, and few safeguards.

This is much more concerning (from a privacy perspective) than a company using cookies to track which pages on a website a user has visited.

lelanthran 8 minutes ago

> On the other hand, this sort of preservation does substantially reduce user privacy,

Yes, that's by design and already hundreds of years old in practice.

You cannot refuse a court evidence to protect your or anyone else's privacy.

I see no reason to make an exception for rich and powerful companies.

I don't want a party to a suit having the ability to suppress evidence due to privacy concerns. There is no privacy once you get to a civil court other than what the court, at its discretion, allows, such as anonymisation.

Retric 17 hours ago

I disagree, because the information has already been recorded, and users don’t have a say in who is “authorized” to view the data, whether that’s someone at the company or some random 3rd party the company sells it to.

It’s the collection itself that’s the problem, not how soon it’s deleted once it’s economically worthless.

> with no guarantees of security, no audits, and few safeguards.

The courts pay far more attention to that stuff than profit maximizing entities like OpenAI.

nickff 17 hours ago

I agree that your assessment of the legal state of play is likely accurate. That said, it is one thing for data to be cached in the short term, and entirely another for it to be permanently stored and then sent out to parties with which the user has only a distant and likely adversarial relationship.

There are many situations in which the deletion/destruction of ‘worthless’ data is treated as a security protection. The one that comes to mind is how some countries destroy fingerprint data after it has been used for the creation of a biometric passport. Do you really think this is a futile act?

>”The courts pay far more attention to that stuff than profit maximizing entities like OpenAI.”

I would be interested to see evidence of this. The courts claim to value data security, but I have never seen an audit of discovery-related data storage, and I suspect there are substantial vulnerabilities in the legal system, including the law firms. Can a user hold the court or opposing law firm financially accountable if they fail to safeguard this data? I’ve never seen this happen.

Retric 16 hours ago

> That said, it is one thing for data to be cached in the short term

Cached data isn’t necessarily available for data retention to apply to in the first place. Just because an ISP has parts of a message in some buffer doesn’t mean that counts as a recording of that data. If Google never stored queries beyond what’s needed to serve a response, then it likely wouldn’t qualify.

Also, it’s on the entity providing data for the discovery process to do redaction as appropriate. The only way data ends up at the other end is if it gets sent in the first place. There can be a lot of back and forth here, and as for evidence that the courts care: https://www.law.cornell.edu/rules/frcp/rule_5.2

nickff 16 hours ago

That is helpful, thanks, but I think it is not practical to redact LLM request information beyond the GDPR personally-identifiable standards without just deleting everything. My (admittedly quick) read of those rules is that the ‘redacted’ information would still be readily identifiable anyway (not directly, but using basic data analysis). The redaction standards for CC# and SIN are downright pathetic, and allow for easy recovery with modern techniques.
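To make the "easy recovery" worry concrete, here is a sketch under stated assumptions (a 16-digit card number, a guessable 6-digit issuer prefix, and a Rule 5.2-style redaction that leaves the last four digits visible; all the digits below are hypothetical):

    def luhn_ok(digits: str) -> bool:
        # Standard Luhn checksum: double every second digit from the right.
        total = 0
        for i, d in enumerate(reversed(digits)):
            n = int(d)
            if i % 2 == 1:
                n = n * 2 - 9 if n * 2 > 9 else n * 2
            total += n
        return total % 10 == 0

    bin_prefix, last4 = "411111", "1234"  # hypothetical prefix / visible digits
    candidates = [f"{bin_prefix}{mid:06d}{last4}"
                  for mid in range(10**6)
                  if luhn_ok(f"{bin_prefix}{mid:06d}{last4}")]
    print(len(candidates))  # 100,000 of the 1,000,000 possibilities survive

The checksum alone cuts the search space tenfold, and every additional correlated field left in the record (issuer, expiry, billing ZIP) shrinks it further.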

dragonwriter 1 day ago

It’s not an “invasion of privacy” for a company that already had the data to be prohibited from destroying it when it is sued in a case where that data is evidence.

dogleash 1 day ago

Yeah, sure. But understanding the legal system tells us the players and what systems exist that we might be mad at.

For me, one company being obligated to retain business records during civil litigation against another company, reviewed within the normal discovery process, is tolerable. Considering the alternative is lawlessness, I'm fine with it.

Companies that make business records out of invading privacy? They, IMO, deserve the fury of 1000 suns.

lovich 1 day ago

It’s not private. You handed over the data to a third party.

rodgerd 1 day ago

If you cared about your privacy, why are you handing all this stuff to Sam Altman? Did he represent that OpenAI would be privacy-preserving? Have they taken any technical steps to avoid this scenario?

Vilian 1 day ago

Either you didn't read what the other commenter wrote, or you're just arguing in bad faith, which is even weirder because they were only explaining how the system has always worked.

dragonwriter 1 day ago

> So then the courts need to find who is setting their chats to be deleted and order them to stop.

No, actually, it doesn't. Ordering a party to stop destroying evidence relevant to a current case (which is its obligation even without a court order) irrespective of whether someone else asks it to destroy that evidence is both within the well-established power of the court, and routine.

> Or find specific infringing chatters and order OpenAI to preserve these specified users’ logs.

OpenAI is the alleged infringer in the case.

IAmBroom 1 day ago

Under this theory, if a company had employees shredding incriminating documents at night, the court would have to name those employees before ordering them to stop.

That is ridiculous. The company itself receives that order, and is IMMEDIATELY legally required to comply - from the CEO to the newest-hired member of the cleaning staff.

MeIam 2 days ago

The Times does not need user logs to prove such a thing if it were true. The Times can show that it is possible by showing how its own users can access the text. Why would they need other users' data?

KaiserPro 2 days ago

> The Times does not need user logs to prove such a thing if it were true.

No, it needs to show how often this happens, to prove a point about how much impact it has had.

MeIam 1 day ago

Why would that matter? If people didn't use it as much, does that mean it doesn't matter because there were only a few people?

dragonwriter 1 day ago

> Why would that matter

Because it's a copyright infringement case, the existence and the scale of the infringement are relevant both to whether there is liability and, if so, to how much; the issue isn't merely that it is possible for infringement to occur.

delusional 1 day ago

You have to argue damages. It actually has to have cost the NYT some money, and for that you need to know the extent.

MeIam 1 day ago

We don't even know if the Times uses AI to get information from other sources. They could get a hint of the news and then produce their own material.

KaiserPro 1 day ago

> We don't even know if the Times uses AI to get information from other sources

which is irrelevant at this stage. It's a legal principle that both sides can fairly discover evidence. As finding out how much OpenAI has infringed copyright is pretty critical to the case, they need to find out.

After all, if it's only once or twice, that's a couple of dollars; if it's millions of times, that's hundreds of millions.

cogman10 1 day ago

OpenAI is also entitled to discovery. They can literally get every email and chat the Times has, and require that from this point on the Times preserve such logs.

delusional 1 day ago

Who cares? That's not a legal argument and it doesn't mean anything to this case.

lovich 1 day ago

Oh, I was unaware that the Times was inventing a novel technology that raises novel legal questions.

It's very impressive that they managed such innovation in their spare time while running a newspaper and a website.

mandevil 1 day ago

For the most part (there are a few exceptions), in the US lawsuits are not based on "possible" harm but actual observed harm. To show that, you need actual observed user behavior.

dragonwriter 1 day ago

> The Times can show that it is possible

The allegation is not merely that infringement is possible; the actual occurrence and scale are relevant to the case.

mrtksn 2 days ago

>Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

Probably because they bothered to pursue such a thing and hundreds of millions of people did not.

How do you conclusively know whether someone's content-generating machine infringes your rights? By saving all of its input/output for investigation.

It's ridiculous, sure but is it less ridiculous than AI companies claiming that the copyrights shouldn't apply to them because it will be bad for their business?

IMHO these are just growing pains. Back in the day, people used to believe that the law didn't apply to them because they did it on the internet, and they were mostly right, because the laws were made for another age. Eventually the laws, both criminal and copyright, caught up. It will be the same for AI; right now we are in the wild west age of AI.

TimPC 1 day ago

AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business". The main argument is that they qualify for fair use because their work is transformative which is one of the major criteria for fair use. Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie. The original works don't have model weights and can't answer questions or interact with a user so the output is substantially different from the input.

c256 1 day ago

> Fair use is the same doctrine that allows a school to play a movie for educational purposes without acquiring a license for the public performance of that movie.

This is a pretty bad example, since fair use has been ruled to NOT allow this.

mandevil 1 day ago

It is a bad example, but not for that reason. Instead, it's a bad example because Federal copyright law has a specific carve out for school educational purposes:

https://www.copyright.gov/title17/92chap1.html#110 "Notwithstanding the provisions of section 106, the following are not infringements of copyright:

(1) performance or display of a work by instructors or pupils in the course of face-to-face teaching activities of a nonprofit educational institution, in a classroom or similar place devoted to instruction, unless, in the case of a motion picture or other audiovisual work, the performance, or the display of individual images, is given by means of a copy that was not lawfully made under this title, and that the person responsible for the performance knew or had reason to believe was not lawfully made;"

That is why it is not a good comparison with the broader Fair Use Four Factors test (defined in section 107: https://www.copyright.gov/title17/92chap1.html#107): the analysis doesn't even need to reach fair use, because the performance is exempted from copyright.

arcfour 1 day ago

What Scrooge sued a school for exhibiting a film for educational purposes?!

kitified 1 day ago

Whether a school was actually sued over this is not relevant to whether it is legally allowed.

no_wizard 1 day ago

If AI companies don’t want the court headaches they should instead preemptively negotiate with rights holders and get agreements in place for the sharing of data.

arcfour 1 day ago

Feels like bad faith to say that knowing full well that

1. This would also be a massive legal headache,

2. It would become impossibly expensive

3. We obviously wouldn't have the AI we have today, which is an incredible (if immature) technology, if this had happened. Instead, the growth of AI would have been strangled by rights holders wanting infinite money, because they know that once their data is in the model, they aren't getting it back, ever; it's a one-time sale.

I'm of the opinion that AI is and will continue to be a net positive for society. So I see this as essentially saying "let's go and remove this and delay the development of it by 10-20 years and ensure people can't train and run their own models feasibly for a lot longer because only big companies can afford real training datasets."

allturtles 1 day ago

Why not simply make your counterargument rather than accusing GP of being in bad faith? Your argument seems to be that it's fine to break the law if the net outcome for society is positive. It's not "bad faith" to disagree with that.

arcfour 1 day ago

But they didn't break the law. The NYT articles were not algorithms/AI.

It's bad faith because they are saying "well, they should have done [unreasonable thing]". I explored their version of things from my perspective (it's not possible) and from a conciliatory perspective (okay, let's say they somehow try to navigate that hurdle anyways, is society better off? Why do I think it's infeasible?)

allturtles 1 day ago

If they didn't break the law, your pragmatic point about outcomes is irrelevant. Open AI is in the clear regardless of whether they are doing something great or something useless. So I don't honestly know what you're trying to say. I'm not sure why getting licenses to IP you want to use is unreasonable, it happens all the time.

Edit: Authors Guild, Inc. v. Google, Inc. is a great example of a case where a tech giant tried to legally get the rights to use a whole bunch of copyrighted content (~all books ever published), but failed. The net result was they had to completely shut off access to most of the Google Books corpus, even though it would have been (IMO) a net benefit to society if they had been able to do what they wanted.

bostik 1 day ago

> Your argument seems to be that it's fine to break the law if the net outcome for society is positive.

In any other context, this would be known as "civil disobedience". It's generally considered something to applaud.

For what it's worth, I haven't made up my mind about the current state of AI. I haven't yet seen an ability for the systems to perform abstract reasoning, to _actually_ learn. (Show me an AI that has been fed with nothing but examples in languages A and B. Then demonstrate, conclusively, that it can apply the lessons it has learned in language M, which happens to be nothing like the first two.)

allturtles 1 day ago

> In any other context, this would be known as "civil disobedience". It's generally considered something to applaud.

No, civil disobedience is when you break the law expecting to be punished, to force society to confront the evil of the law. The point is that you get publicly arrested, possibly get beaten, get thrown in jail. This is not at all like what Open AI is doing.

nobody9999 1 day ago

>I'm of the opinion that AI is and will continue to be a net positive for society. So I see this as essentially saying "let's go and remove this and delay the development of it by 10-20 years and ensure people can't train and run their own models feasibly for a lot longer because only big companies can afford real training datasets."

Absolutely. Which, presumably, means that you're fine with the argument that your DNA (and that of each member of your family) could provide huge benefits to medicine and potentially save millions of lives.

But significant research will be required to make that happen. As such, we will be requiring (with no opt outs allowed) you and your whole family to provide blood, sperm and ova samples weekly until that research pays off. You will receive no compensation or other considerations other than the knowledge that you're moving the technology forward.

May we assume you're fine with that?

mrtksn 1 day ago

Yeah, and the online radio providers argued that they don’t do anything shady, their service was basically just a very long antenna.

Anyway, the laws were not written with this type of processing in mind. In fact the whole idea of intellectual property breaks down now. Just like the early days of the internet.

mandevil 1 day ago

https://www.copyright.gov/title17/92chap1.html#110 seems to this non-lawyer to be a specific carve out allowing movies to be shown, face-to-face, in non-profit educational contexts without any sort of license. The Fair Use Four Factors test (https://www.copyright.gov/title17/92chap1.html#107) isn't even necessary in this example.

Absent a special legal carve-out, you need to get judges to do the Fair Use Four Factors test, and decide on how AI should be treated. To my very much engineer and not legal eye, AI does great on point 3, but loses on points 1, 2, and 4, so it is something that will need to be decided by the judges, how to balance those four factors defined in the law.

rodgerd 1 day ago

> AI companies aren't seriously arguing that copyright shouldn't apply to them because "it's bad for business".

AI companies have, in fact, said that the law shouldn't apply to them or they won't make money. That is literally the argument Nick Clegg is using to argue that copyright protection should be removed from authors and musicians in the UK.

freejazz 1 day ago

That's not entirely true. A lot of their briefing refers to how impractical and expensive it would be to license all the content they need for the models.

AStonesThrow 1 day ago

> allows a school to play a movie

No, it doesn’t. Play 10% of a movie for the purpose of critiquing it, perhaps.

https://fairuse.stanford.edu/overview/fair-use/four-factors/

Fair Use is not an a priori exemption or exception; Fair Use is an “affirmative defense” so once you have your day in court and the judge asks your attorney why you needed to play 10% of Priscilla, Queen of the Desert for your Gender Studies class, then you can run down those Four Factors enumerated by the Stanford article.

Particularly “amount and substantiality”.

Teachers and churches get tripped up by this all the time. But I’ve also been blessed with teachers who were very careful academically and sought to impart the same caution to all students about using copyrighted materials. It is not easy when fonts have entered the chat!

The same reason you or your professor cannot show/perform 100% of an unlicensed film under any circumstances is the same basis on which creators are telling the scrapers that they cannot consume 100% of copyrighted works on that end. And if the risks may involve reproducing 87% of the same work in their outputs, that's beyond the standard thresholds.

shkkmo 1 day ago

> It's ridiculous, sure but is it less ridiculous than AI companies claiming that the copyrights shouldn't apply to them because it will be bad for their business?

Since that wasn't ever a real argument, your strawman is indeed ridiculous.

The argument is that requiring people to have a special license to process text with an algorithm is a dramatic expansion of the power of copyright law. Expansions of copyright law will inherently advantage large corporate users over individuals as we see already happening here.

New York Times thinks that they have the right to spy on the entire world to see if anyone might be trying to read articles for free.

That is the problem with copyright. That is why copyright power needs to be dramatically curtailed, not dramatically expanded.

dogman144 1 day ago

You raise good points, but the target of your support feels misplaced. Want private AI? You must self-host and inspect whether it's phoning home. No way around it, in my view.

Otherwise, you are picking your data privacy champions as the exact same companies, people and investors that sold us social media, and did something quite untoward with the data they got. Fool me twice, fool me three times… where is the line?

In other words - OAI has to save logs now? Candidly, they probably were already; it's foolish to assume otherwise.

jrm4 1 day ago

Love the spirit of what you say and I practice it myself, literally.

But also, no - "just self-host or it's all your fault" is never, ever a sufficient answer to the problem.

It's exactly the same as when Exxon says "what are you doing to lower your own carbon footprint?" It's shifting the burden unfairly; companies like OpenAI put themselves out there and thus must ALWAYS be held to task.

dogman144 1 day ago

I actually agree with your disagreement, and my answer is scoped more to a technical audience that has the know-how to deal with it.

I wish it was different and I agree that there’s a massive accountability hole with… who could it be?

Pragmatically, it is what it is: self-host and hope for bigger-picture change.

naming_the_user 1 day ago

Anything else is literally impossible, though.

If you send your neighbour nudes then they have your nudes. You can put in as many contracts as you want, maybe they never digitised it but their friend is over for a drink and walks out of the door with the shoebox of film. Do not pass GO, do not collect.

Conceivably we can try to control things like e.g. is your cellphone microphone on at all times, but once someone else, particularly an arbitrary entity (e.g. not a trusted family member or something) has the data, it is silly to treat it as anything other than gone.

lovich 1 day ago

Then your problem is with the US legal system, not this individual ruling.

You lose your rights to privacy in your papers without a warrant once you hand data off to a third party. Nothing in this ruling is new.

fluidcruft 1 day ago

A pretty clear distinction is that all the ISPs in the world are not currently involved in a lawsuit with the New York Times and are not accused of deleting evidence. What OpenAI is accused of is significantly different from merely agnostically routing packets between A and B. And OpenAI is not raising astronomical funds because it operates as an ISP.

tailspin2019 1 day ago

> The right to privacy is an integral part of the freedom of speech

I completely agree with you, but as a ChatGPT user I have to admit my fault in this too.

I have always been annoyed by what I saw as shameless breaches of copyright of thousands of authors (and other individuals) in the training of these LLMs, and I've been wary of the data security/confidentiality of these tools from the start too - and not for no reason. Yet I find ChatGPT et al so utterly compelling and useful, that I poured my personal data[0] into these tools anyway.

I've always felt conflicted about this, but the utility just about outweighed my privacy and copyright concerns. So as angry as I am about this situation, I also have to accept some of the blame too. I knew this (or other leaks or unsanctioned use of my data) was possible down the line.

But it's a wake-up call. I've done nothing with these tools that is even slightly nefarious, but I am today deleting all my historical data (not just from ChatGPT[1] but from other hosted AI tools) and will completely reassess my approach to using them - likely with an acceleration of my plans to move to local models as much as I can.

[0] I do heavily redact my data that goes into hosted LLMs, but there's still more private data in there about me than I'd like.

[1] Which I know is very much a "after the horse has bolted" situation...

CamperBob2 1 day ago

Keeping in mind that the purpose of IP law is to promote human progress, it's hard to see how legacy copyright interests should win a fight with AI training and development.

100 years from now, nobody will GAF about the New York Times.

stackskipton 1 day ago

IP law was meant to promote human progress by giving a financial incentive to create: you created IP knowing it was protected and that you could make money off it.

CamperBob2 1 day ago

We will all make a lot more money and a lot more progress by storing, organizing, presenting, and processing knowledge as effectively as possible.

Copyright is not a natural right by any measure; it's something we pulled out of our asses a couple hundred years ago in response to a need that existed at the time. To the extent copyright interferes with progress, as it appears to have sworn to do, it has to go.

Sorry. Don't shoot the messenger.

diputsmonro 1 day ago

Why would you expect NYT or any other news organization to report accurate data to feed into your AI models if they can't make any money off of it?

It's not just about profits; it's about paying reporters to do honest work and not cut corners in their reporting and data collection.

If you think the data is valuable, then you should be prepared to pay the people who collect it, same as you pay for the service that collates it (ChatGPT)

CamperBob2 1 day ago

I wish I knew what the eventual business model will look like, but I don't. A potential guess might be to consider what MSNBC was, or was supposed to be -- a joint venture between Microsoft and NBC network news, where the idea was to take advantage of the emerging WWW to get a head start on everyone else. The pie-in-the-sky synergies that were promised never materialized, so the outcome just amounted to a new name for an old-media player. As it turned out, the business of gathering and delivering news and editorial content didn't change much at all. It just migrated from paper and screens to, well, screens.

Now, as you point out, companies like OpenAI have a problem, and so do the rest of us. Fair compensation for journalists and editors requires attribution before anything else can even be negotiated, and AI literally transforms its input into something that is usually (but obviously not always) untraceable. For the big AI players, the solution to that problem might involve starting or acquiring news and content networks of their own. Synergies that Microsoft and NBC were hoping might materialize could actually be feasible now.

So to answer your question, maybe ChatGPT will end up paying journalists directly.

Again, I don't know how plausible that kind of scenario might turn out to be. But I am absolutely certain that countries that allow their legacy rightsholders to impede progress in AI are going to be outcompeted by those with less to lose.

tailspin2019 1 day ago

Copyright is the thing that allows software companies to sell their products and make money. It’s not just about “knowledge”.

I sometimes wonder if people commenting on this topic on HN really understand how fundamental copyright as a concept is to the entire tech industry. And indeed even to capitalism itself.

freejazz 17 hours ago

>We will all make a lot more money and a lot more progress by storing, organizing, presenting, and processing knowledge as effectively as possible.

That's a huge assumption in the first place. An even bigger leap to tie that general proposition to what's happening here.

stale2002 1 day ago

But the main point here is human progress. If there is an obvious case where copyright seriously gets in the way of human progress, then that's a problem, and I hope we can correct it through any means necessary.

DannyBee 1 day ago

Lawyer here

First - in the US, privacy is not a constitutional right. It should be, but it's not. You are protected against government searches, but that's about it. You can claim it's a core human right or whatever, but that doesn't make it true, and it's a fairly reductionist argument anyway. It has, fwiw, also historically not been seen as a core right for thousands of years. So I think it's a harder argument to make than you think, despite the EU coming around on this. Again, I firmly believe it should be a core right, but asserting that it is doesn't make that true.

Second, if you want the realistic answer: this judge is probably overworked and trying to clear a bunch of simple motions off their docket. I think you don't realize how many motions they deal with on a daily basis. Imagine trying to get through 145 code reviews a day, or something like that. In this case, this isn't the trial, it's discovery. Not even discovery quite yet, if I read the docket right. Preservation orders of this kind are incredibly common in discovery, and it's not exactly high stakes most of the time. Most discovery motions are just parties being a pain in the ass to each other deliberately. This normally isn't even a thing that is heard in front of a judge directly; the judge usually decides on the filed papers.

So I'm sure the judge looked at it for a few minutes, thought it made sense at the time, and approved it. I doubt they spent hours thinking hard about the consequences.

OpenAI has asked to be heard in person on the motion; I'm sure the judge will grant it, listen to what they have to say, determine they probably fucked it up, and fix it. That is what most judges do in this situation.

zerocrates 1 day ago

Even in the "protected against government searches" sense from the 4th Amendment, that right hardly exists when dealing with data you send to a company like OpenAI thanks to the third-party doctrine.

pama 1 day ago

Thanks. As an EU citizen am I exempt from this order? How does the judge or the NYTimes or OpenAI know that I am an EU citizen?

ElevenLathe 1 day ago

The court in question has no obligations to you at all.

jjani 1 day ago

OpenAI does, by virtue of doing business in the EU.

tptacek 16 hours ago

They will not be able to avoid the preservation order for EU customers unless the judge modifies the order.

mananaysiempre 1 day ago

The current legal stance in the US seems to be that you, not being a US person, have no particular legally protected interest in privacy at all, so you have nothing to complain about here and can’t even sue. The only avenue the EU would have to change that is the diplomatic one, but the Commission does not seem to care.

adgjlsfhk1 1 day ago

you aren't and they don't.

tiahura 1 day ago

While the Constitution does not explicitly enumerate a "right to privacy," the Supreme Court has consistently recognized substantive privacy rights through Due Process Clause jurisprudence, establishing constitutional protection for intimate personal decisions in Griswold v. Connecticut (1965), Lawrence v. Texas (2003), and Obergefell v. Hodges (2015).

ComposedPattern 1 day ago

> It has, fwiw, also historically not been seen as a core right for thousands of years.

Nothing has been seen as a core right for thousands of years, as the concept of human rights is only a few hundred years old.

HardCodedBias 1 day ago

"First - in the US, privacy is not a constitutional right"

What? The supreme court disagreed with you in Griswold v. Connecticut (1965) and Roe v. Wade (1973).

While one could argue that they were vastly stretching the meaning of words in these decisions the point stands that at this time privacy is a constitutional right in the USA.

DannyBee 1 day ago

Roe v. Wade is considered explicitly overruled, as well as wrongly decided in the first place, as of 2022 (Dobbs).

They also explicitly stated a constitutional right to privacy does not exist, and pointed out that Casey abandoned any such reliance on this sort of claim.

Griswold also found a right to marital privacy. Not general privacy.

Griswold is also barely considered good law anymore, though I admit it has not been explicitly overruled - it is definitely on the chopping block, as more than just Thomas has said.

In any case, more importantly, none of them have found any interesting right to privacy of the kind we are talking about here, but instead more specific rights to privacy in certain contexts. Griswold found a right to marital privacy in "the penumbra of the bill of rights". Lawrence found a right to privacy in your sexual activity.

In dobbs, they explicitly further denied a right to general privacy, and argued previous decisions conflated these: " As to precedent, citing a broad array of cases, the Court found support for a constitutional “right of personal privacy.” Id., at 152. But Roe conflated the right to shield information from disclosure and the right to make and implement important personal decisions without governmental interference."

You are talking about the former, which none of these cases were about. They are all about the latter.

So this is very far afield from a general right to privacy of the kind we are talking about, and more importantly, one that would cover anything like OpenAI chats.

So basically, you have a ~200-year period where it was not considered a right, then a 50-year period where specific forms of privacy were considered a right, and now we are just about back to the former.

The kind of privacy we are talking about here ("the right to shield information from disclosure") has always been subject to a balancing of interests made by legislatures, rather than being a constitutional right upon which they may not infringe. Examples abound; you don't have to look any further than court filings themselves, and when you are allowed to proceed anonymously or redact/file things under seal. The right of public access is considered much stronger than your right to keep the public from knowing embarrassing or highly private things about your life. There are very few exceptions (minors, etc.).

Again, i don't claim any of this is how it is should be. But it's definitely how it is.

sib 1 day ago

I'd like to thank you for explaining this so clearly (and for "providing receipts," as the cool kids say).

>> Again, i don't claim any of this is how it is should be. But it's definitely how it is.

Agreed.

HardCodedBias 1 day ago

"Dobbs. They also explicitly stated a constitutional right to privacy does not exist"

I did not know this, thank you!

krapp 1 day ago

¯\_(ツ)_/¯ The supreme court overturned Roe v. Wade in 2022 and explicitly stated in their ruling that a constitutional right to privacy does not exist.

DannyBee 1 day ago

Yes. They went further and explicitly made the distinction between the kind of privacy we are talking about here ("right to shield information from disclosure") and the kind they saw as protected in Griswold, Lawrence, and Roe ("right to make and implement important personal decisions without governmental interference").

shkkmo 1 day ago

> It has, fwiw, also historically not been seen as a core right for thousands of years. So I think it's a harder argument to make than you think, despite the EU coming around on this.

This doesn't seem true. I'd assume you know more about this than I do, though, so can you explain in more detail? The concept of privacy is definitely more than thousands of years old. The concept of a "human right" is arguably much newer. Do you have particular evidence that a right to privacy is a harder argument to make than other human rights?

While the language differs, the right to privacy is enshrined more or less explicitly in many constitutions, including those of 11 US states. It isn't just a "European" thing.

static_motion 1 day ago

I understand what they mean. There's this great video [1] which explains it in better terms than I ever could. I've timestamped the link because it's quite long, but if you've got the time it's a fantastic video with a great narrative and presentation.

[1] https://youtu.be/Fzhkwyoe5vI?t=4m9s

dragonwriter 1 day ago

> Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

Because the law favors preservation of evidence for an active case above most other interests. It's not a matter of arbitrary preference by the particular court.

piombisallow 1 day ago

Regardless of the details of this specific case, the courts are not democratic; they do not decide based on the interests of the parties or how many of them there are, they decide based on the law.

brookst 1 day ago

This is not true even in the slightest.

The law is not a deterministic computer program. It’s a complex body of overlapping work and the courts are specifically chartered to use judgement. That’s why briefs from two parties in a dispute will often cite different laws and precedents.

For instance, Winter v. NRDC specifically says that courts must consider whether an injunction is in the public interest.

piombisallow 1 day ago

"public interest" is a much more ambiguous thing than the written law

otterley 1 day ago

Yes. And, that's why both sides will make their cases to the court as to whether the public interest is served by an injunction, and then the court will make a decision based on who made the best argument.

resource_waste 2 days ago

>The right to privacy is an integral part of the freedom of speech, a core human right.

Are these contradictory?

If you overhear a friend gossiping, can't you spread that gossip?

Also, where are human rights located? I'll give you a microscope. (Sorry, I'm a moral anti-realist/expressivist and I can't help myself.)

152132124 1 day ago

I think you will have a better time arguing with an LLM

oersted 1 day ago

I completely agree with you. But perhaps we should be more worried that OpenAI or Google can retain all this data and do pretty much what they want with it in the first place, without a judge getting into the picture.

wat10000 1 day ago

ChatGPT isn’t like an ISP here. They are being credibly accused of basing their entire business on illegal activity. It’s more like if The Pirate Bay was being sued. The alleged infringement is all they do, and requiring them to preserve records of their users is pretty reasonable.

fireflash38 2 days ago

In your arguments for privacy, do you consider privacy from OpenAI?

rvnx 2 days ago

Cut a joke about ethics and OpenAI

maest 1 day ago

Original comment, so the conversation chain makes sense:

> Sam Altman is the most ethical man I have ever seen in IT. You cannot doubt he is vouching and fighting for your privacy. Especially on YCombinator website where free speech is guaranteed.

ethersteeds 2 days ago

He is what now?! That is a risible claim.

nindalf 2 days ago

He was being facetious.

ethersteeds 2 days ago

Alas, it was too early

humpty-d 2 days ago

I fail to see how saving all logs advances that cause

hshdhdhj4444 2 days ago

Because this is SOP in any judicial case?

Openly destroying evidence isn’t usually accepted by courts.

brookst 1 day ago

Is there any evidence of import that would only be found in one single log among billions? The fact that NYT thinks that merely sampling 1% of logs would not support their case is pretty damning.

fluidcruft 1 day ago

I don't know anything about this case but it has been alleged that OpenAI products can be coaxed to return verbatim chunks of NYT content.

brookst 1 day ago

Sure, but if that is true, what is the evidentiary difference between preserving 10 billion conversations and preserving 100,000 and using sampling and statistics to measure harm?
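For what it's worth, the statistics are straightforward to sketch (the rate below is invented purely to show the arithmetic): a uniform random sample of 100,000 conversations pins down the overall rate of verbatim reproduction to within a few hundredths of a percentage point.

    import math, random

    TRUE_RATE, N_SAMPLE = 0.005, 100_000  # invented ground truth and sample size
    random.seed(0)
    hits = sum(random.random() < TRUE_RATE for _ in range(N_SAMPLE))
    p_hat = hits / N_SAMPLE
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / N_SAMPLE)  # 95% CI half-width
    print(f"estimated rate {p_hat:.4%} +/- {margin:.4%}")

That answers "how often", though not "which specific conversations", which may be the distinction the plaintiffs care about.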

fluidcruft 1 day ago

The main differences seem to be that it doesn't require the precise form of the queries to be known a priori and that it interferes with the routine destruction of evidence via maliciously-compliant mealy-mouthed word games, for which the tech sector has developed a significant reputation.

Furthermore there is no conceivable harm resulting from requiring evidence to be preserved for an active trial. Find a better framing.

ToValueFunfetti 1 day ago

No conceivable harm in what sense? It seems obvious that it is harmful for a user who requests and is granted privacy to then have their private messages delivered to NYT. Legally it may be on shakier ground from the individual's perspective, but OpenAI argues that the harm is to their relationship with their customers and various governments, as well as the cost of the implementation effort:

>For OpenAI, risks of breaching its own privacy agreements could not only "damage" relationships with users but could also risk putting the company in breach of contracts and global privacy regulations. Further, the order imposes "significant" burdens on OpenAI, supposedly forcing the ChatGPT maker to dedicate months of engineering hours at substantial costs to comply, OpenAI claimed. It follows then that OpenAI's potential for harm "far outweighs News Plaintiffs’ speculative need for such data," OpenAI argued.

sib 1 day ago

>> It seems obvious that it is harmful for a user who requests and is granted privacy to then have their private messages delivered to NYT.

This ruling is about preservation of evidence, not (yet) about delivering that information to one of the parties.

If judges couldn't compel parties to preserve evidence in active cases, you could see pretty easily that parties would aggressively destroy evidence that might be harmful to them at trial.

There's a whole later process (and probably arguments in front of the judge) about which evidence is actually delivered, whether it goes to the NYT or just to their lawyers, how much of it is redacted or anonymized, etc.

freejazz 16 hours ago

The number of times that OpenAI is producing verbatim copies of NYT's articles... for one. That wasn't so hard to think of.

baobun 2 days ago

It's been a honeypot from the beginning, y'all

trod1234 1 day ago

It doesn't; it favors longstanding case law and laws already on the books.

There is a longstanding precedent with regards to business document retention, and chat logs have been part of that for years if not decades. The article tries to make this sound like this is something new, but if you look at the e-retention guidelines in various cases over the years this is all pretty standard.

For a business to continue operating, it must preserve business documents and related ESI upon an appropriate legal hold, to avoid spoliation. They likely weren't doing this, claiming the data was deleted, which is why the judge ruled against OAI.

This isn't uncommon knowledge either; it's required. E-discovery and information governance are requirements any business in this area must meet, and those documents are subject to discovery in certain cases; OAI likely thought it could maliciously avoid that.

The matter here is that OAI and its influence rabble are churning this, trying to do a runaround on longstanding requirements that any IT professional in the US would have had reiterated to them by their legal department's information-governance policies.

There's nothing to see here; there's no real story. They were supposed to be doing this and didn't, were caught, and the order just forces them to do what any other business is required to do.

I remember an executive years ago (decades, really) asking about document retention, ESI, and e-discovery, and how they could do something which runs along similar lines to what OAI tried as a runaround. I remember the lawyer at the time saying, "You've gotta do this, or when it goes to court you will have an indefensible position as a result of spoliation...".

You are mistaken, and appear to be trying to frame this improperly towards a point of no accountability.

I suggest you review the longstanding e-discovery retention requirements that courts require of businesses to operate.

This is not new material, nor any different from what's been required for a long time now. All your hyperbole about privacy is without real basis: they are a company, they must comply with the law, and it certainly is not outrageous to hold people who break the law to account, which can only occur when regulatory requirements are actually fulfilled.

There is no argument here.

References: Federal Rules of Civil Procedure (FRCP) 1, 4, 16, 26, 34, 37

There are many law firms who have written extensively on this and related subjects. I encourage you to look at those too.

(IANAL) Disclosure: Don't take this as legal advice. I've had the opportunity to work with quite a few competent lawyers, but I don't interpret the law; only they can. If you need legal advice, seek out competent, qualified counsel.

rolandog 1 day ago

> Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

Can't you use the same arguments against, say, copyright holders? Billionaires? Corporations doing the Texas two-step bankruptcy maneuver to avoid liability for allegedly poisoning humanity?

I sure hope so.

Edit: ... (up to a point)

deadbabe 1 day ago

OpenAI is a business selling a product, it’s not a decentralized network of computers contributing spare processing power to run massive LLMs. Therefore, you can easily point a finger at them and tell them to stop some activity for which they are the sole gatekeeper.

cactusplant7374 1 day ago

> Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

It simply didn't. OpenAI hasn't deleted any user data.

> "OpenAI did not 'destroy' any data, and certainly did not delete any data in response to litigation events," OpenAI argued. "The Order appears to have incorrectly assumed the contrary."

It's a bit of a stretch to think a big tech company like OpenAI is deleting users' data.

blackqueeriroh 1 day ago

This is incorrect. As someone who has had the opportunity to work in several highly-regulated industries: companies do not want to hold onto extra data about you that they don't have to, unless their business is selling that data.

OpenAI already has a business, and not one they want to jeopardize by having a massive amount of customer data stolen if they get hacked.

bigyabai 19 hours ago

ChatGPT is not SOC2 compliant. They are not being regulated or audited by anyone that could prove your point.

cactusplant7374 1 day ago

The article and OpenAI themselves contradict you. Do you work at OpenAI?

huijzer 1 day ago

> Why would a court favor the interest of the New York Times in a vague accusation over the interests and rights of hundreds of millions of people?

Well, maybe some people in power have pressured the court into this decision? The New York Times surely has some power as well, via its channels.