Definition of plagiarism, by the Cambridge Dictionary:
"the process or practice of using another person's ideas or work and pretending that it is your own"
What I am objecting to is the "another person's" part. An LLM is not a person, it is a tool -- a tool that is trained on other people's work, yes.
If you use a different tool like DeepL, which is also trained on other people's work, to produce text purely from an original prompt you give it (i.e. translate something you wrote yourself), and you put that into your paper... is that then plagiarism as well? If not, what if you use an LLM to do the translation instead, instructing it to act strictly as a 'translation tool'?
It seems to me that the mere act of directly copying the output of an LLM into your own work without a reference cannot be considered plagiarism in every case, unless LLMs are considered people.
Of course, you can prompt an LLM in a way that copying its output would _definitely_ be plagiarism (e.g., asking it to quote the Declaration of Independence verbatim, and then simply copying that).
So, all I'm saying is: The distinction is not that clear, has nuances, and depends on the context.
By your argument, since an encyclopedia is not a person, I can copy it with impunity. It's a collection of work built on others' ideas and research, but technically a tool to bring it together. I can assure you that virtually any school would consider the direct use of it, without citation, plagiarism.
Let's assume I used an encyclopedia outside of my native tongue. I took the passage verbatim, used a tool to translate it to my native tongue, and passed it off as my own. The translation tool is clearly not a person, and I've even transformed the original work. I might escape detection, but this is still plagiarism.
Do you not agree?
Let's go to how Cambridge University defines it academically:
> Plagiarism is defined as the unacknowledged use of the work of others as if this were your own original work.
> A student may be found guilty of an act of plagiarism irrespective of intent to deceive.
And let's go to their specific citation for the use of AI in research:
> AI does not meet the Cambridge requirements for authorship, given the need for accountability. AI and LLM tools may not be listed as an author on any scholarly work published by Cambridge
> By your argument, since an encyclopedia is not a person, I can copy it with impunity.
I don’t see where they said (or implied) that.
How does “that isn’t plagiarism” imply “I can copy it with impunity”? Copyright infringement is still a thing.
Have you conflated plagiarism with copyright infringement? Neither implies the other. You can plagiarize without committing copyright infringement, and you can violate copyright without plagiarism.
I'm sorry, but this encyclopedia analogy really doesn't say anything at all about the argument I raised. An encyclopedia is the work of individual authors, who compiled the individual facts. It is not a tool that produces text based entirely on the prompt you give it. Using an encyclopedia's entries (translation or not) without citing the source is plagiarism, but that doesn't have any parallel to using an LLM.
(Also, the last quote you included seems to directly support my argument)
The translation software isn't a person. It will necessarily take liberties with the source material, possibly even in a non-deterministic fashion, to translate it. Why would it be any different from an LLM as a tool in our definition of plagiarism?
If I used a Markov chain (arguably a very early predecessor to today's models) trained on relevant data to write the passage, would that be any different? What about an RNN? What would you qualify as the threshold we need to cross for use of the tool to not be plagiarism?
When did he imply that an LLM would be different, as a tool, from a translator in his definition of plagiarism? Are you even understanding his points lmao?
There are nuances to the amount of harm dealt to the authors based on what sources you are stealing from, but that's irrelevant here, as the specific question we're talking about is whether or not the student is the actual author of the work submitted.
It'd be the same as if I had Google Translate do my German 101 exam. I even typed the word "germuse" with my own two thumbs!
What we are talking about in this sub-thread is exclusively the 'this is clearly plagiarism' part.
If you used Google Translate for your German 101 exam, that would be academic dishonesty/cheating, but not plagiarism.
I'm largely uninterested in the specific name you want to give it, and more interested in whether it's worthy of punishment.
> What I am objecting to is the "another person's" part.
Fair enough. We disagree about definitions here. To me, plagiarizing is claiming authorship of a work that you did not author. Where that work came from is irrelevant to the question.
> If not, what if you use an LLM to do the translation instead, instructing it to act strictly as a 'translation tool'?
Translation is an entirely different beast, though. A translation is not claiming to be original authorship. It is transparently the opposite of that. Nobody translating a work would claim that they wrote that work.
> Fair enough. We disagree about definitions here. To me, plagiarizing is claiming authorship of a work that you did not author. Where that work came from is irrelevant to the question.
This is exactly what it is ... the post is taking "another person's" waaaay too literally - especially given that we are in the year of our Lord 2024/2025. One of the author's comments above also dismisses the encyclopedia argument by stating that encyclopedias are written by people, which can never be factually proven (I can easily ask an LLM to create an encyclopedia and publish it). Who is "another person" on a Wikipedia page?! "A bunch of people" ... and how is an LLM trained? "A bunch of people, a bunch of facts, a bunch of ____"
The crux of this whole "argument" isn't that plagiarism is "another person's work"; it is that you are passing off work as YOURS that isn't YOURS - it is that simple.
Well, I understand, and I suspect that a lot of people commenting here see the term similarly to you; but there's an official definition regardless of your personal interpretation, and it does include the 'somebody else's work' part.
Why is translation a different beast? It produces text based on a prompt you give it, and it draws from vast amounts of the works of other people to do so. So if a translation tool does not change the 'authorship' of the underlying text (i.e., if it would have been plagiarism to copy the text verbatim before translating it, it would be plagiarism after; and the same for the inverse), then it should also be possible for an LLM to not change the authorship between prompt and output. Which means, copying the output of an LLM verbatim is not necessarily in itself plagiarism.
> but there's an official definition regardless of your personal interpretation, and it does include the 'somebody else's work' part.
No, it doesn't. First of all, dictionaries aren't prescriptive and so all quoting a definition does is clarify what you mean by a word. That can be helpful toward understanding, of course.
That said, the intransitive verb form of the word does not require "somebody else's work" in the sense of that "someone else" being a human.
> to commit literary theft : present as new and original an idea or product derived from an existing source
-- Merriam-Webster https://www.merriam-webster.com/dictionary/plagiarize
According to this, what it means is taking credit for a work you did not produce. That work did not have to be produced by a human, it merely had to exist.
> Why is translation a different beast?
Because it doesn't produce a new work, it just changes the language that work is expressed in. "Moby Dick" is "Moby Dick" regardless of what language it has been translated to. This is why the translator (human or otherwise) does not become the author of the work. If you were to run someone else's novel through a translator and claimed you wrote that work, you would in every respect be committing plagiarism both by the plain meaning of the word and legally.
> copying the output of an LLM verbatim is not necessarily in itself plagiarism.
Yes, it is. You would be taking credit for something you did not author. You would be doing the same if you took credit for a translation of someone else's work.