How do you know how accurate you are? How do you know when you're wrong?
If I'm being entirely honest, in the general case I don't.
But I don't particularly care, either. After a couple tries I decided it's better not to point at object examples of suspected LLM text all the time (except e.g. to report it on Stack Overflow, where it's against the rules and where moderators will use actual detection software etc. to try to verify). But I still notice that style of writing instinctively, and it still automatically flips a switch in my brain to approach the content differently. (Of course, even when I'm confident that something was written by a human, I still e.g. try to verify terminal commands with the man pages before following instructions I don't understand.)
Of course, AI writes the way it does for a reason. More worryingly, it increasingly seems like (verifiably) human writers are mimicking the style - like they see so much AI-generated text out there that sounds authoritative, that they start trying to use the same rhetorical techniques in order to gain that same air of authority.
> still notice that style of writing instinctively, and it still automatically flips a switch in my brain
See, this is what worries me. We have unknowable years of instinct, and none of it is tuned for what is happening now.
I think this is an excellent question and one people should be asking themselves frequently. I often get the impression that commenters have not considered this.
For example, whenever someone on the internet makes a claim about "most x", e.g. most people this, most developers that. What does anyone actually know about "most" anything? I think the answer "pretty much nothing".
Yes, this is an important point. Insert the survivorship bias plane picture that always gets posted when someone makes this mistake on other platforms (Twitter). We can be accurate at detecting poor AI writing attempts, but not know how much AI writing is good enough to go undetected.
Someone should run a double blind test app, there was an adversarially crafted one for images and still got 60% or so average accuracy. We all just can glance the data and detect AI generation like how some experts can just let logs run and say something.