godelski 5 days ago

I'm also a PhD[0] and a researcher who has worked in various fields, including national labs (you DOE?).

I mostly share the same sentiment, and I see a similar issue with the product. The system is not in its current poor state due to a lack of reviewers; it is due to a lack of quality reviewers and arbitrary notions of "good enough for this venue". So I wanted to express a difference of opinion about what peer review should be (I think you'll likely agree).

I don't think we are doing the scientific community any service with our current Conference/Journal-based "peer review". The truth is that you cannot verify a paper by reading it. You can falsify it, but even that is difficult. The ability to determine novelty and utility is also a crapshoot, and we have a long history illustrating how bad we are at it. Several Nobel-prize-worthy works have been rejected multiple times for "obviousness", "lack of novelty", and being "clearly wrong". All three apply to the paper that led to the 2001 Nobel Prize in Economics[1]!

The truth of the matter is that peer review is done in the lab. It is done through replication, reproduction, and the further development of ideas. What we saw around LK-99[2] was higher-quality and more impactful peer review than any reviewer for a venue could provide. The impact existed long before any of those works were published in venues.

I think this comes down to forgetting the purpose of journals. They existed when we didn't have tools like arXiv, OpenReview, or even GitHub, and they were primarily focused on solving the logistical problem of distribution. So I consider all those technical works, "preprints", and blog posts around the LK-99 replications as much of a publication as anything else. The point is that we are communicating with our peers. There has always been prestige around certain venues, but most people did not publish in them. The other venues checked for plagiarism, factual errors, and other obvious problems; otherwise, they proceeded with publication.

This silly notion of acceptance rates just creates a positive feedback loop that is overburdening the system (highly apparent in ML conferences). Judgments of novelty and impact are highly noisy (as demonstrated in multiple NeurIPS studies and elsewhere), making the process far more random than is acceptable. I don't think this is all that surprising: it is quite easy to poke holes in any work you come across, and it does not take a genius to figure out a work's limitations (often they're explicitly stated!).

The result of this is obvious, and it is what most researchers end up doing: resubmit elsewhere and try your luck again. Maybe the papers are improved, maybe they aren't; mostly the latter. The only thing this accomplishes is an exponentially increasing number of submissions and a slowdown of research progress, as time that should be spent researching is instead spent reformatting and resubmitting. The quality of review comments seems to have high variance, but I can say that early in my PhD they mainly resulted in me making my work worse, as I chased reviewers' comments rather than just re-rolling and moving on.

In this sense, I don't think we have a "lack of reviewers" problem so much as an acceptance-threshold problem built on an arbitrary metric. I think we should check for critical errors, check for plagiarism, and then just make sure the work is properly communicated. The rest is far more open to interpretation, and not even we experts are that good at it.

[0] Well my defense is in a week...

[1] https://en.wikipedia.org/wiki/The_Market_for_Lemons

[2] https://en.wikipedia.org/wiki/LK-99

atrettel 5 days ago

I worked at LANL until very recently, so yes, I was associated with the DOE.

I actually agree with your point that "the ability to determine novelty ... is a crapshoot". My point was that the AI system should at least try to provide some sense of how novel the content is (and which parts are more novel than others, etc.). This is important for other review processes like patent examination, and it is certainly very important for journal editors when determining whether a manuscript is "worthy" of publication. For these reasons, I personally have a low bar for what qualifies as "novel" in my own reviews.

Most of my advisors in graduate school were also journal editors, and they instilled in me a focus on novelty during peer review because that is what they cared about most when making a decision about a manuscript. Editors focus on novelty because journal space is a scarce resource. You see the same issue in the news in general [1]. This is another reason why I have a low bar when evaluating novelty: a study can be well done and cover new ground without having an unambiguous conclusion or a "story being told" (which is something editors might want).

I originally discussed this briefly in my post but edited it out immediately after posting. I'll say it again here with more detail. I think a lot of peer review as practiced today is theater: it doesn't really serve any purpose other than providing some semblance of oversight and review. I agree with your point about the journal/conference being the wrong place to do peer review. It is too late to change things by then. The right time is "in the lab", as you say.

I wholeheartedly agree that reproduction/replication is the standard that we should seek to achieve but rarely ever do. Perhaps the only "original" ideas that I have had in my career came from trying to replicate what other people did and finding out something during that process.

[1] https://en.wikipedia.org/wiki/News_values

godelski 4 days ago

Nice, I never went to LANL but have a few friends in HPC over there.

You're right, it is theater. But a lot of people think it isn't...

I think it is important to be explicit about why novelty is a crapshoot.

  Novelty depends on:
    - how well you read the work
      - High level reading means you will think x is actually y
    - how well read you are
      - If you're too well read, every x is just y
      - If you're not well read, everything is novel
    - how clear the writing was
      - If it is too clear, it is obvious, therefore not novel
If any process encourages us to write less clearly, we should reject it. I've seen more and more of this happening, and it is terrible for science. You shouldn't have to mask your contributions, oversell, or downplay related work. Everything is "incremental", and "novelty" ends up being a measurement of the reader's ego.

What I've seen is that the old guard lost sight of what was important: communicating. I don't think anyone here is malicious or even had bad intentions. In fact, I think everyone had, and still has, good intentions. But good intentions don't create good outcomes. They're slow-boiled frogs, with a slowly increasing dependence on metrics. They can look back and say "it worked for me", which blinds them to how things have changed.

  > I agree with your point about the journal/conference being the wrong place to do peer review. It is too late to change things by then. The right time is "in the lab", as you say.
I disagree a bit (again, I think you'll agree lol). You're right that some of it should happen in the lab, but there is a hierarchy: the next level is outside the lab, then outside research. Peer review is an ongoing process that never stops. Defining it as 3-4 people quickly reading a paper is just laughable, and they have every incentive to reject a work: no one questions you when you reject, but they do when you accept. Acceptance rates sure don't help, and they are the weirdest metric to define "impact" by. I don't even know how one could claim that rejection rate correlates with scientific impact. Maybe only through the confounding variable of prestige, since that is what people target? But then arXiv should have the highest impact lol.

  > Perhaps the only "original" ideas that I have had in my career came from trying to replicate what other people did and finding out something during that process.
Same! I don't think it is a coincidence either. Science requires us to be a bit antiauthoritarian. "Trust, but verify" is a powerful tool. We need to verify in different environments, with methods that should be similar, and all that. Finding those little holes is critical. At worst, replication makes you come up with ideas, at least if you keep asking "why did they do this?" or "why does that happen?"

I think that in a process where we're pushed to publish quickly, we don't take the time to chase these rabbit holes. Far too often there's a wealth of information down them. But I'm definitely also biased by my poor experience in grad school lol.