Am I seeing this right: you expect people to submit their scientific research papers, yet there is zero information on who you are, how to contact you, what happens with the uploaded data, any privacy policy, and so on...
(except your GitHub usernames on the repo, posted only here)
Regardless of how useful this is, it's hard to take it seriously.
Fair point.
We're in very early MVP mode, trying to move fast and see if this works. We pushed a cloud version to support users who don't want to run the GitHub script themselves. That said, you're absolutely encouraged to run it yourself (with your own OpenAI key) — the results are identical.
For context: we're two recent ETH Zurich PhD graduates.
Robert Jakob: https://www.linkedin.com/in/robertjakob
Kevin O'Sullivan: https://www.linkedin.com/in/kevosull
Going to add contact information immediately.
Thanks again for the feedback — it's exactly what we need at this stage.
How can it be run "locally" if you don't support locally hosted LLMs? The overlap between people who wouldn't trust a cloud API wrapper like yours, but would willingly let their (possibly sensitive) documents be sent to some AI provider's API, seems rather small to me. Either embrace the cloud fully and don't worry about data confidentiality, or go full local and embrace the anxious community. This in between seems like a waste of time tbh.
(I'm not trying to sound overly critical - I very much like the idea and the premise. I merely wouldn't use this business approach)
> This in between seems like a waste of time tbh.
Hard disagree. The “in between” is where you want to be; it's where most are already ending up. Initially everyone was so worried about privacy and what OpenAI was doing with their precious private data: “They will train on it. Privacy is important to me. I'm not about to, like, give OpenAI access to my private, secure Google Drive backups or Gmail history or Facebook private messages or any real private ‘local only’ information.”
Also, among those who understand data privacy concerns: when it comes to work data, in the span of 2-3 years, all the business folks I know went from “this is confidential business information, please never upload it to ChatGPT and only email it to me” to “just put everything in ChatGPT and see what it tells you.”
The initial worry was driven by not understanding how LLMs worked. What if it just learned as you talked to it? And what if it used that learning with somebody else? “I told it a childhood secret; will it turn around and tell others my secret?”
People understand how that works now, and some of those concerns have eased. Basically, most understand that it carries a similar risk to their already-existing digital life.
As someone who actually deals with this on a regular basis, I can guarantee you that serious companies definitely do not "just put everything in ChatGPT" if they have any sort of respectable legal department. Especially in Europe, where you have GDPR concerns on top of any business concerns. People who actually understand the privacy issues nowadays either use something like Azure's OpenAI custom hosting to be compliant with the law, or go fully open-weight and self-hosted. Everything else is a legal time-bomb.
Of course they aren't putting it on ChatGPT. Their data is stored in S3, Snowflake, BigQuery, or Azure Storage, so it makes more sense to use the respective cloud provider's LLM hosting service. You can use OpenAI's GPT models or Anthropic models hosted on Azure or AWS.
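For concreteness, here is a minimal sketch of what that looks like with the openai Python SDK against an Azure OpenAI deployment. The endpoint and deployment names are placeholders, not anything from this thread:

    import os
    from openai import AzureOpenAI

    # Placeholder resource/deployment names; the prompt is purely illustrative.
    client = AzureOpenAI(
        azure_endpoint="https://my-resource.openai.azure.com",
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
    resp = client.chat.completions.create(
        model="my-gpt-4o-deployment",  # Azure takes the deployment name, not a raw model id
        messages=[{"role": "user", "content": "Summarize the methods section: ..."}],
    )
    print(resp.choices[0].message.content)

The point being made above is that requests then stay within the company's existing Azure agreement rather than going to OpenAI directly.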
You're telling me companies in Europe aren't putting all their user data on AWS and Azure regions in Europe? Both AWS and Azure are gigantic in Europe.
Supporting local models can be done by overriding one or two environment variables, as long as your local inference server exposes an OpenAI-compatible endpoint (which the majority of local stacks ship with).
Was there some level of support beyond this that you were referring to?
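To illustrate, assuming the scripts use the official openai Python SDK, the override might look like this. The localhost URL and model name below are Ollama defaults, purely assumptions; any OpenAI-compatible server (llama.cpp, vLLM, LM Studio) works the same way:

    import os
    from openai import OpenAI

    # Sketch only: point the SDK at a local OpenAI-compatible server
    # instead of api.openai.com.
    client = OpenAI(
        base_url=os.environ.get("OPENAI_BASE_URL", "http://localhost:11434/v1"),
        api_key=os.environ.get("OPENAI_API_KEY", "unused"),  # local servers typically ignore the key
    )
    resp = client.chat.completions.create(
        model=os.environ.get("OPENAI_MODEL", "llama3"),
        messages=[{"role": "user", "content": "Review this manuscript excerpt: ..."}],
    )
    print(resp.choices[0].message.content)

Since the SDK already reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment on its own, exporting those two variables may be enough without touching the code at all.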
Good point. The current focus is on improving AI feedback quality, not the business model. But we'll definitely consider local model support for privacy-conscious users. Thanks for the input!
Even extremely privacy-conscious authors could submit their paper to the service at the same time they publish their preprint v1, then if the service's feedback is useful, publish preprint v2 and submit v2 as the version of record.
...or run it themselves. The code is open source: https://github.com/robertjakob/rigorous
Note: The current version uses the OpenAI API, but it should be adaptable to run on local models instead.
It seems you have the option to run the tools yourself (with an OpenAI API key). The cloud version is for convenience. I agree that a privacy/usage policy is necessary.
I agree. I've worked at a national lab before, and I immediately thought this service was a massive security risk. It will definitely be hard for some scientists to use these kinds of cloud services, especially if their research truly is cutting-edge and sensitive. I think many people will just ignore things like this because they want to keep their jobs, etc.
As mentioned above, there is an open-source version for those who want full control. The free cloud version is mainly for convenience and faster iteration. We don’t store manuscript files longer than necessary to generate feedback (https://www.rigorous.company/privacy), and we have no intention of using manuscripts for anything beyond testing the AI reviewer.
Just upload an already published paper to test it.
Cool! We'll get back to you ASAP.
We'd be happy to hear what kind of feedback you find useful, what is useless, and what you would want in an ideal review report. (https://docs.google.com/forms/d/1EhQvw-HdGRqfL01jZaayoaiTWLS)