brudgers 6 days ago

https://linux.die.net/man/1/pdftotext

is the simplest thing that might work.

It is free and mature.
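
For anyone who wants to script it, here is a minimal sketch of calling pdftotext from Python. It assumes the `pdftotext` binary (from poppler-utils or xpdf) is on PATH. The helper names `build_pdftotext_command` and `pdf_to_text` are made up for illustration. Passing `-` as the output file sends the text to stdout.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, layout=False):
    """Return the argv list for pdftotext, writing text to stdout ("-")."""
    command = ["pdftotext", "-enc", "UTF-8"]
    if layout:
        # -layout tries to preserve the physical layout of the page.
        command.append("-layout")
    command += [str(pdf_path), "-"]
    return command


def pdf_to_text(pdf_path, layout=False):
    """Run pdftotext and return whatever text it extracted."""
    # Assumption: pdftotext is installed; fail with a clear message if not.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like `pdf_to_text("report.pdf", layout=True)` would then return the text as a string, where `report.pdf` is a placeholder file name.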

jbaiter 5 days ago

That will not work for scanned PDFs without a text layer, and even when a PDF has one, extraction is not guaranteed to work.

brudgers 5 days ago

"Might work" comes with neither express nor implied warranty.

OCR is another thing that might work, and it is also simpler than an LLM.
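
A rough sketch of combining the two: try the text layer first and fall back to OCR only when almost nothing comes out. Everything here is an assumption for illustration. `extract_text_layer` and `run_ocr` are hypothetical callables you would supply yourself (say, a pdftotext wrapper and an OCR wrapper), and the 20-character threshold is arbitrary.

```python
def has_usable_text(text, min_chars=20):
    """Heuristic: the text layer counts as usable if it contains
    at least min_chars non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def extract_with_fallback(pdf_path, extract_text_layer, run_ocr, min_chars=20):
    """Return (text, method), where method is "text-layer" or "ocr".

    extract_text_layer and run_ocr are caller-supplied functions that
    take a path and return a string; neither is provided here.
    """
    text = extract_text_layer(pdf_path)
    if has_usable_text(text, min_chars):
        return text, "text-layer"
    # Empty or near-empty output usually means a scanned PDF with no text layer.
    return run_ocr(pdf_path), "ocr"
```

This only catches a missing or near-empty text layer. A text layer that exists but is garbled would still pass the check.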