Open
Labels
plugin: Should probably be an Atomic Plugin
Description
Being able to search inside the PDF files uploaded to Atomic Server would be a really nice addition.
Goals:
- Make it easier to find PDF documents by searching for terms that occur inside them
- Lightweight
- Fast
- Runs in the background and is allowed to fail. It should not slow down the upload process.
- OCR for PDFs that lack a text layer would be a decent addition, but only if the other goals are met.
- Bonus points if it also converts other document types (e.g. docx) to plaintext
- Output should be plaintext or (preferably) markdown
Non-goals:
- Extract data from tables in PDFs
There are some tools that could help with this:
- pdf-extract (Rust crate)
- ooxml-rs (Rust parser for Office Open XML: .docx, .xlsx, .pptx / Word, Excel, PowerPoint)
- pdf-to-markdown (JS, so should run client-side!)
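To make the "runs in the background, may fail" goal a bit more concrete, here is a minimal sketch in plain Rust (std only). Everything in it is hypothetical: `Extraction`, `extract_in_background` and the closure-based extractor are names made up for illustration, not existing Atomic Server APIs. The extractor is passed in as a closure so that a PDF library such as pdf-extract could be plugged in later. Which function to call there is left open.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Outcome of one background extraction job (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Extraction {
    /// Plaintext (or markdown) that can be handed to the search index.
    Text(String),
    /// Extraction failed; the upload itself is unaffected.
    Failed(String),
}

/// Spawns `extract` on a worker thread and returns immediately,
/// so the upload handler never waits for text extraction.
pub fn extract_in_background<F>(bytes: Vec<u8>, extract: F) -> Receiver<Extraction>
where
    F: FnOnce(&[u8]) -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        let outcome = match extract(&bytes) {
            Ok(text) => Extraction::Text(text),
            Err(reason) => Extraction::Failed(reason),
        };
        // The receiver may already have been dropped; ignoring the send
        // error is fine because the extracted text is optional.
        let _ = tx.send(outcome);
    });
    rx
}
```

The upload path would call `extract_in_background(bytes, extractor)` after the file is stored and hand the receiver to whatever updates the search index. A failed extraction just means the file is not searchable by content. Note that a panic inside the extractor only kills the worker thread: the receiver then sees a closed channel instead of a result, which the indexing side should treat like a failure.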