Detects the file type and extracts text and metadata from a range of file formats. Similar to the Apache Tika project, but written in Go.
Supported formats:

Format | FileParser | MIME Type | Metadata |
---|---|---|---|
TXT | X | text/plain; charset=utf-8 | |
RTF | X | text/rtf | |
DOC (partial) | X | application/x-ole-storage | |
ODT | X | application/vnd.oasis.opendocument.text | X |
DOCX | X | application/vnd.openxmlformats-officedocument.wordprocessingml.document | X |
PPTX | X | application/vnd.openxmlformats-officedocument.presentationml.presentation | X |
PDF | X | application/pdf | X |
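
The MIME types in the table above can be illustrated with a small content-sniffing sketch. This is only an assumption about how type detection might work, not this project's actual API: it uses the standard library's `http.DetectContentType`, and the helper name `detectMIME` is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// detectMIME is a hypothetical helper (not part of this project).
// It sniffs the MIME type from the leading bytes of a file.
// http.DetectContentType inspects at most the first 512 bytes.
func detectMIME(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	// A PDF file starts with the "%PDF-" signature.
	fmt.Println(detectMIME([]byte("%PDF-1.7\n")))

	// Content with no binary bytes is reported as UTF-8 plain text.
	fmt.Println(detectMIME([]byte("plain text")))
}
```

Signature sniffing alone is not enough for ZIP-based formats such as ODT, DOCX and PPTX: their leading bytes only identify a ZIP archive, so telling them apart requires looking at the archive's contents.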