Support for custom extensions #14

skin27 · 2017-03-07T15:10:21Z

I would like to scan files with custom extensions as well. I work a lot with structured documents like .csv, .xml, .json, etc. These could be scanned like normal text files.

mirkosertic · 2019-05-11T19:48:59Z

Ah, a good requirement! But what about document metadata? I don't think authors can be extracted from these files; the only viable information would be the last modified date and the language of the extracted content. Maybe the new NLP features could find some named entities, but I don't think there are more options here. What do you think?

mlt · 2019-05-29T23:37:33Z

One can extend Tika to extract metadata if those XML, JSON, etc. files have a certain structure and contain the necessary information.
Since there is always going to be someone who says "I'm missing extension X", I wonder if it would make sense to let users configure what to scan using patterns instead?
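A minimal sketch of the pattern idea, assuming the user supplies a list of glob patterns such as `*.csv` or `*.{xml,json}`. The `ScanFilter` class is hypothetical and not part of the project; it only uses the JDK's `FileSystem.getPathMatcher`, matching against the file name so a pattern applies in any directory.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decides whether a file should be indexed based on
// user-configured glob patterns (an assumption, not the project's real config).
public class ScanFilter {

    private final List<PathMatcher> matchers = new ArrayList<>();

    public ScanFilter(List<String> globPatterns) {
        for (String pattern : globPatterns) {
            // The "glob:" prefix selects glob syntax; "regex:" is also supported by the JDK.
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + pattern));
        }
    }

    // Matches against the file name only, so "*.csv" works regardless of the parent directory.
    public boolean accepts(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return false;
        }
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ScanFilter filter = new ScanFilter(List.of("*.csv", "*.{xml,json}"));
        System.out.println(filter.accepts(Paths.get("/data/report.csv")));
        System.out.println(filter.accepts(Paths.get("/data/config.json")));
        System.out.println(filter.accepts(Paths.get("/data/image.png")));
    }
}
```

With this approach, supporting a new extension is a configuration change rather than a code change; files that pass the filter could then be handed to Tika, which already falls back to plain-text parsing for many text-based formats.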
