Index files other than text/document files #21
Comments
Does anyone know where the logic is that determines what files are indexed? I just took a quick look and couldn't find it. I noticed that markdown files aren't indexed, and I figured that one would be an easy fix (just treat it like a .txt), but I couldn't find the place where it reads file extensions.
I am not sure if this is the right approach, but here are some I found.
This way, application/xml and .drawio files are included for indexing. .drawio files need an extra extraction step because their content is deflated XML. Anyway, I have managed to get .xml and .drawio files indexed. My blog article on the issue is here
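To illustrate the extra extraction step mentioned above, here is a minimal Python sketch of inflating a compressed diagram payload. It assumes the payload is Base64 text wrapping a raw deflate stream whose content is URL-encoded XML; that layout is an assumption on my part, not something confirmed in this thread. The function names are hypothetical and this is not the code from the blog article.

```python
import base64
import zlib
from urllib.parse import quote, unquote


def decode_drawio_payload(payload: str) -> str:
    """Turn a compressed diagram payload back into plain XML text.

    Assumed layout (not confirmed by this thread):
    Base64 -> raw deflate stream -> URL-encoded XML.
    """
    raw = base64.b64decode(payload)
    # wbits=-15 tells zlib to expect a raw deflate stream without a zlib header.
    inflated = zlib.decompress(raw, wbits=-15)
    return unquote(inflated.decode("utf-8"))


def encode_drawio_payload(xml: str) -> str:
    """Inverse of decode_drawio_payload, used here only to build test data."""
    compressor = zlib.compressobj(wbits=-15)
    data = quote(xml, safe="").encode("utf-8")
    deflated = compressor.compress(data) + compressor.flush()
    return base64.b64encode(deflated).decode("ascii")


if __name__ == "__main__":
    sample = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'
    packed = encode_drawio_payload(sample)
    print(decode_drawio_payload(packed) == sample)
```

Once the XML is recovered like this, it could be handed to the indexer the same way as any other application/xml content.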
Hi.
I'm far from grasping the complexity of ES and NC's fulltextsearch suite, but I thought that the Ingest Attachment Processor Plugin that we add to ElasticSearch aims at indexing virtually any known type of file, thanks to Apache Tika, which knows how to parse hundreds and hundreds of file types.
Despite that, it seems to me that files_fulltextsearch provides ES with the content of files only when they match the following types: Text, Office, PDF.
And indeed, I've installed and configured files_fulltextsearch on a local NextCloud instance for test purposes, and I don't seem to be able to search within the content of ZIP files, image files, etc., although Tika knows these file types.
Isn't it possible to just send all file contents to ES so it indexes as many file types as it can?
Thx.
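For reference, a rough Python sketch of what "just send the file content to ES" might look like with the ingest attachment processor: a pipeline definition that points the processor at one field, and a document carrying the file bytes as Base64 in that field. The field name "data" and both function names are hypothetical, and this is not taken from the files_fulltextsearch code, which may build its requests differently.

```python
import base64
import json


def build_attachment_pipeline(field: str = "data") -> dict:
    """Body for an ingest pipeline that runs the attachment processor on `field`."""
    return {
        "description": "Extract text from binary file content",
        "processors": [{"attachment": {"field": field}}],
    }


def build_document(content: bytes, field: str = "data") -> dict:
    """Document body: the attachment processor expects Base64-encoded bytes."""
    return {field: base64.b64encode(content).decode("ascii")}


if __name__ == "__main__":
    # The pipeline body would be sent to PUT _ingest/pipeline/<name>,
    # and the document indexed with ?pipeline=<name>.
    print(json.dumps(build_attachment_pipeline(), indent=2))
    print(json.dumps(build_document(b"hello")))
```

With such a pipeline, the decision about which file types are parseable would rest with Tika on the ES side rather than with a type filter in the app, at the cost of uploading every file's bytes.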