Describe the bug
The following line tries to detect whether libmagic is installed, but it actually only detects whether the python-magic package is installed, not the underlying libmagic dependency.
Therefore, as long as python-magic is installed, `LIBMAGIC_AVAILABLE` is always True, and if libmagic itself is not installed, `import magic` raises an error.
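
One way to make the check reflect the real dependency is to attempt the import itself rather than only looking for the package. The following is a minimal sketch, not the project's actual fix: the helper name `module_is_usable` is hypothetical, and it assumes that a missing libmagic surfaces as an `ImportError` (or an `OSError` from the shared-library loader) when `magic` is imported.

```python
import importlib


def module_is_usable(name: str) -> bool:
    """Return True only when `name` can actually be imported.

    Hypothetical helper: importing the module (rather than just locating
    the package) also exercises any native-library loading it does at
    import time.
    """
    try:
        importlib.import_module(name)
    except (ImportError, OSError):
        # Assumption: a missing shared library shows up as one of these.
        return False
    return True


# True only if python-magic is installed AND its import succeeds.
LIBMAGIC_AVAILABLE = module_is_usable("magic")
```

With this shape, code guarded by `LIBMAGIC_AVAILABLE` would be skipped when the package is present but the library behind it is not.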
**Summary**
Fixes a bug where a CSV file with asserted content-type
`application/vnd.ms-excel` was incorrectly identified as an XLS file and
failed partitioning.
**Additional Context**
The `content_type` argument to partitioning is often authored by the
client system (e.g. Unstructured SDK) and is both unreliable and outside
the control of the user. In this case the `.csv -> XLS` mapping is
correct for certain purposes (Excel is often used to load and edit CSV
files) but not for partitioning, and the user has no readily available
way to override the mapping.
XLS files, as well as seven other common binary file types, can be
detected efficiently and with near-perfect reliability (at least 99.999%
of the time) using code we already have in the file detector.
- Promote this direct-inspection strategy to be tried first.
- When DOC, DOCX, EPUB, ODT, PPT, PPTX, XLS, or XLSX is detected, use
that file-type.
- When one of those types is NOT detected, clear the asserted
`content_type` when it matches any of those types. This prevents the
problem seen in the bug where the asserted content type was used to
determine the file-type.
- The remaining `content_type`, guessed MIME-type, and filename-extension
mapping strategies are tried, in that order, only when direct inspection
fails. This is largely the same as it was before.
- Fix #3781 while we were in the neighborhood.
- Fix #3596 as well, essentially an earlier report of #3781.
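
The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the detector's real code: the function name `resolve_file_type` and its parameters are hypothetical, file types are represented as plain MIME-type strings, and each strategy's result is passed in precomputed (`None` meaning "no answer") rather than evaluated lazily.

```python
from typing import Optional

# MIME types of the eight formats that direct inspection can identify.
DIRECTLY_DETECTABLE = {
    "application/msword",  # DOC
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # DOCX
    "application/epub+zip",  # EPUB
    "application/vnd.oasis.opendocument.text",  # ODT
    "application/vnd.ms-powerpoint",  # PPT
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",  # PPTX
    "application/vnd.ms-excel",  # XLS
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",  # XLSX
}


def resolve_file_type(
    direct: Optional[str],
    content_type: Optional[str],
    guessed_mime: Optional[str],
    extension_mime: Optional[str],
) -> Optional[str]:
    """Pick a file type using the strategy order from the list above."""
    # 1. Direct inspection wins whenever it identifies one of the eight types.
    if direct is not None:
        return direct
    # 2. Inspection found none of them, so an asserted content type naming
    #    one of them must be wrong; discard it.
    if content_type in DIRECTLY_DETECTABLE:
        content_type = None
    # 3. Fall back to the remaining strategies, in order.
    return content_type or guessed_mime or extension_mime


# The CSV case from the summary: asserted as Excel, but not an XLS file.
csv_result = resolve_file_type(None, "application/vnd.ms-excel", None, "text/csv")
```

Here `csv_result` is `"text/csv"`: the asserted Excel content type is cleared because direct inspection did not find an XLS file, so the extension-based answer is used instead.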
The line referenced in the bug description is line 58 of unstructured/unstructured/file_utils/filetype.py at commit acd070c.