.docx files are wrongly identified as application/zip #282

Open
bigabig opened this issue Dec 3, 2023 · 1 comment
Labels
backend This issue is related to the backend bug Something isn't working

Comments

@bigabig
Member

bigabig commented Dec 3, 2023

Internally, we use magic to identify the MIME type of uploaded files. magic is a wrapper around libmagic, the library behind the "file" command.

The libmagic installed on our lt servers (including ltdwise and hcdsgpu2) incorrectly detects .docx files as application/zip.

We should fix this. I found this answer on Server Fault https://serverfault.com/questions/338087/making-libmagic-file-detect-docx-files/377792#377792, but I am not sure whether it is a proper fix.
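Instead of (or in addition to) patching libmagic's rules on every server, one option is to refine a generic application/zip result in our own code. The sketch below is a heuristic, not our current implementation: the helper name refine_zip_mime_type is hypothetical, and it assumes that a .docx is an Office Open XML package with [Content_Types].xml at the root and the document body at word/document.xml (the usual layout, although the spec locates the main part via relationships).

```python
import io
import zipfile

# Registered MIME type for Word 2007+ (.docx) documents.
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def refine_zip_mime_type(detected: str, data: bytes) -> str:
    """Hypothetical helper: refine libmagic's 'application/zip' by peeking inside the archive."""
    if detected != "application/zip":
        # libmagic already gave a specific answer; trust it.
        return detected
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        # Not a readable ZIP after all; keep libmagic's answer.
        return detected
    # Heuristic: OOXML packages carry [Content_Types].xml at the root,
    # and Word documents normally store their body in word/document.xml.
    if "[Content_Types].xml" in names and "word/document.xml" in names:
        return DOCX_MIME
    return detected


if __name__ == "__main__":
    # Build a tiny in-memory archive that looks like a .docx.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    print(refine_zip_mime_type("application/zip", buf.getvalue()))
```

Plain ZIP uploads still come back as application/zip, so this only narrows the one case libmagic gets wrong on our servers.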

@bigabig bigabig added backend This issue is related to the backend bug Something isn't working labels Dec 3, 2023
@floschne
Member

floschne commented Dec 7, 2023

It just occurred to me that this should actually be a frontend issue: AFAIK, the MIME type is set by the browser and sent in the request headers so that the receiving side knows how to interpret the bytes. At the corresponding FastAPI endpoint, Starlette converts the request into an UploadFile and simply reads the MIME type from it.

Later in the PrePo pipeline, however, magic may well determine the MIME type incorrectly, because .docx files are actually ZIP archives containing a number of XML files. But the file should already be routed to the correct pipeline by the backend API worker, where magic is not used at all.
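If routing should rely on the browser-declared type, as described above, one possible policy is to trust that type unless it is missing or uninformative, and only then fall back to content sniffing. This is a sketch under assumptions: resolve_mime_type and GENERIC_TYPES are hypothetical names, declared stands for what the endpoint reads from UploadFile.content_type, and sniffed stands for whatever magic returned.

```python
from typing import Optional

# Declared types that say nothing useful about the content (hypothetical choice).
GENERIC_TYPES = {"", "application/octet-stream"}


def resolve_mime_type(declared: Optional[str], sniffed: str) -> str:
    """Pick the MIME type used to route an upload to a pipeline.

    declared: Content-Type sent by the browser (may be None or carry parameters).
    sniffed:  type detected from the bytes, e.g. by magic.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    base = (declared or "").split(";", 1)[0].strip().lower()
    if base not in GENERIC_TYPES:
        return base
    # Browser gave no usable hint: fall back to content sniffing.
    return sniffed
```

Note that the browser usually derives the declared type from the file extension, so it is convenient for routing but should not be treated as a security check.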

Do you know where this fails?
