Describe the bug
Files with the .mdx and .markdown extensions are detected as FileType.UNK when passed to unstructured.file_utils.filetype.detect_filetype.
To Reproduce
from unstructured.file_utils.filetype import detect_filetype
print(detect_filetype("file.mdx"))
print(detect_filetype("file.markdown"))
Expected behavior
Each extension should either map to its own file type, FileType.MDX and FileType.MARKDOWN respectively (just as XLS and XLSX are separate types), or at least both should be detected as FileType.MD.
@butasebi I don't think the .mdx is going to work out because that format cannot be parsed by the python-markdown package we use. It's really a hybrid format.
But a file with .markdown should identify as such. I'm currently getting FileType.TXT for that case, where the same file with a .md extension is correctly identified. I'll have a closer look at why that is.
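For illustration only, the behavior requested for .markdown can be sketched as an extension alias lookup. This is a minimal standalone sketch, not how unstructured implements detection: the FileType enum below is a hypothetical stand-in for the library's own, and EXTENSION_ALIASES, EXTENSION_TO_FILETYPE, and detect_by_extension are names made up for this example. It leaves .mdx unmapped, in line with the comment above that the format cannot be parsed as plain Markdown.

```python
from enum import Enum
from pathlib import Path


class FileType(Enum):
    """Hypothetical stand-in for the library's FileType enum."""

    MD = "md"
    TXT = "txt"
    UNK = "unk"


# Hypothetical alias table: extensions treated as equivalent to another one.
EXTENSION_ALIASES = {".markdown": ".md"}

# Hypothetical mapping from canonical extension to file type.
EXTENSION_TO_FILETYPE = {".md": FileType.MD, ".txt": FileType.TXT}


def detect_by_extension(filename: str) -> FileType:
    """Return a file type based only on the (case-insensitive) extension."""
    ext = Path(filename).suffix.lower()
    # Normalize aliases such as ".markdown" to their canonical form first.
    ext = EXTENSION_ALIASES.get(ext, ext)
    return EXTENSION_TO_FILETYPE.get(ext, FileType.UNK)


if __name__ == "__main__":
    for name in ("file.md", "file.markdown", "file.mdx"):
        print(name, detect_by_extension(name))
```

With this table, file.markdown resolves to the same type as file.md, while file.mdx still falls through to the unknown type until a decision is made about how to handle it.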