Support for streams #97

Open
apolkosnik-old opened this issue Nov 5, 2015 · 6 comments

Comments

@apolkosnik-old

I'd like to propose a feature that removes the reliance on file extensions and gives users much greater flexibility by accepting streams as input to textract.

@deanmalmgren
Owner

Sounds like a really interesting idea. Would you like to propose what the command line interface for that could look like?

My main concern is deciding how to route the inbound content to the appropriate parser. FWIW, I recently took a stab at using mimetypes to detect which parser we should use (see #89), with pretty 💩-y results.
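Python's `mimetypes` module guesses a type from a file *name*, not from content, so routing a bare stream needs something that looks at the bytes themselves. A minimal sketch of that idea, using a small hand-picked table of well-known file signatures; the `sniff_mimetype` helper and its table are illustrative assumptions, not something textract or #89 provides:

```python
from typing import Optional

# Illustrative signature table: leading bytes -> MIME type.
# Only a few common formats; dedicated detectors such as libmagic know many more.
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"{\\rtf", "application/rtf"),
    # OLE2 compound file: legacy .doc/.xls/.ppt all share this header.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "application/x-ole-storage"),
    # ZIP container: .docx/.xlsx/.pptx/.odt/.epub all share this header.
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mimetype(head: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a stream, or None if unknown."""
    for signature, mimetype in _SIGNATURES:
        if head.startswith(signature):
            return mimetype
    return None
```

The ZIP and OLE2 entries show the limitation: many office formats share one container header, so signature sniffing narrows the candidates but cannot always pick a single parser by itself.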

@apolkosnik-old
Author

My approach to the poor results is to try every possible extension for the given mimetype until one sticks. It's a bit crude, but it seems to work with the couple of files I've tested.
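The "try every extension until one sticks" approach could look roughly like this. It assumes a hypothetical `parse(data, extension)` hook that raises when the content doesn't match that extension's parser; this is an illustrative sketch, not the code from the pull request:

```python
import mimetypes
from typing import Callable


def extract_by_mimetype(
    data: bytes,
    mimetype: str,
    parse: Callable[[bytes, str], str],
) -> str:
    """Try each extension registered for `mimetype` until `parse` succeeds.

    `parse(data, extension)` is a hypothetical hook standing in for an
    extension-keyed parser lookup; it should raise on failure.
    """
    # strict=False also includes common but non-standard mappings.
    extensions = mimetypes.guess_all_extensions(mimetype, strict=False)
    errors = []
    for ext in extensions:
        try:
            return parse(data, ext.lstrip("."))
        except Exception as exc:  # crude: any failure means "try the next one"
            errors.append(f"{ext}: {exc}")
    raise ValueError(f"no parser accepted {mimetype!r}; tried {errors or 'nothing'}")
```

A single MIME type such as `text/plain` can map to several extensions, so the outcome depends on attempt order and on the catch-all `except`, which is exactly the crudeness the comment mentions.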

@apolkosnik-old
Author

I've created pull request #99, and I'd love some feedback. Thanks!

@josepablog

Hello! Is this idea still being pursued? I have a use case where this would be very useful :}~

@frbapolkosnik @deanmalmgren

@kennell

kennell commented Jul 31, 2017

I would also like to see this happen 👍

@filipopo

filipopo commented Mar 14, 2020

The console output says the input needs to be a string, bytes, ..., but that is a generic message; the underlying tools do support bytes/streams.
I was hoping that something like `process(file.read(), extension="txt")` would work, but I see there are also requests to auto-detect the extension.
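Until something like that exists, one workaround is to spool the stream to a temporary file whose suffix carries the extension and pass that path to `textract.process`. A sketch, assuming textract is installed; `process_stream` is a hypothetical helper name, and the `process` parameter exists only so another extractor (or a test stub) can be swapped in:

```python
import os
import tempfile
from typing import BinaryIO, Callable, Optional


def process_stream(
    stream: BinaryIO,
    extension: str,
    process: Optional[Callable[[str], bytes]] = None,
) -> bytes:
    """Run a filename-based extractor over an in-memory binary stream.

    The stream is written to a temporary file whose suffix is `extension`,
    so an extension-routed extractor can pick its parser; the file is
    deleted afterwards.
    """
    if process is None:
        import textract  # assumption: textract is installed

        process = textract.process
    fd, path = tempfile.mkstemp(suffix="." + extension.lstrip("."))
    try:
        # Close the file before processing so external tools can reopen it.
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(stream.read())
        return process(path)
    finally:
        os.remove(path)
```

`mkstemp` with an explicit close is used instead of `NamedTemporaryFile` because on Windows a still-open named temporary file generally cannot be reopened by the external programs textract delegates to.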
