Extract information from bytes #300
Comments
What do you mean by "downloaded, but not saved as a file yet"? Textract requires that you specify the path to the PDF file. So far I have only parsed files that have been saved locally. You might try some of the ideas here, but I don't completely understand what you're trying to do.
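Since textract works from a file path, one workaround is to write the downloaded bytes to a short-lived temporary file and hand that path over. The sketch below is only an illustration: the function name `extract_text_via_tempfile` and the `extractor` parameter are hypothetical, and it does touch the filesystem briefly, so it will not help in an environment with no writable temp directory.

```python
import os
import tempfile
from typing import Callable, Optional


def extract_text_via_tempfile(
    pdf_bytes: bytes,
    extractor: Optional[Callable[[str], bytes]] = None,
) -> bytes:
    """Write pdf_bytes to a temporary .pdf file and run a path-based extractor on it.

    `extractor` is a hypothetical hook for illustration; when omitted,
    textract.process is used (textract must be installed).
    """
    if extractor is None:
        import textract  # third-party; imported lazily so the helper loads without it

        extractor = textract.process

    # The directory and the file inside it are removed when the block exits.
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Keep the .pdf extension: textract picks its parser from the file extension.
        path = os.path.join(tmp_dir, "download.pdf")
        with open(path, "wb") as fh:
            fh.write(pdf_bytes)
        return extractor(path)
```

With an HTTP client such as requests, this would be called as `extract_text_via_tempfile(response.content)`.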
I get the PDFs from an HTTP response, so I already have the body as bytes. I should be able to extract the text from those bytes directly. I don't think it should be necessary to save the PDF to a file, parse it to extract the text, and then delete the file, when the content was already in memory as a Python variable.
Any progress on supporting a byte stream (file.read())? Or can you suggest another way around this?
|
That's the solution. Works like a charm and works in the cloud in a stateless function without any filesystem access! |
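For reference, a minimal sketch of a fully in-memory approach. It assumes the third-party pypdf package rather than textract, and it is not necessarily what the comment above used. The function name `extract_text_from_pdf_bytes` is hypothetical. The bytes are wrapped in `io.BytesIO`, so nothing is written to disk.

```python
import io


def extract_text_from_pdf_bytes(pdf_bytes: bytes) -> str:
    """Extract text from a PDF held in memory, without touching the filesystem."""
    # Assumption: pypdf is installed (pip install pypdf). This is not textract's API.
    from pypdf import PdfReader

    # BytesIO gives the reader a seekable file-like object backed by the bytes.
    reader = PdfReader(io.BytesIO(pdf_bytes))

    # extract_text() may yield nothing for image-only pages, so fall back to "".
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

With an HTTP client such as requests, this would be called as `extract_text_from_pdf_bytes(response.content)`. Scanned PDFs without a text layer would still need OCR, which this sketch does not attempt.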
I have a PDF that I have downloaded, but it is not saved as a file yet. How can I use textract to extract the text without actually saving the file?