Mass Import Files? #239
Comments
Good point. There is currently no feature for mass importing files, but we'll add it to the feature list.
@thomashacker
We're currently implementing mass importing of files in the upcoming v2 version, which should be released in a couple of weeks. If you need the functionality now, you can add it yourself; the source code of both the frontend and the backend is available here 😄
Implemented the mass import functionality in the newest release.
Where is this implemented? I see no documentation for it. I found one backend endpoint that contains:
`chunks = self.batches[fileID]["chunks"]`
`data = "".join([chunks[chunk] for chunk in chunks])`
So I assume the chunk value for DataBatchPayload needs to be a FileConfig? If so, why is it defined only as a string and not as a FileConfig? Some documentation would be nice. Maybe this isn't even the intended functionality. Generally, it is good practice to mention the fixed issue in the commit that resolves it. I am also confused about why there is no longer any technical documentation in the repository. The hyperlink still exists in the README, but it points to nothing, and the technical markdown file has been deleted with no replacement.
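For anyone reading along, the two quoted lines suggest that the backend collects string chunks per file and concatenates them once they have all arrived. The following is only a minimal sketch of that general pattern, not the project's actual code: `split_into_chunks` and `reassemble` are hypothetical helper names, and keying the chunks by an integer order index is an assumption. Unlike the quoted line, which joins the chunks in dictionary insertion order, this sketch sorts by index so that arrival order does not matter.

```python
from typing import Dict, List


def split_into_chunks(data: str, chunk_size: int) -> List[str]:
    """Cut a string into consecutive pieces of at most chunk_size characters."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def reassemble(chunks: Dict[int, str]) -> str:
    """Join chunks by ascending order index (hypothetical integer keys)."""
    return "".join(chunks[order] for order in sorted(chunks))


if __name__ == "__main__":
    parts = split_into_chunks("some serialized payload", 5)
    # Simulate chunks arriving in reverse order.
    received: Dict[int, str] = {}
    for order in reversed(range(len(parts))):
        received[order] = parts[order]
    print(reassemble(received) == "some serialized payload")
```

If the chunks really are pieces of one serialized string, that would explain why the field is typed as a string rather than as a FileConfig, but that is a guess until the maintainers document the endpoint.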
Good point! We added mass file importing via the frontend; the FastAPI endpoints are currently only optimized to communicate with the frontend. Can you share more information about what functionality you need? We're working on a user API to make it easier to use programmatically in the future. And I agree, we're currently reworking the technical documentation, and it will be re-added soon 🚀
I am doing a mass import now and get this message after every successfully processed PDF. I can work around it because there are not that many files, but I thought I'd mention it anyway. These PDF files are pretty large though, mostly consisting of 100-200 pages each.
Description
Question/Discussion: What is the best way to mass-import many files? I need to import about 200,000 text files. Currently my only working solution would be to upload all the files in batches of 500 into GitHub folders, and then import these folders manually via the GitHub reader, one by one, whenever the current import has completed. Is there an easier way to do this, possibly by sending the file bytes directly to an API endpoint?
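To illustrate the batching step in the workaround described above, here is a minimal sketch that groups local text files into batches of 500 using only the Python standard library. The helper names `batched` and `iter_text_files` and the `./corpus` directory are hypothetical, and the sketch deliberately stops before any upload, since no documented import endpoint is known at this point.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def batched(items: Iterable[Path], size: int) -> Iterator[List[Path]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def iter_text_files(root: Path) -> Iterator[Path]:
    """Recursively collect every .txt file under root, in sorted order."""
    return iter(sorted(p for p in root.rglob("*.txt") if p.is_file()))


if __name__ == "__main__":
    root = Path("./corpus")  # hypothetical location of the text files
    for number, batch in enumerate(batched(iter_text_files(root), 500), start=1):
        # Each batch could be copied into its own folder or handed to an importer.
        print(f"batch {number}: {len(batch)} files")
```

With roughly 200,000 files, batches of 500 come to about 400 batches, which is why importing them manually one by one is impractical.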
Is this a bug or a feature?
Steps to Reproduce
[see above]
Additional context
[None]