Add PDF upload to BlindChat #14

Open

dhuynh95 opened this issue Sep 22, 2023 · 1 comment

Labels: enhancement, help wanted

Milestone: BlindChat Local

dhuynh95 (Contributor) commented Sep 22, 2023 (edited)

Add a button on the Chat UI to load PDFs on the client side for future RAG integration

dhuynh95 added the enhancement and help wanted labels

dhuynh95 added this to the Local-only milestone

lyie28 added BlindChat and removed BlindChat labels

aquilu commented Oct 3, 2023

Action Plan for Integrating PDF Processing in BlindChat

1. Text Extraction from PDF

Use a client-side library to extract text from PDF files. This ensures that the PDF content never leaves the user's device.
JavaScript-based libraries like pdf.js are recommended for this task.
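As a rough illustration of step 1, the sketch below walks the pages of a PDF and collects their text. It assumes the pdf.js calls `getDocument(...).promise`, `getPage(n)`, and `getTextContent()`. The interfaces, `joinTextItems`, `extractPdfText`, and the injected `loadPdf` parameter are hypothetical names introduced here, and joining items with spaces is a heuristic rather than what pdf.js itself prescribes.

```typescript
// Structural subset of the pdf.js objects used below, declared locally so the
// sketch type-checks without pdfjs-dist installed (an assumption of this example).
interface TextItemLike { str?: string; hasEOL?: boolean }
interface PdfPageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface PdfDocumentLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PdfPageLike>;
}
// Hypothetical loader; in the browser it would wrap pdf.js (see the note below).
type LoadPdf = (data: ArrayBuffer) => Promise<PdfDocumentLike>;

// Join the text items of one page. Items without `str` (marked content) are skipped.
function joinTextItems(items: TextItemLike[]): string {
  let text = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue;
    text += item.str + (item.hasEOL ? "\n" : " ");
  }
  // Collapse runs of spaces and tidy whitespace around line breaks.
  return text.replace(/[ \t]+/g, " ").replace(/ ?\n ?/g, "\n").trim();
}

// Return one string per page. Nothing here touches the network.
async function extractPdfText(data: ArrayBuffer, loadPdf: LoadPdf): Promise<string[]> {
  const pdf = await loadPdf(data);
  const pages: string[] = [];
  for (let n = 1; n <= pdf.numPages; n++) { // pdf.js page numbers start at 1
    const page = await pdf.getPage(n);
    const content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages;
}
```

In the browser, `data` would typically come from `await file.arrayBuffer()` on a `File` selected through an `<input type="file">`, and `loadPdf` could be `(data) => pdfjsLib.getDocument({ data }).promise`, possibly with a type cast and with the pdf.js worker configured beforehand.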

2. Vectorization and Embedding Creation

Once the text has been extracted, use a natural language processing model to convert the text into embeddings.
One option is TensorFlow.js with one of its pretrained models; another is a dedicated library that computes embeddings in the browser.
It's crucial to consider model efficiency and size to ensure a good user experience.
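A minimal sketch of step 2 follows, assuming the extracted text is first split into overlapping chunks (embedding models have input length limits) and then passed to an embedding function in small batches. `chunkText`, `embedChunks`, the `Embedder` type, and the default sizes are all hypothetical choices made for this example, not something the issue specifies.

```typescript
// Hypothetical embedding function: takes texts, returns one vector per text.
type Embedder = (texts: string[]) => Promise<number[][]>;

interface EmbeddedChunk { text: string; vector: Float32Array }

// Split text into fixed-size character windows that overlap by `overlap` characters.
function chunkText(text: string, maxChars = 500, overlap = 50): string[] {
  if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
    throw new RangeError("require maxChars > 0 and 0 <= overlap < maxChars");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Embed chunks in batches to bound peak memory and keep the UI responsive.
async function embedChunks(
  chunks: string[],
  embed: Embedder,
  batchSize = 16,
): Promise<EmbeddedChunk[]> {
  const out: EmbeddedChunk[] = [];
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const vectors = await embed(batch);
    if (vectors.length !== batch.length) {
      throw new Error("embedder returned a different number of vectors than texts");
    }
    batch.forEach((text, j) => {
      out.push({ text, vector: Float32Array.from(vectors[j]) });
    });
  }
  return out;
}
```

With TensorFlow.js, one possible `Embedder` wraps the Universal Sentence Encoder package: load the model once with `use.load()`, then return `await (await model.embed(texts)).array()`. Whether that model's download size is acceptable for BlindChat is exactly the efficiency question raised above.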

3. Integration with BlindChat

Add an option in BlindChat for users to upload PDF files.
Process the PDF and convert the extracted text into embeddings.
Use these embeddings in BlindChat features, for example to find the passages most relevant to a user's query by comparing embeddings.
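Comparing embeddings is commonly done with cosine similarity. The sketch below ranks stored chunks against a query vector and returns the best matches; `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical names, and a linear scan is assumed to be sufficient for the handful of documents a single user uploads.

```typescript
interface IndexedChunk { text: string; vector: ArrayLike<number> }

// Cosine similarity of two equal-length vectors; defined as 0 if either is all zeros.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new RangeError("vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query, best match first.
function topK(
  query: ArrayLike<number>,
  index: IndexedChunk[],
  k = 3,
): Array<{ text: string; score: number }> {
  return index
    .map((chunk) => ({ text: chunk.text, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The retrieved chunk texts could then be prepended to the prompt sent to the local model, which is the usual shape of a RAG pipeline.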

4. Privacy and Security Considerations

Ensure that all processing is done on the client-side to maintain privacy.
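Do not persist embeddings or extracted text in the browser (for example in localStorage or IndexedDB); keeping them in memory only reduces the risk of document content leaking after the session ends.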
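One way to follow this is to hold chunks only in a plain in-memory object and clear it explicitly. The sketch below is an assumption about how that might look; `EphemeralDocumentStore` is a hypothetical class, and zero-filling the vectors is a best-effort measure, since JavaScript strings cannot be wiped and are only released by garbage collection.

```typescript
interface StoredChunk { text: string; vector: Float32Array }

// Holds document chunks in memory only; nothing is written to
// localStorage, IndexedDB, or any other persistent browser storage.
class EphemeralDocumentStore {
  private chunks: StoredChunk[] = [];

  add(chunk: StoredChunk): void {
    this.chunks.push(chunk);
  }

  get size(): number {
    return this.chunks.length;
  }

  all(): readonly StoredChunk[] {
    return this.chunks;
  }

  // Overwrite the vectors and drop all references so the data can be collected.
  clear(): void {
    for (const chunk of this.chunks) {
      chunk.vector.fill(0);
    }
    this.chunks = [];
  }
}
```

In the browser, `clear()` could be called when the user removes a document, and additionally from a `pagehide` event listener.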
