Add PDF upload to BlindChat #14

Open
dhuynh95 opened this issue Sep 22, 2023 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@dhuynh95
Contributor

dhuynh95 commented Sep 22, 2023

Add a button on the Chat UI to load PDFs on the client side for future RAG integration

@dhuynh95 dhuynh95 added enhancement New feature or request help wanted Extra attention is needed labels Sep 22, 2023
@dhuynh95 dhuynh95 added this to the Local-only milestone Sep 22, 2023
@aquilu

aquilu commented Oct 3, 2023

Action Plan for Integrating PDF Processing in BlindChat

1. Text Extraction from PDF

  • Use a client-side library to extract text from PDF files. This ensures that the PDF content never leaves the user's device.
  • JavaScript-based libraries like pdf.js are recommended for this task.
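The extraction step could be sketched as follows. This assumes the documented pdf.js shape (a document with `numPages` and `getPage`, pages with `getTextContent`, text items carrying `str` and `hasEOL`). The interfaces are hand-written stand-ins for the pdf.js types so the snippet is self-contained, and `joinTextItems` and `extractPages` are hypothetical helper names, not part of BlindChat.

```typescript
// Minimal structural types mirroring the parts of pdf.js used here.
// In a real build these would come from the `pdfjs-dist` package.
interface TextItemLike { str?: string; hasEOL?: boolean }
interface PageLike { getTextContent(): Promise<{ items: TextItemLike[] }> }
interface DocumentLike { numPages: number; getPage(n: number): Promise<PageLike> }

// Join the text items of one page into a string, honoring end-of-line flags.
// Items are separated by a space, which is a simplification: some PDFs split
// words across items, so real-world output may need extra cleanup.
function joinTextItems(items: TextItemLike[]): string {
  let out = "";
  for (const item of items) {
    if (typeof item.str !== "string") continue; // skip entries without text
    out += item.str + (item.hasEOL ? "\n" : " ");
  }
  // Collapse runs of spaces and tidy whitespace around line breaks.
  return out.replace(/[ \t]+/g, " ").replace(/ ?\n ?/g, "\n").trim();
}

// Extract text page by page; pdf.js pages are 1-indexed.
async function extractPages(doc: DocumentLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    pages.push(joinTextItems(content.items));
  }
  return pages;
}
```

In the browser, `doc` would typically come from something like `await pdfjsLib.getDocument({ data: await file.arrayBuffer() }).promise`, where `file` is taken from an `<input type="file">` element, so the bytes are read locally and never uploaded.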

2. Vectorization and Embedding Creation

  • Once the text has been extracted, use a natural language processing model to convert the text into embeddings.
  • For this, one can use TensorFlow.js with a pretrained model, or another library that computes embeddings in the browser.
  • Model size and inference speed matter here, because the model has to be downloaded to and run on the user's device.
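Embedding models accept bounded input, so the extracted text usually has to be split before it can be embedded. The sketch below shows one possible approach using overlapping word windows. The chunk size, the overlap, the batch size, and the `Embedder` type are illustrative assumptions, not BlindChat or TensorFlow.js APIs.

```typescript
// Hypothetical shape for any in-browser embedding model (for example a
// TensorFlow.js model wrapped by the application). Not a real library type.
type Embedder = (texts: string[]) => Promise<number[][]>;

// Split text into chunks of at most `size` words, with `overlap` words shared
// between consecutive chunks so text cut at a boundary keeps some context.
function chunkWords(text: string, size = 200, overlap = 20): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}

// Embed the chunks in small batches to limit peak memory in the browser.
async function embedChunks(
  chunks: string[],
  embed: Embedder,
  batch = 8,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += batch) {
    vectors.push(...(await embed(chunks.slice(i, i + batch))));
  }
  return vectors;
}
```

Smaller chunks keep each model call cheap but produce more vectors to compare later, so the right values depend on the model that ends up being chosen.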

3. Integration with BlindChat

  • Add an option in BlindChat for users to upload PDF files.
  • Process the PDF and convert the extracted text into embeddings.
  • Use these embeddings in BlindChat's features, for example to find the passages most relevant to a question by comparing embeddings.
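Comparing embeddings for search is commonly done with cosine similarity. A minimal sketch, assuming embeddings are plain `number[]` arrays of equal length; `cosineSimilarity` and `topK` are hypothetical helpers, and a brute-force scan like this is only one option:

```typescript
// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Scored { index: number; score: number }

// Score every chunk embedding against the query and keep the k best,
// sorted from most to least similar.
function topK(query: number[], embeddings: number[][], k: number): Scored[] {
  return embeddings
    .map((vec, index) => ({ index, score: cosineSimilarity(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The returned `index` values point back into the list of text chunks, so the matching passages can be handed to the chat model as context.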

4. Privacy and Security Considerations

  • Ensure that all processing is done on the client side to maintain privacy.
  • Do not persist embeddings or extracted text in the browser; keeping them in memory only reduces the risk of document content leaking after the session ends.
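One way to honor the no-persistence point is to keep chunks and vectors in a plain in-memory object with an explicit reset. This is a sketch only; `EphemeralIndex` is a hypothetical name, and JavaScript gives no guarantee about when cleared memory is actually reclaimed.

```typescript
// Holds extracted chunks and their embeddings in memory only. Nothing here
// touches localStorage, IndexedDB, or the network.
class EphemeralIndex {
  private chunks: string[] = [];
  private vectors: number[][] = [];

  add(chunk: string, vector: number[]): void {
    this.chunks.push(chunk);
    this.vectors.push(vector);
  }

  get size(): number {
    return this.chunks.length;
  }

  // Drop all references so the data becomes eligible for garbage collection.
  clear(): void {
    this.chunks = [];
    this.vectors = [];
  }
}
```

Calling `clear()` when the user removes a document or starts a new conversation keeps the data's lifetime short; closing the tab discards it in any case.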

Projects
Status: Todo
Development

No branches or pull requests

3 participants