Retrieve Full-Texts from Sinequa Dev Servers #1071

Closed
1 task done
CarsonDavis opened this issue Oct 15, 2024 · 1 comment · Fixed by #1077
CarsonDavis commented Oct 15, 2024

Description

The existing URL import code that brings URLs into cosmos cannot support our downstream ML tasks, which need full texts. As far as I know, full texts cannot be retrieved through the query endpoint, only through the SQL endpoint.

Work was started previously on:

However, this card was too broad, and we are breaking it into smaller chunks.

Implementation Considerations

  • use the engine.sql endpoint to get all existing metadata from the dev servers
  • use the engine.sql endpoint to get full_texts from the dev servers
  • store the incoming full_text in a new CandidateURL field called full_text
  • How will we do error handling?
  • Tests: properly testing the important parts would require emulating a Sinequa server, which we are not going to do. Writing tests is therefore probably not worth it right now.
  • We should use tokens, similar to config_generation/minimum_api.py. The code will reference an environment variable. Locally, the token goes in the .django local file; on the server, it will be set when we deploy.
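The first three considerations, plus one possible answer to the error-handling question, could fit together roughly as below. This is a minimal sketch, not the project's implementation, and several details are assumptions to check against the Sinequa REST API docs: that the endpoint lives at `/api/v1/engine.sql`, accepts a JSON body with an `sql` key and a Bearer token, supports `SKIP`/`COUNT` paging, and returns rows under a `Rows` key. The environment variable `SINEQUA_DEV_TOKEN`, the index `sde_index`, and the columns `url1`/`text` are hypothetical placeholders.

```python
"""Hypothetical sketch: pull full texts from a Sinequa dev server via engine.sql."""

import json
import os
import time
import urllib.error
import urllib.request
from typing import Callable, Iterator

# Assumed endpoint path; verify against the server's API version.
SQL_ENDPOINT = "/api/v1/engine.sql"
# Hypothetical variable name; the real one would live in the .django local file.
TOKEN_ENV_VAR = "SINEQUA_DEV_TOKEN"


def get_token() -> str:
    """Read the API token from the environment, failing loudly if it is missing."""
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        raise RuntimeError(f"{TOKEN_ENV_VAR} is not set")
    return token


def build_request(base_url: str, sql: str, token: str) -> urllib.request.Request:
    """Build a POST request carrying the SQL statement and a Bearer token."""
    payload = json.dumps({"sql": sql}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        base_url.rstrip("/") + SQL_ENDPOINT,
        data=payload,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def post_json(request: urllib.request.Request, retries: int = 3, backoff: float = 1.0) -> dict:
    """Send the request; retry 5xx and network errors with exponential backoff."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            # Client errors such as 401 (bad token) will not fix themselves.
            if err.code < 500 or attempt == retries - 1:
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries - 1:
                raise
        time.sleep(backoff * 2 ** attempt)
    raise ValueError("retries must be at least 1")


def iter_full_texts(
    base_url: str,
    collection: str,
    token: str,
    page_size: int = 100,
    send: Callable[[urllib.request.Request], dict] = post_json,
) -> Iterator[dict]:
    """Page through a collection, yielding one {url, full_text} dict per row."""
    skip = 0
    while True:
        # Hypothetical index and column names; `collection` comes from our own
        # config, not user input, so it is interpolated directly here.
        sql = (
            "SELECT url1, text FROM sde_index "
            f"WHERE collection = '{collection}' SKIP {skip} COUNT {page_size}"
        )
        rows = send(build_request(base_url, sql, token)).get("Rows", [])
        for url, text in rows:
            yield {"url": url, "full_text": text}
        if len(rows) < page_size:  # a short page means we reached the end
            break
        skip += page_size
```

Each yielded dict could then be written to the proposed `CandidateURL.full_text` field. The `send` parameter is there so the paging logic can be exercised with a stub instead of a live server, which sidesteps part of the testing concern without emulating Sinequa.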

Open Questions

  • Credentials: for local development, we will use Li's server
  • Once it goes into staging, should it use the existing environment variable?

Deliverable

  • dropdown menu
  • updated import script
  • new field
  • data migration

Tasks

  1. shravan-lrm
saifrk pushed a commit that referenced this issue Oct 24, 2024
saifrk linked a pull request Oct 24, 2024 that will close this issue
saifrk pushed a commit that referenced this issue Oct 24, 2024

saifrk commented Oct 24, 2024

Please refer to the attached mind map for a visual representation of the changes incorporated in this task.
[Image: mind map for #1071]

saifrk pushed a commit that referenced this issue Nov 8, 2024
saifrk pushed a commit that referenced this issue Nov 8, 2024