Update COSMOS Config Gen for Version and ID #1046

Open
bishwaspraveen opened this issue Oct 2, 2024 · 0 comments · May be fixed by #1072
@bishwaspraveen
Contributor

bishwaspraveen commented Oct 2, 2024

Description

Traditionally, the ID and version fields for documents in Sinequa scraper generation within COSMOS were derived from the URL field. Going forward, we want to track document versions in a more practical manner, given that re-indexing now needs to be put in place. Document versions should be based on the scraped full text itself, so this mapping needs to change during the config generation phase within COSMOS.

Implementation Considerations

Deliverable

Transition the ID and version mapping during config generation within COSMOS to use full text instead of URLs.
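
As an illustration of the idea only, here is a minimal sketch of deriving a value from scraped full text rather than from a URL. It is not the COSMOS or Sinequa implementation: the function name `content_fingerprint`, the whitespace normalization, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib


def content_fingerprint(full_text: str) -> str:
    """Return a hex digest that changes only when the text content changes.

    Hypothetical helper: the name, the normalization step, and the hash
    algorithm are assumptions for illustration, not part of COSMOS.
    """
    # Collapse runs of whitespace so formatting-only differences in the
    # scraped text do not produce a different value.
    normalized = " ".join(full_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two scrapes of the same content, differing only in whitespace.
first_scrape = "Mission overview\n\nLaunched in 2020."
second_scrape = "Mission overview Launched   in 2020."
# A scrape whose content actually changed.
updated_scrape = "Mission overview\n\nLaunched in 2021."

same_content = content_fingerprint(first_scrape) == content_fingerprint(second_scrape)
changed_content = content_fingerprint(first_scrape) != content_fingerprint(updated_scrape)
```

With a value like this, a document whose URL stays the same but whose text changes would get a new version, which is the behavior re-indexing relies on. How such a value is actually wired into the generated config is left to the implementation.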

Dependencies

Depends on https://github.com/NASA-IMPACT/sde-backend/issues/744

@bishwaspraveen self-assigned this on Oct 2, 2024
@CarsonDavis changed the title from "Change config generation in COSMOS to have full text based version and ID fields" to "Update COSMOS Config Gen for Version and ID" on Oct 2, 2024