[Bug] Wrong status of document in poisoned queue #987

Open
KurtP20 opened this issue Jan 24, 2025 · 1 comment
Labels
bug (Something isn't working), triage

Comments

KurtP20 commented Jan 24, 2025

Context / Scenario

When ingesting an invalid URL, e.g. ImportWebPageAsync("http://malformed_url"), KM moves the document to the poison queue after several attempts:

Microsoft.KernelMemory.Pipeline.Queue.DevTools.SimpleQueues[0] Message '20250124.114916.8130921.4d6c0b1c4b4d41ff84a0cb26ac27abe8' processing failed with exception, max attempts reached, moving to poison queue.

However, the status reported by GetDocumentStatusAsync does not change. My log message after the failure:

Document 416A1AABBD2B38AE93197949C710199DC83695E497F514EFA5097173535AE492 null?:False completed:False empty:False remaining steps:extract, partition, gen_embeddings, save_records ready:False

It would be nice to have an additional field, failed, in DataPipelineStatus, ideally together with a message field explaining why the document failed. Since one most likely wants to delete a failed document, it would also be nice to add an optional flag, deleteUponFailure, to ImportWebPageAsync (and the other Import* methods).
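Until the status exposes a failure flag, a client can only infer failure from elapsed time. The following is a minimal sketch of that workaround, not a fix for the underlying bug. `IngestionWatchdog` and `WaitUntilReadyAsync` are hypothetical names that are not part of Kernel Memory; the readiness check is passed in as a delegate so the helper does not depend on any particular KM API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Kernel Memory.
public static class IngestionWatchdog
{
    // Polls `isReady` until it returns true or `timeout` elapses.
    // Returns true when the check succeeded, false when the deadline passed.
    public static async Task<bool> WaitUntilReadyAsync(
        Func<CancellationToken, Task<bool>> isReady,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        var deadline = DateTimeOffset.UtcNow + timeout;
        while (true)
        {
            if (await isReady(cancellationToken).ConfigureAwait(false))
            {
                return true;
            }

            if (DateTimeOffset.UtcNow >= deadline)
            {
                return false;
            }

            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

With KM, the delegate could wrap the client's readiness check (for example IsDocumentReadyAsync, assuming it takes the document ID and a cancellation token), and a false result could trigger DeleteDocumentAsync for the same document ID. A timeout only suggests failure: a slow but healthy ingestion looks the same from the outside, which is why a real failed field would be preferable.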

What happened?

The status reports that the URL is still being ingested, although the document is already in the poison queue.

Importance

a fix would make my life easier

Platform, Language, Versions

KernelMemory 0.95
kernelmemory/service created 2025-01-20T15:41:17.539712455Z
C# / .NET 9

Relevant log output

KurtP20 added the bug and triage labels Jan 24, 2025

KurtP20 commented Jan 24, 2025

A somewhat related observation: if a URL containing a fragment (e.g. https://microsoft.github.io/kernel-memory/quickstart/start-service#check-openapi-swagger) is ingested, KM throws an error. Maybe URL fragments should simply be disregarded.
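As a client-side workaround, the fragment can be removed before the URL is passed to ImportWebPageAsync. A minimal sketch using only System.Uri; `UrlCleaner` and `StripFragment` are hypothetical names, not part of Kernel Memory, and the input is assumed to be an absolute URL.

```csharp
using System;

// Hypothetical helper, not part of Kernel Memory.
public static class UrlCleaner
{
    // Returns the URL without its "#fragment" part.
    // Scheme, host, path and query string are preserved.
    // Throws UriFormatException if `url` is not a valid absolute URI.
    public static string StripFragment(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);
        return uri.GetLeftPart(UriPartial.Query);
    }
}
```

For the URL above, this yields https://microsoft.github.io/kernel-memory/quickstart/start-service. The fragment is only meaningful to the client and is not sent to the server in an HTTP request, so dropping it should not change which page is fetched.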
