Test: Show current pipeline run issues (DO NOT MERGE) #8695
base: main
Conversation
return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
This test is a minor variation on the test above. The only change is swapping the insertion order of the document creators. As a consequence, the order of documents in the `DocumentJoiner`'s output is swapped.
This behavior is consistent, but AFAIK it is not documented anywhere, and it is very unexpected for users to have the output of their pipeline change just because they changed the insertion order of their components.
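To illustrate, here is a minimal sketch of the effect. `DocumentCreator` below is a hypothetical stand-in for the test's `ConditionalDocumentCreator`; the only difference between the two pipelines is the insertion order of the creators.

```python
from typing import List

from haystack import Document, Pipeline, component
from haystack.components.joiners import DocumentJoiner


@component
class DocumentCreator:
    """Hypothetical stand-in for the test's ConditionalDocumentCreator."""

    def __init__(self, content: str):
        self._content = content

    @component.output_types(documents=List[Document])
    def run(self):
        return {"documents": [Document(content=self._content)]}


def build_pipeline(creator_order):
    # Only the insertion order of the creators differs between the two pipelines.
    pipe = Pipeline(max_runs_per_component=1)
    pipe.add_component("joiner", DocumentJoiner())
    for name, content in creator_order:
        pipe.add_component(name, DocumentCreator(content=content))
        pipe.connect(f"{name}.documents", "joiner.documents")
    return pipe


order_a = [("first_creator", "First document"), ("second_creator", "Second document")]
order_b = list(reversed(order_a))

docs_a = build_pipeline(order_a).run({})["joiner"]["documents"]
docs_b = build_pipeline(order_b).run({})["joiner"]["documents"]

# Per the observation above, the two lists come back in different orders
# even though the pipelines are otherwise identical.
print([d.content for d in docs_a])
print([d.content for d in docs_b])
```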
"joiner", | ||
"agent_llm", | ||
"router", | ||
"answer_builder" |
In this test, `rag_prompt` and `rag_llm` will run again although they did not receive any input from the cycle during this iteration. In turn, a `PipelineMaxComponentRuns` exception is raised because they attempt to run a third time. With a proper run logic, the expected behavior would be 2 iterations.
This has a similar issue to the one described in my other comment, and it appears to function as intended when `required_variables` is assigned during the `rag_prompt` initialization, as shown below:
rag_prompt = PromptBuilder(template=rag_prompt_template, required_variables=["query", "documents"])
},
expected_run_order=[
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],
This test is very similar to the one above, but without user input to the second `PromptBuilder`. The test will fail because `feedback_prompt` and `feedback_llm` are executed a third time. `feedback_prompt` does not receive any inputs but still executes.
inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}}, | ||
expected_outputs={ | ||
"answer_builder": { | ||
"answers": [GeneratedAnswer(data="valid code", query=task, documents=[])] |
This test is very similar to the one above but we changed the order of connections.
+ pp.connect("concatenator.output", "code_prompt.feedback")
pp.connect("code_prompt.prompt", "code_llm.prompt")
pp.connect("code_llm.replies", "feedback_prompt.code")
pp.connect("feedback_llm.replies", "router.replies")
pp.connect("router.fail", "concatenator.feedback")
pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
pp.connect("router.pass", "answer_builder.replies")
pp.connect("code_llm.replies", "router.code")
pp.connect("code_llm.replies", "concatenator.current_prompt")
- pp.connect("concatenator.output", "code_prompt.feedback")
As a result, at least when I run the tests locally, the failure reason becomes non-deterministic. For some test runs, the expected run order differs from the actual run order, and for others, the output is an Answer with "invalid code" instead of "valid code".
Looking into both our code and networkx, I don't understand how these different outcomes can occur.
I tried to debug this for my own understanding, and I believe the issue is caused by the `PromptBuilder`s being executed out of order.
It can be resolved by initializing the `PromptBuilder`s as below:
code_prompt = PromptBuilder(template=code_prompt_template, required_variables=["task"])
feedback_prompt = PromptBuilder(template=feedback_prompt_template, required_variables=["code"])
The root cause lies in the absence of `required_variables`. Without them, the edge connecting `code_llm → feedback_prompt` is assigned `mandatory=False`. Consequently, this connection is removed while creating subgraphs in `_break_supported_cycles_in_graph`, as it is deemed non-mandatory. This leaves `feedback_prompt` without inputs in the subgraph, disrupting the topological order in which the graph's nodes are sorted by networkx.
Later, in the run method, components without input edges are assigned a default input to ensure they run (refer to the code here). As a result, `feedback_prompt` ends up in the `waiting_queue` with a default input of `{template=None}` and runs independently. A similar scenario occurs with `code_prompt`, which can randomly affect the output, making it somewhat non-deterministic.
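For what it's worth, the difference is easy to see on a standalone `PromptBuilder`. This is a minimal sketch; it assumes the documented behaviour that missing optional variables render as empty while missing `required_variables` raise a `ValueError`.

```python
from haystack.components.builders import PromptBuilder

template = "Here is the code to review:\n{{ code }}"

# Without required_variables, `code` is an optional input: the builder runs
# even with no inputs and renders the missing variable as empty. In a pipeline,
# this is what makes the incoming edge from code_llm non-mandatory.
lenient = PromptBuilder(template=template)
print(lenient.run())  # {'prompt': 'Here is the code to review:\n'}

# With required_variables, `code` becomes a mandatory input: running without it
# raises a ValueError, and the corresponding edge is kept when cycles are broken.
strict = PromptBuilder(template=template, required_variables=["code"])
print(strict.run(code="print('hello')"))
# strict.run()  # would raise ValueError because `code` is missing
```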
Also, the way the `waiting_queue` is currently handled could be causing loops for pipelines with cycles.
Yes, that's exactly what's happening.
The first issue is here:
haystack/haystack/core/pipeline/base.py, line 1187 in 167ede1:
cycles: List[List[str]] = list(networkx.simple_cycles(self.graph))
`simple_cycles` returns the nodes in the cycles in a different order for some invocations (it's non-deterministic).
Downstream, when we break edges here, the different order affects which edge is broken. This leads to different execution orders for any cycle that has more than one optional or greedy variadic edge.
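For reference, here is a minimal way to look at what `simple_cycles` returns for a graph with two overlapping cycles shaped roughly like the feedback pipelines in these tests. The names are illustrative; the claim that the ordering varies between invocations is from the observation above, this just shows where it matters.

```python
import networkx

# Two overlapping cycles sharing code_prompt -> ... -> router, mirroring the
# shape of the feedback-loop pipelines in these tests.
graph = networkx.DiGraph()
graph.add_edges_from(
    [
        ("code_prompt", "code_llm"),
        ("code_llm", "feedback_prompt"),
        ("feedback_prompt", "feedback_llm"),
        ("feedback_llm", "router"),
        ("router", "concatenator"),
        ("concatenator", "code_prompt"),
        ("router", "code_prompt"),  # second back-edge, creating the overlap
    ]
)

cycles = list(networkx.simple_cycles(graph))
print(cycles)
# The node a cycle "starts" at decides which edge _break_supported_cycles_in_graph
# removes later on, so any variation in this ordering changes the execution order.
```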
Here:
haystack/haystack/core/pipeline/pipeline.py, line 450 in 167ede1:
run_queue = []
We are not resetting the waiting queue, which means that components that were put into the waiting queue before the cycle executed will be executed again if they only have default inputs. This re-triggers the cycle execution.
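To make the effect concrete, here is a toy model only, not Haystack's actual scheduler: "cycle" stands for the whole feedback loop, and "feedback_prompt" is a component that sat in the waiting queue with only default inputs before the cycle ran.

```python
from collections import deque


def drain(run_queue: deque, waiting: set, reset_waiting_after_cycle: bool) -> list:
    executed = []
    while run_queue:
        name = run_queue.popleft()
        executed.append(name)
        if name == "cycle":
            if reset_waiting_after_cycle:
                waiting.clear()  # proposed behaviour: drop stale entries
            # Current behaviour: stale entries with only default inputs are
            # promoted back into the run queue, and their output feeds the
            # cycle again.
            for stale in sorted(waiting):
                run_queue.append(stale)
                run_queue.append("cycle")
            waiting.clear()
    return executed


print(drain(deque(["cycle"]), {"feedback_prompt"}, reset_waiting_after_cycle=False))
# ['cycle', 'feedback_prompt', 'cycle']  -> the cycle is re-triggered
print(drain(deque(["cycle"]), {"feedback_prompt"}, reset_waiting_after_cycle=True))
# ['cycle']  -> with the queue reset, it runs once
```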
)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():
This test intends to replicate: #8657
However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error AttributeError("'NoneType' object has no attribute 'keys'"), but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.
Besides the remaining test cases, this error seems to be a legitimate bug in the logic.
Confirmed point 3 in the CI. Both tests yield different results on different runs. See here for an example:
Pull Request Test Coverage Report for Build 12693854002
💛 - Coveralls
Related Issues
Proposed Changes:
This PR's purpose is to highlight and discuss current issues with the `Pipeline.run` logic. It adds behavioral tests that display different failures for the method (some of these have the same underlying reason).
This is a first step towards resolving these issues and getting to a robust execution logic for our pipelines.
Findings
(3. at least locally, some tests fail for different reasons with no changes to the code)
How did you test it?
Notes for the reviewer
Checklist
I have used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:, and added ! in case the PR includes breaking changes.