Test: Show current pipeline run issues (DO NOT MERGE) #8695
base: main
Conversation
return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
This test is a minor variation on the test above. The only change is swapping the insertion order of the document creators. As a consequence, the order of documents in the `DocumentJoiner`'s output is swapped.
This behavior is consistent, but AFAIK it is not documented anywhere, and it is very unexpected for users to have the output of their pipeline change just because they changed the insertion order of their components.
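To illustrate, here is a minimal sketch of the effect. `DocumentCreator` below is a hypothetical stand-in for the test's `ConditionalDocumentCreator`; the only difference between the two pipelines is the insertion order of the creators.

```python
from typing import List

from haystack import Document, Pipeline, component
from haystack.components.joiners import DocumentJoiner


@component
class DocumentCreator:
    """Hypothetical stand-in for the test's ConditionalDocumentCreator."""

    def __init__(self, content: str):
        self._content = content

    @component.output_types(documents=List[Document])
    def run(self):
        return {"documents": [Document(content=self._content)]}


def build_pipeline(creator_order):
    # Only the insertion order of the creators differs between the two pipelines.
    pipe = Pipeline(max_runs_per_component=1)
    pipe.add_component("joiner", DocumentJoiner())
    for name, content in creator_order:
        pipe.add_component(name, DocumentCreator(content=content))
        pipe.connect(f"{name}.documents", "joiner.documents")
    return pipe


order_a = [("first_creator", "First document"), ("second_creator", "Second document")]
order_b = list(reversed(order_a))

docs_a = build_pipeline(order_a).run({})["joiner"]["documents"]
docs_b = build_pipeline(order_b).run({})["joiner"]["documents"]

# Per the observation above, the two lists come back in different orders
# even though the pipelines are otherwise identical.
print([d.content for d in docs_a])
print([d.content for d in docs_b])
```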
"joiner", | ||
"agent_llm", | ||
"router", | ||
"answer_builder" |
In this test, `rag_prompt` and `rag_llm` will run again although they did not receive any input from the cycle during this iteration. In turn, a `PipelineMaxComponentRuns` exception is raised because they attempt to run a third time. With a proper run logic, the expected behavior would be 2 iterations.
This has a similar issue to the one described in my other comment, and it appears to function as intended when `required_variables` is assigned during the `rag_prompt` initialization, as shown below:
rag_prompt = PromptBuilder(template=rag_prompt_template, required_variables=["query", "documents"])
},
expected_run_order=[
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
    'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],
This test is very similar to the one above, but without user input to the second `PromptBuilder`. The test will fail because `feedback_prompt` and `feedback_llm` are executed a third time. `feedback_prompt` does not receive any inputs but still executes.
inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}}, | ||
expected_outputs={ | ||
"answer_builder": { | ||
"answers": [GeneratedAnswer(data="valid code", query=task, documents=[])] |
This test is very similar to the one above but we changed the order of connections.
+ pp.connect("concatenator.output", "code_prompt.feedback")
pp.connect("code_prompt.prompt", "code_llm.prompt")
pp.connect("code_llm.replies", "feedback_prompt.code")
pp.connect("feedback_llm.replies", "router.replies")
pp.connect("router.fail", "concatenator.feedback")
pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
pp.connect("router.pass", "answer_builder.replies")
pp.connect("code_llm.replies", "router.code")
pp.connect("code_llm.replies", "concatenator.current_prompt")
- pp.connect("concatenator.output", "code_prompt.feedback")
As a result, at least when I run the tests locally, the failure reason becomes non-deterministic. For some test runs, the expected run order differs from the actual run order, and for others, the output is an Answer with "invalid code" instead of "valid code".
Looking into both our code and networkx, I don't understand how these different outcomes can occur.
I tried to debug this for my own understanding, and I believe the issue is caused by the `PromptBuilder`s being executed out of order.
It can be resolved by initializing the `PromptBuilder`s as below:
code_prompt = PromptBuilder(template=code_prompt_template, required_variables=["task"])
feedback_prompt = PromptBuilder(template=feedback_prompt_template, required_variables=["code"])
The root cause lies in the absence of `required_variables`. Without them, the edge connecting `code_llm → feedback_prompt` is assigned `mandatory=False`. Consequently, this connection is removed while creating subgraphs in `_break_supported_cycles_in_graph`, as it is deemed non-mandatory. This leaves `feedback_prompt` without inputs in the subgraph, disrupting the topological order in which the graph's nodes are sorted by networkx.
Later, in the run method, components without input edges are assigned a default input to ensure they run (refer to the code here). As a result, `feedback_prompt` ends up in the `waiting_queue` with a default input of `{template=None}` and runs independently. A similar scenario occurs with `code_prompt`, which can randomly affect the output, making it somewhat non-deterministic.
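For what it's worth, the difference is easy to see on a standalone `PromptBuilder`. This is a minimal sketch; it assumes the documented behaviour that missing optional variables render as empty while missing `required_variables` raise a `ValueError`.

```python
from haystack.components.builders import PromptBuilder

template = "Here is the code to review:\n{{ code }}"

# Without required_variables, `code` is an optional input: the builder runs
# even with no inputs and renders the missing variable as empty. In a pipeline,
# this is what makes the incoming edge from code_llm non-mandatory.
lenient = PromptBuilder(template=template)
print(lenient.run())  # {'prompt': 'Here is the code to review:\n'}

# With required_variables, `code` becomes a mandatory input: running without it
# raises a ValueError, and the corresponding edge is kept when cycles are broken.
strict = PromptBuilder(template=template, required_variables=["code"])
print(strict.run(code="print('hello')"))
# strict.run()  # would raise ValueError because `code` is missing
```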
Also, the way the `waiting_queue` is currently handled could be causing loops for pipelines with cycles.
Yes, that's exactly what's happening.
The first issue is here:
haystack/haystack/core/pipeline/base.py, line 1187 in 167ede1:
cycles: List[List[str]] = list(networkx.simple_cycles(self.graph))
`simple_cycles` returns the nodes in the cycles in a different order for some invocations (it's non-deterministic).
Downstream, when we break edges here, the different order affects which edge is broken. This leads to different execution orders for any cycle that has more than one optional or greedy variadic edge.
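For reference, here is a minimal way to look at what `simple_cycles` returns for a graph with two overlapping cycles shaped roughly like the feedback pipelines in these tests. The names are illustrative; the claim that the ordering varies between invocations is from the observation above, this just shows where it matters.

```python
import networkx

# Two overlapping cycles sharing code_prompt -> ... -> router, mirroring the
# shape of the feedback-loop pipelines in these tests.
graph = networkx.DiGraph()
graph.add_edges_from(
    [
        ("code_prompt", "code_llm"),
        ("code_llm", "feedback_prompt"),
        ("feedback_prompt", "feedback_llm"),
        ("feedback_llm", "router"),
        ("router", "concatenator"),
        ("concatenator", "code_prompt"),
        ("router", "code_prompt"),  # second back-edge, creating the overlap
    ]
)

cycles = list(networkx.simple_cycles(graph))
print(cycles)
# The node a cycle "starts" at decides which edge _break_supported_cycles_in_graph
# removes later on, so any variation in this ordering changes the execution order.
```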
Here:
haystack/haystack/core/pipeline/pipeline.py, line 450 in 167ede1:
run_queue = []
We are not resetting the waiting queue, which means that components that were put into the waiting queue before the cycle executed will be executed again if they only have default inputs. This re-triggers the cycle execution.
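To make the effect concrete, here is a toy model only, not Haystack's actual scheduler: "cycle" stands for the whole feedback loop, and "feedback_prompt" is a component that sat in the waiting queue with only default inputs before the cycle ran.

```python
from collections import deque


def drain(run_queue: deque, waiting: set, reset_waiting_after_cycle: bool) -> list:
    executed = []
    while run_queue:
        name = run_queue.popleft()
        executed.append(name)
        if name == "cycle":
            if reset_waiting_after_cycle:
                waiting.clear()  # proposed behaviour: drop stale entries
            # Current behaviour: stale entries with only default inputs are
            # promoted back into the run queue, and their output feeds the
            # cycle again.
            for stale in sorted(waiting):
                run_queue.append(stale)
                run_queue.append("cycle")
            waiting.clear()
    return executed


print(drain(deque(["cycle"]), {"feedback_prompt"}, reset_waiting_after_cycle=False))
# ['cycle', 'feedback_prompt', 'cycle']  -> the cycle is re-triggered
print(drain(deque(["cycle"]), {"feedback_prompt"}, reset_waiting_after_cycle=True))
# ['cycle']  -> with the queue reset, it runs once
```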
)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():
This test intends to replicate: #8657
However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error AttributeError("'NoneType' object has no attribute 'keys'"), but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.
Besides the remaining test cases, this error seems to be a legitimate bug in the logic.
Confirmed point 3 in the CI. Both tests yield different results on different runs. See here for an example:
Pull Request Test Coverage Report for Build 12693854002
💛 - Coveralls
Related Issues
Proposed Changes:
This PR's purpose is to highlight and discuss current issues with the `Pipeline.run` logic. It adds behavioral tests that display different failures for the method (some of these have the same underlying reason).
This is a first step towards resolving these issues and getting to a robust execution logic for our pipelines.
Findings
(3. at least locally, some tests fail for different reasons with no changes to the code)
How did you test it?
Notes for the reviewer
Checklist
I have used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:, and added ! in case the PR includes breaking changes.