Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test: Show current pipeline run issues (DO NOT MERGE) #8695

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mathislucka
Copy link
Member

Related Issues

Proposed Changes:

This PR's purpose is to highlight and discuss current issues with the Pipeline.run logic.
It adds behavioral tests that display different failures for the method (some of these have the same underlying reason).
This is a first step towards resolving these issues and getting to a robust execution logic for our pipelines.

Findings

  1. Insertion order of components and connections influences component execution order, number of times a component is executed and pipeline outputs
  2. Components with only optional inputs run in cases where they did not receive any input from connected components and already ran

(3. at least locally, some tests fail for different reasons with no changes to the code)

How did you test it?

  • behavioral tests

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

return {"noop": None}

pipeline = Pipeline(max_runs_per_component=1)
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a minor variation on the test above. The only change is swapping the insertion order of document creators. As a consequence, the order of documents in the DocumentJoiner's output will be swapped.

This behavior is consistent but AFAIK it is not documented anywhere and it is very unexpected for a user to have the output of their pipeline change, just because they changed the insertion order of their components.

"joiner",
"agent_llm",
"router",
"answer_builder"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, rag_prompt and rag_llm will run again although they did not receive any input from the cycle during this iteration. In turn, a PipelineMaxComponentRuns exception is raised because they are attempting to run a third time. The expected behavior with a proper run logic would be 2 iterations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a similar issue described in my other comment and appears to function as intended when a required_variable is assigned during the rag_prompt initialization, as shown below:
rag_prompt = PromptBuilder(template=rag_prompt_template, required_variables=["query", "documents"])

},
expected_run_order=[
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'concatenator',
'code_prompt', 'code_llm', 'feedback_prompt', 'feedback_llm', 'router', 'answer_builder'],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very similar to the one above but without user input to the second PromptBuilder. The test will fail because feedback_prompt and feedback_llm are executed a third time. feedback_prompt does not receive any inputs but still executes.

inputs={"code_prompt": {"task": task}, "answer_builder": {"query": task}},
expected_outputs={
"answer_builder": {
"answers": [GeneratedAnswer(data="valid code", query=task, documents=[])]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very similar to the one above but we changed the order of connections.

    + pp.connect("concatenator.output", "code_prompt.feedback")
    pp.connect("code_prompt.prompt", "code_llm.prompt")
    pp.connect("code_llm.replies", "feedback_prompt.code")
    pp.connect("feedback_llm.replies", "router.replies")
    pp.connect("router.fail", "concatenator.feedback")
    pp.connect("feedback_prompt.prompt", "feedback_llm.prompt")
    pp.connect("router.pass", "answer_builder.replies")
    pp.connect("code_llm.replies", "router.code")
    pp.connect("code_llm.replies", "concatenator.current_prompt")
    - pp.connect("concatenator.output", "code_prompt.feedback")

As a result, at least when I run the tests locally, the failure reason becomes indeterministic. For some test runs the expected run order differs from the actual run order and for some test runs, the output is an Answer with "invalid code" instead of "valid code".

Looking into both our code and networkx, I don't understand how these different outcomes can occur.

Copy link
Contributor

@Amnah199 Amnah199 Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to debug this for my own understanding and I believe the issue is caused due to PromptBuilders being executed out of order.
It can be resolved by initializing PromptBuilders as below:

code_prompt = PromptBuilder(template=code_prompt_template, required_variables=["task"])
feedback_prompt = PromptBuilder(template=feedback_prompt_template, required_variables=["code"])

The root cause lies in the absence of required_variables. Without them, the edge connecting code_llm → feedback_prompt is assigned mandatory=False. Consequently, this connection is removed while creating suubgraphs in _break_supported_cycles_in_graph, as it is deemed non-mandatory. This leaves feedback_prompt without inputs in the subgraph, disrupting the topological order in which the graph's nodes are sorted by networkx.

Later, in the run method, components without input edges are assigned a default input to ensure they run (refer to code here). As a result, feedback_prompt ends up in the waiting_queue with a default input of {template=None} and runs independently. A similar scenario occurs with code_prompt, which can randomly affect the output, making it somewhat non-deterministic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the way waiting_queue is currently handled could be causing loops for pipelines with cycles.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what's happening.

The first issue is here:

cycles: List[List[str]] = list(networkx.simple_cycles(self.graph))

simple_cycles returns the nodes in the cycles in a different order for some invocations (it's non-deterministic).

Downstream, when we break edges here, the different order affects which edge is broken. This leads to different execution orders for any cycle that has more than one optional or greedy variadic edge.

Here:

We are not resetting the waiting queue which means that components that were put into the waiting queue before the cycle executed will be executed again if they only have default inputs. This will re-trigger the cycle execution.

)

@given("a pipeline that has an agent with a feedback cycle", target_fixture="pipeline_data")
def agent_with_feedback_cycle():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test intends to replicate: #8657

However, the runtime behavior of the test is not deterministic. It sometimes fails with the expected error:
AttributeError("'NoneType' object has no attribute 'keys'") but sometimes it raises ValueError('BranchJoiner expects only one input, but 0 were received.') instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides remaining test cases, this error seems to be a legit bug in the logic.

@mathislucka mathislucka marked this pull request as ready for review January 9, 2025 14:49
@mathislucka mathislucka requested a review from a team as a code owner January 9, 2025 14:49
@mathislucka mathislucka requested review from vblagoje and removed request for a team January 9, 2025 14:49
@julian-risch julian-risch requested review from Amnah199 and julian-risch and removed request for vblagoje January 9, 2025 14:49
@mathislucka
Copy link
Member Author

Confirmed point 3 in the CI. Both tests yield different results on different runs.

See here for an example:
https://github.com/deepset-ai/haystack/actions/runs/12692547619/job/35379360449#step:8:605
https://github.com/deepset-ai/haystack/actions/runs/12692547619/job/35378636775#step:8:590

@coveralls
Copy link
Collaborator

coveralls commented Jan 9, 2025

Pull Request Test Coverage Report for Build 12693854002

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 9 unchanged lines in 3 files lost coverage.
  • Overall coverage remained the same at 91.094%

Files with Coverage Reduction New Missed Lines %
components/converters/openapi_functions.py 1 78.63%
marshal/yaml.py 1 95.24%
document_stores/in_memory/document_store.py 7 96.03%
Totals Coverage Status
Change from base Build 12689025274: 0.0%
Covered Lines: 8653
Relevant Lines: 9499

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants