Multinode-HA Vespa Setup for Local Testing #1071

vicilliar · 2024-12-16T17:33:42Z

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Testing improvement
What is the current behavior? (You can also link to an open issue here)
The current Vespa setup uses only a single node.
What is the new behavior (if this is a feature change)?
We implement a multinode setup for local Vespa so that we can simulate cloud shards and replicas.
The vespa_local.py start function now accepts --Shards and --Replicas as parameters. If Shards > 1 or Replicas > 0, the multinode Vespa setup is used. The multinode setup has 3 config server nodes, max(2, total_content_nodes / 4) API nodes, and shards * (1 + replicas) content nodes.
The unit test GitHub workflow now accepts shards and replicas as parameters.
An orchestrator workflow was created, which runs 4 unit test setups:
(1) 0 replicas, 1 shard
(2) 1 replica, 1 shard
(3) 0 replicas, 2 shards
(4) 1 replica, 2 shards
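
For reference, the four setups above are the cross product of two shard counts and two replica counts. A minimal Python sketch of that matrix follows; `SHARD_OPTIONS`, `REPLICA_OPTIONS` and `orchestrator_matrix` are hypothetical names used for illustration and are not taken from the workflow files.

```python
from itertools import product

# Hypothetical constants: the shard and replica counts listed in the description.
SHARD_OPTIONS = (1, 2)
REPLICA_OPTIONS = (0, 1)


def orchestrator_matrix() -> list[dict[str, int]]:
    """Return one entry per unit test setup, in the order listed above."""
    return [
        {"shards": shards, "replicas": replicas}
        for shards, replicas in product(SHARD_OPTIONS, REPLICA_OPTIONS)
    ]


if __name__ == "__main__":
    for setup in orchestrator_matrix():
        print(setup)
```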

Unit tests on multinode Vespa ignore the following directories: tests/core/inference, tests/processing, tests/s2_inference

Multinode Vespa tests use m6i.2xlarge instead of m6i.xlarge because running many Vespa nodes needs more memory. Config and API nodes use ~1 GB each and content nodes use ~500 MB each. A 9-node system (3 config, 2 API, 4 content) needs roughly 7 GB for Vespa alone.
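
The node counts and the memory estimate can be checked with a small sketch. This is not the code in vespa_local.py: the names `MultinodeTopology`, `uses_multinode` and `plan_multinode_topology` are hypothetical, and the sketch assumes that the `/` in `max(2, total_content_nodes / 4)` means floor division, which the description does not state.

```python
from dataclasses import dataclass

# Rough per-node memory figures quoted in the description, in MB.
CONFIG_NODE_MB = 1000
API_NODE_MB = 1000
CONTENT_NODE_MB = 500


@dataclass(frozen=True)
class MultinodeTopology:
    config_nodes: int
    api_nodes: int
    content_nodes: int

    @property
    def total_nodes(self) -> int:
        return self.config_nodes + self.api_nodes + self.content_nodes

    @property
    def estimated_memory_mb(self) -> int:
        return (
            self.config_nodes * CONFIG_NODE_MB
            + self.api_nodes * API_NODE_MB
            + self.content_nodes * CONTENT_NODE_MB
        )


def uses_multinode(shards: int, replicas: int) -> bool:
    """Multinode is used if Shards > 1 or Replicas > 0."""
    return shards > 1 or replicas > 0


def plan_multinode_topology(shards: int, replicas: int) -> MultinodeTopology:
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    content_nodes = shards * (1 + replicas)
    # Assumption: the division in the description is treated as floor division.
    api_nodes = max(2, content_nodes // 4)
    return MultinodeTopology(
        config_nodes=3, api_nodes=api_nodes, content_nodes=content_nodes
    )


if __name__ == "__main__":
    topology = plan_multinode_topology(shards=2, replicas=1)
    print(topology.total_nodes, topology.estimated_memory_mb)
```

With 2 shards and 1 replica this gives 3 config, 2 API and 4 content nodes, which matches the 9-node, roughly 7 GB example above.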

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
Have unit tests been run against this PR? (Has there also been any additional testing?)
In progress
Related Python client changes (link commit/PR here)
Related documentation changes (link commit/PR here)
Other information:
Please check if the PR fulfills these requirements

The commit message follows our guidelines
Tests for the changes have been added (for bug fixes/features)
Docs have been added / updated (for bug fixes / features)

papa99do

Looks great. Thanks for adding the unit test to make sure the compose file and configs are generated correctly.

papa99do · 2025-01-30T22:53:14Z

.github/workflows/unit_test_200gb_CI.yml

  cancel-in-progress: true

 permissions:
  contents: read

 jobs:
+  Determine-Vespa-Setup:


This job should run after Check-Changes, and only when Check-Changes reports non-documentation changes:
if: ${{ needs.Check-Changes.outputs.doc_only == 'false' }} # Run only if there are non-documentation changes

papa99do · 2025-01-30T22:59:41Z

.github/workflows/unit_test_200gb_CI.yml

@@ -224,7 +282,7 @@ jobs:
          cd marqo
          export PYTHONPATH="./tests:./src:."
          set -o pipefail
-          pytest --ignore=tests/test_documentation.py --ignore=tests/compatibility_tests \
+          pytest ${{ env.MULTINODE_TEST_ARGS }} --ignore=tests/test_documentation.py --ignore=tests/compatibility_tests \


It seems MULTINODE_TEST_ARGS is not passed in correctly (or maybe is not populated correctly in the first place?)

Also, on the next line we fail the build via --cov-fail-under=69, which does not make sense for these tests since they skip many test cases. We should skip the coverage check in multi-shard/replica tests.

vicilliar · 2025-02-03

It is passed in correctly for multinode runs. Please check this 2-shard, 1-replica run: https://github.com/marqo-ai/marqo/actions/runs/13106217962/job/36561470973#step:9:15

MULTINODE_TEST_ARGS will be an empty string for 1 shard and 0 replicas. Maybe that's the run you saw.
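
To illustrate the behaviour discussed in this thread, here is a minimal Python sketch of how the extra pytest arguments could be derived. The workflow itself populates MULTINODE_TEST_ARGS in the YAML, so `multinode_test_args` and `coverage_args` are hypothetical helpers. The ignore list is the one from the PR description and may be incomplete, since a later commit also ignores index management and monitoring tests.

```python
# Directories the PR description says are ignored on multinode Vespa.
# Assumption: the real workflow may ignore more paths than these.
MULTINODE_IGNORED_DIRS = (
    "tests/core/inference",
    "tests/processing",
    "tests/s2_inference",
)


def is_multinode(shards: int, replicas: int) -> bool:
    return shards > 1 or replicas > 0


def multinode_test_args(shards: int, replicas: int) -> str:
    """Extra pytest arguments; an empty string for 1 shard and 0 replicas."""
    if not is_multinode(shards, replicas):
        return ""
    return " ".join(f"--ignore={path}" for path in MULTINODE_IGNORED_DIRS)


def coverage_args(shards: int, replicas: int) -> str:
    """Apply the coverage threshold only on the single-node run."""
    if is_multinode(shards, replicas):
        return ""
    return "--cov-fail-under=69"


if __name__ == "__main__":
    print(repr(multinode_test_args(1, 0)))
    print(repr(multinode_test_args(2, 1)))
```

Keeping the coverage threshold on the single-node run only avoids failing multinode builds that deliberately skip part of the suite.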

papa99do and others added 15 commits November 7, 2024 17:42

support multiple-group content clusters

37074d3

add multinode vespa workflow

d14c15a

add anchor workflow

22327df

remove yaml anchor and add input params

9ef33c0

add multinode vespa to vespa_local

10a5182

Merge branch 'mainline' into joshua/multi-shard-replica-vespa

9af624c

remove excess directory, create multinode dir

7ef0df3

add zookeeper port to vespa

7f7f7fe

ignore encoding tests in multinode vespa, add orchestrator

32788d5

ignore multinode generated files

aaab2ea

3 config nodes, separate api nodes in multinode

ecf8f95

add healthcheck to docker compose, fix ports

5567d8d

add healthcheck for all nodes, remove quotes in tests to ignore

95af9bf

remove comment, fix gitignore

609dbc4

update current attempt counter

568db72

Merge branch 'mainline' into joshua/multi-shard-replica-vespa

6a54d7d

add default number of replicas and shards

cb28827

ignore index management and monitoring tests

204a07d

Merge branch 'mainline' into joshua/multi-shard-replica-vespa

a8cf14a

papa99do previously approved these changes Jan 30, 2025

Merge branch 'mainline' into joshua/multi-shard-replica-vespa

d8cd756

add pull_request_review to condition check

05f356e

vicilliar dismissed papa99do’s stale review via 05f356e January 30, 2025 09:01

add space before closing bracket

b0c9546

papa99do approved these changes Jan 30, 2025

papa99do requested changes Jan 30, 2025

check changes before determine vespa setup

86b4868

only coverage check if single node

37afe2b
