Multinode-HA Vespa Setup for Local Testing #1071
Looks great. Thanks for adding the unit test to make sure the compose file and configs are generated correctly.
  cancel-in-progress: true

permissions:
  contents: read

jobs:
  Determine-Vespa-Setup:
This step should run after Check-Changes, and only if Check-Changes reports non-documentation changes:
if: ${{ needs.Check-Changes.outputs.doc_only == 'false' }} # Run only if there are non-documentation changes
@@ -224,7 +282,7 @@ jobs:
 cd marqo
 export PYTHONPATH="./tests:./src:."
 set -o pipefail
-pytest --ignore=tests/test_documentation.py --ignore=tests/compatibility_tests \
+pytest ${{ env.MULTINODE_TEST_ARGS }} --ignore=tests/test_documentation.py --ignore=tests/compatibility_tests \
It seems MULTINODE_TEST_ARGS is not passed in correctly (or maybe it is not populated correctly in the first place?).
Also, the next line fails the build when coverage drops below the --cov-fail-under=69 threshold, which does not make sense for these tests since they skip a lot of test cases. We should skip the coverage check in multi-shard/replica tests.
It is passed in correctly for multinode runs. Please check this 2-shard, 1-replica run: https://github.com/marqo-ai/marqo/actions/runs/13106217962/job/36561470973#step:9:15
MULTINODE_TEST_ARGS will be an empty string for 1 shard and 0 replicas. Maybe that's the one you saw.
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Testing improvement
What is the current behavior? (You can also link to an open issue here)
The current Vespa setup uses only a single node.
What is the new behavior (if this is a feature change)?
We implement a multinode setup for local Vespa, so we can simulate cloud shards and replicas.
The vespa_local.py start function now accepts --Shards and --Replicas as parameters. If Shards > 1 or Replicas > 0, the multinode Vespa setup is used. The multinode Vespa setup has 3 config server nodes, max(2, total_content_nodes / 4) API nodes, and shards * (1 + replicas) content nodes.
The unit test GitHub workflow now accepts shards and replicas as parameters.
An orchestrator workflow was created, which runs 4 unit test setups:
(1) 0 replicas, 1 shard
(2) 1 replica, 1 shard
(3) 0 replicas, 2 shards
(4) 1 replica, 2 shards
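For reference, the node counts implied by the sizing rule above can be worked out for each of the four setups. This is only an illustrative sketch, not the code in vespa_local.py: the names Topology, is_multinode and multinode_topology are hypothetical, and rounding the division down is an assumption, since the description does not say how total_content_nodes / 4 is rounded.

```python
from dataclasses import dataclass

CONFIG_NODES = 3  # the multinode setup always has 3 config server nodes


@dataclass(frozen=True)
class Topology:
    """Hypothetical container for the node counts of a multinode setup."""
    config_nodes: int
    api_nodes: int
    content_nodes: int

    @property
    def total_nodes(self) -> int:
        return self.config_nodes + self.api_nodes + self.content_nodes


def is_multinode(shards: int, replicas: int) -> bool:
    # Multinode is used if Shards > 1 or Replicas > 0.
    return shards > 1 or replicas > 0


def multinode_topology(shards: int, replicas: int) -> Topology:
    content_nodes = shards * (1 + replicas)
    # Assumption: the division rounds down; the PR text does not specify this.
    api_nodes = max(2, content_nodes // 4)
    return Topology(CONFIG_NODES, api_nodes, content_nodes)


if __name__ == "__main__":
    # The four (shards, replicas) setups run by the orchestrator workflow.
    for shards, replicas in [(1, 0), (1, 1), (2, 0), (2, 1)]:
        if is_multinode(shards, replicas):
            print(shards, replicas, multinode_topology(shards, replicas))
        else:
            print(shards, replicas, "single-node setup")
```

Under these assumptions, only setup (1) stays on the single-node path, and setup (4) yields 3 config, 2 API and 4 content nodes.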
Unit tests on multinode Vespa will ignore the following directories: tests/core/inference, tests/processing, tests/s2_inference.
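One way the extra pytest arguments could be derived from the shard and replica counts is sketched below. It assumes that the extra arguments consist only of --ignore flags for the directories listed above, which this description does not confirm; the helper name multinode_test_args is hypothetical and is not taken from the workflow.

```python
# Directories skipped on multinode Vespa, as listed in this PR description.
MULTINODE_IGNORED_DIRS = (
    "tests/core/inference",
    "tests/processing",
    "tests/s2_inference",
)


def multinode_test_args(shards: int, replicas: int) -> str:
    """Hypothetical helper: return extra pytest arguments for a given setup.

    Assumption: the extra arguments are just --ignore flags. The string is
    empty for the single-node case (1 shard, 0 replicas).
    """
    if shards > 1 or replicas > 0:
        return " ".join(f"--ignore={d}" for d in MULTINODE_IGNORED_DIRS)
    return ""


if __name__ == "__main__":
    print(repr(multinode_test_args(1, 0)))
    print(repr(multinode_test_args(2, 1)))
```

The returned string is meant to be placed in front of the existing pytest arguments, so an empty string leaves the single-node command unchanged.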
Multinode Vespa tests will use m6i.2xlarge instead of m6i.xlarge due to the higher memory usage from running many Vespa nodes. Config and API nodes use ~1 GB each and content nodes use ~500 MB each. A 9-node system (3 config, 2 API, 4 content) needs roughly 7 GB for Vespa alone.
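The 7 GB figure follows directly from the per-node estimates. A small sketch of the arithmetic, with a hypothetical function name and the rough per-node figures quoted above (these are estimates, not measured limits):

```python
# Rough per-node memory figures from this PR description, in GB.
CONFIG_OR_API_NODE_GB = 1.0
CONTENT_NODE_GB = 0.5


def estimate_vespa_memory_gb(config_nodes: int, api_nodes: int, content_nodes: int) -> float:
    """Hypothetical helper: approximate memory needed by the Vespa nodes alone."""
    return (config_nodes + api_nodes) * CONFIG_OR_API_NODE_GB + content_nodes * CONTENT_NODE_GB


if __name__ == "__main__":
    # 3 config + 2 API nodes at ~1 GB each, plus 4 content nodes at ~0.5 GB each.
    print(estimate_vespa_memory_gb(3, 2, 4))  # 7.0
```

This covers Vespa only; the test process and the rest of the runner need memory on top of it.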
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
No
Have unit tests been run against this PR? (Has there also been any additional testing?)
In progress
Related Python client changes (link commit/PR here)
Related documentation changes (link commit/PR here)
Other information:
Please check if the PR fulfills these requirements