ci(framework:skip) Refactor SuperExec e2e test #4225

Merged on Sep 27, 2024 (39 commits)
Changes shown from 27 commits
Commits:
2cf4dc5
Init
chongshenng Sep 17, 2024
c3569a1
Init
chongshenng Sep 17, 2024
17d9d37
Init change to e2e
chongshenng Sep 17, 2024
afbd947
Change permissions, update name in CI
chongshenng Sep 17, 2024
8e2429b
Change double quotes to single quotes to preserve literal values
chongshenng Sep 17, 2024
a55de8c
Add optional extras when installing wheel from artifact store
chongshenng Sep 17, 2024
77284b3
Modify installation from Flower wheel
chongshenng Sep 17, 2024
b4cf87e
Conform to pypi wheel name convention
chongshenng Sep 17, 2024
6f132a5
Refactor CI
chongshenng Sep 17, 2024
61be6b9
Change to no-auth
chongshenng Sep 17, 2024
4a78dd7
Small fixes
chongshenng Sep 17, 2024
d369ead
Remove flwr from pyproject.toml
chongshenng Sep 18, 2024
27bb83c
Add new line
chongshenng Sep 18, 2024
71957a7
Test yaml concat
chongshenng Sep 18, 2024
5a721b7
Add Python matrix
chongshenng Sep 18, 2024
6d623b6
Minor fixes
chongshenng Sep 18, 2024
4ba33a4
Tweaks
chongshenng Sep 18, 2024
e247de0
Fix
chongshenng Sep 18, 2024
dadc572
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 18, 2024
3323c79
Update name
chongshenng Sep 18, 2024
2d4ed8b
Skip node_id check for GetFabRequest
chongshenng Sep 18, 2024
752d31c
Add auth check
chongshenng Sep 18, 2024
58172e7
Intermediate
chongshenng Sep 18, 2024
7f3b5a9
Fix matrix strategy
chongshenng Sep 18, 2024
0456652
Expand e2e
chongshenng Sep 18, 2024
9401cc6
Full e2e
chongshenng Sep 18, 2024
bdf5228
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 18, 2024
274ad93
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 19, 2024
780dd38
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 19, 2024
db3a633
Use bootstrap
chongshenng Sep 19, 2024
8f6d804
Revert patch
chongshenng Sep 19, 2024
5845e04
Merge branch 'main' into refactor-superexec-e2e-test
danieljanes Sep 19, 2024
9379bbe
Add fail slow
chongshenng Sep 19, 2024
405ed9c
feat(framework) Add minimal `Control` service (#4239)
panh99 Sep 19, 2024
c117591
feat(framework) Move run-related request/response to `run.proto` (#4240)
panh99 Sep 19, 2024
f346d20
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 19, 2024
e4603fe
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 27, 2024
bbb6582
Merge branch 'main' into refactor-superexec-e2e-test
chongshenng Sep 27, 2024
e49c3aa
Merge branch 'main' into refactor-superexec-e2e-test
danieljanes Sep 27, 2024
54 changes: 54 additions & 0 deletions .github/workflows/e2e.yml
@@ -51,6 +51,60 @@ jobs:
      short_sha: ${{ steps.upload.outputs.SHORT_SHA }}
      dir: ${{ steps.upload.outputs.DIR }}

  superexec:
    runs-on: ubuntu-22.04
    timeout-minutes: 10
    needs: wheel
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11"]
        directory: [e2e-bare-auth]
        connection: [secure, insecure]
        engine: [deployment-engine, simulation-engine]
        authentication: [no-auth, client-auth]
        exclude:
          - connection: insecure
            authentication: client-auth
    name: |
      SuperExec /
      Python ${{ matrix.python-version }} /
      ${{ matrix.connection }} /
      ${{ matrix.authentication }} /
      ${{ matrix.engine }}
    defaults:
      run:
        working-directory: e2e/${{ matrix.directory }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install build tools
        run: python -m pip install -U pip==23.3.1
        shell: bash
      - name: Download and install Flower wheel from artifact store
        if: ${{ github.repository == 'adap/flower' && !github.event.pull_request.head.repo.fork && github.actor != 'dependabot[bot]' }}
        run: |
          # Define base URL for wheel file
          WHEEL_URL="https://${{ env.ARTIFACT_BUCKET }}/py/${{ needs.wheel.outputs.dir }}/${{ needs.wheel.outputs.short_sha }}/${{ needs.wheel.outputs.whl_path }}"
          # Download wheel file
          wget "${WHEEL_URL}"
          # Extract the original filename from the URL
          WHEEL_FILE=$(basename "${WHEEL_URL}")
          if [[ "${{ matrix.engine }}" == "simulation-engine" ]]; then
            python -m pip install "${WHEEL_FILE}[simulation]"
          else
            python -m pip install "${WHEEL_FILE}"
          fi
      - name: >
          Run SuperExec test /
          ${{ matrix.connection }} /
          ${{ matrix.authentication }} /
          ${{ matrix.engine }}
        working-directory: e2e/${{ matrix.directory }}
        run: ./../test_superexec.sh "${{ matrix.connection }}" "${{ matrix.authentication }}" "${{ matrix.engine }}"

  frameworks:
    runs-on: ubuntu-22.04
    timeout-minutes: 10
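
For a rough idea of how many jobs the `superexec` matrix produces, the axes can be expanded and filtered by hand. The sketch below does this in Python, assuming an `exclude` entry drops every combination that matches all of its listed keys. `MATRIX`, `EXCLUDE`, `expand_matrix`, and `JOBS` are illustrative names and are not part of the workflow.

```python
from itertools import product

# Axes copied from the `superexec` job's strategy.matrix.
MATRIX = {
    "python-version": ["3.9", "3.10", "3.11"],
    "directory": ["e2e-bare-auth"],
    "connection": ["secure", "insecure"],
    "engine": ["deployment-engine", "simulation-engine"],
    "authentication": ["no-auth", "client-auth"],
}
EXCLUDE = [{"connection": "insecure", "authentication": "client-auth"}]


def expand_matrix(matrix, exclude):
    """Return every axis combination that no `exclude` entry matches."""
    keys = list(matrix)
    combos = [dict(zip(keys, values)) for values in product(*matrix.values())]
    return [
        combo
        for combo in combos
        # An entry matches when all of its key/value pairs appear in the combo.
        if not any(all(combo[k] == v for k, v in rule.items()) for rule in exclude)
    ]


JOBS = expand_matrix(MATRIX, EXCLUDE)
```

With these axes, `len(JOBS)` is 18: 24 raw combinations minus the 6 that pair `insecure` with `client-auth`.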
1 change: 1 addition & 0 deletions e2e/e2e-bare-auth/certificate.conf
@@ -18,3 +18,4 @@ subjectAltName = @alt_names
DNS.1 = localhost
IP.1 = ::1
IP.2 = 127.0.0.1
IP.3 = 0.0.0.0
122 changes: 122 additions & 0 deletions e2e/test_superexec.sh
@@ -0,0 +1,122 @@
#!/bin/bash
set -e

# Set connectivity parameters
case "$1" in
  secure)
    ./generate.sh
    server_arg='--ssl-ca-certfile ../certificates/ca.crt
                --ssl-certfile ../certificates/server.pem
                --ssl-keyfile ../certificates/server.key'
    client_arg='--root-certificates ../certificates/ca.crt'
    # For $superexec_arg, note special ordering of single- and double-quotes
    superexec_arg='--executor-config 'root-certificates=\"../certificates/ca.crt\"''
    superexec_arg="$server_arg $superexec_arg"
    ;;
  insecure)
    server_arg='--insecure'
    client_arg=$server_arg
    superexec_arg=$server_arg
    ;;
esac

# Set authentication parameters
case "$2" in
  client-auth)
    server_auth='--auth-list-public-keys ../keys/client_public_keys.csv
                 --auth-superlink-private-key ../keys/server_credentials
                 --auth-superlink-public-key ../keys/server_credentials.pub'
    client_auth_1='--auth-supernode-private-key ../keys/client_credentials_1
                   --auth-supernode-public-key ../keys/client_credentials_1.pub'
    client_auth_2='--auth-supernode-private-key ../keys/client_credentials_2
                   --auth-supernode-public-key ../keys/client_credentials_2.pub'
    server_address='127.0.0.1:9092'
    ;;
  *)
    server_auth=''
    client_auth_1=''
    client_auth_2=''
    server_address='127.0.0.1:9092'
    ;;
esac

# Set engine
case "$3" in
  deployment-engine)
    superexec_engine_arg='--executor flwr.superexec.deployment:executor'
    ;;
  simulation-engine)
    superexec_engine_arg='--executor flwr.superexec.simulation:executor
                          --executor-config 'num-supernodes=10''
    ;;
esac


# Create and install Flower app
flwr new e2e-tmp-test --framework numpy --username flwrlabs
cd e2e-tmp-test
# Remove flwr dependency from `pyproject.toml`. Seems necessary so that it does
# not override the wheel dependency
if [[ "$OSTYPE" == "darwin"* ]]; then
  # macOS (Darwin) system
  sed -i '' '/flwr\[simulation\]/d' pyproject.toml
else
  # Non-macOS system (Linux)
  sed -i '/flwr\[simulation\]/d' pyproject.toml
fi
pip install -e . --no-deps

# Append the `superexec` federation config to `pyproject.toml`
if [ "$1" == "insecure" ]; then
  # Insecure connection: no certificates needed
  echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\ninsecure = true" >> pyproject.toml
else
  # Secure connection: point at the generated CA certificate
  echo -e $"\n[tool.flwr.federations.superexec]\naddress = \"127.0.0.1:9093\"\nroot-certificates = \"../certificates/ca.crt\"" >> pyproject.toml
fi

timeout 2m flower-superlink $server_arg $server_auth &
sl_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
    --superlink $server_address $client_auth_1 \
    --node-config "partition-id=0 num-partitions=2" --max-retries 0 &
cl1_pid=$!
sleep 2

timeout 2m flower-supernode ./ $client_arg \
    --superlink $server_address $client_auth_2 \
    --node-config "partition-id=1 num-partitions=2" --max-retries 0 &
cl2_pid=$!
sleep 2

timeout 2m flower-superexec $superexec_arg $superexec_engine_arg 2>&1 | tee flwr_output.log &
se_pid=$(pgrep -f "flower-superexec")
sleep 2

timeout 1m flwr run --run-config num-server-rounds=1 ../e2e-tmp-test superexec

# Initialize a flag to track if training is successful
found_success=false
timeout=120 # Timeout after 120 seconds
elapsed=0

# Check for "Run finished" in the log, in a loop with a timeout
while [ "$found_success" = false ] && [ $elapsed -lt $timeout ]; do
  if grep -q "Run finished" flwr_output.log; then
    echo "Training worked correctly!"
    found_success=true
    kill $cl1_pid; kill $cl2_pid; sleep 1; kill $sl_pid; kill $se_pid;
  else
    echo "Waiting for training ... ($elapsed seconds elapsed)"
  fi
  # Sleep for a short period and increment the elapsed time
  sleep 2
  elapsed=$((elapsed + 2))
done

if [ "$found_success" = false ]; then
  echo "Training had an issue and timed out."
  kill $cl1_pid; kill $cl2_pid; kill $sl_pid; kill $se_pid; exit 1  # fail the step explicitly
fi
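
The wait loop at the end of `test_superexec.sh` follows a poll-until-marker-or-timeout pattern. The same idea is sketched in Python below, for comparison only. `wait_for_marker` and its parameters are hypothetical names, and the defaults simply mirror the script's 120-second limit and 2-second interval. Unlike the script, which adds up its own sleep durations, this sketch measures elapsed time with a monotonic clock.

```python
import time
from pathlib import Path


def wait_for_marker(log_path, marker="Run finished", timeout=120.0, interval=2.0):
    """Poll `log_path` until `marker` appears or `timeout` seconds elapse.

    Returns True if the marker was found, False on timeout.
    """
    deadline = time.monotonic() + timeout
    path = Path(log_path)
    while True:
        # The log may not exist yet if the process has only just started.
        if path.exists() and marker in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would typically turn the boolean into a process exit code, for example `sys.exit(0 if wait_for_marker("flwr_output.log") else 1)`, so that a timeout fails the CI step.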
3 changes: 3 additions & 0 deletions src/py/flwr/client/grpc_rere_client/client_interceptor.py
@@ -31,6 +31,7 @@
    generate_shared_key,
    public_key_to_bytes,
)
from flwr.proto.fab_pb2 import GetFabRequest  # pylint: disable=E0611
from flwr.proto.fleet_pb2 import (  # pylint: disable=E0611
    CreateNodeRequest,
    DeleteNodeRequest,
@@ -50,6 +51,7 @@
    PushTaskResRequest,
    GetRunRequest,
    PingRequest,
    GetFabRequest,
]


@@ -126,6 +128,7 @@ def intercept_unary_unary(
                PushTaskResRequest,
                GetRunRequest,
                PingRequest,
                GetFabRequest,
            ),
        ):
            if self.shared_secret is None:
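
The tuple extended in the last hunk appears to act as an allowlist: a request is handled on the authenticated path only if its type is listed. As a rough illustration of that shape, the sketch below attaches an HMAC only to allowlisted request types. It uses Python's standard `hmac` module as a stand-in; the dataclasses, `AUTHENTICATED_TYPES`, `maybe_sign`, and `verify` are hypothetical and do not reproduce Flower's helpers or wire format.

```python
import hashlib
import hmac
from dataclasses import dataclass


# Hypothetical stand-ins for protobuf request types.
@dataclass
class PingRequest:
    payload: bytes


@dataclass
class GetFabRequest:
    payload: bytes


@dataclass
class UnlistedRequest:
    payload: bytes


# Assumption: only requests of these types get an HMAC attached.
AUTHENTICATED_TYPES = (PingRequest, GetFabRequest)


def maybe_sign(request, shared_secret: bytes):
    """Return an HMAC-SHA256 tag for allowlisted requests, else None."""
    if not isinstance(request, AUTHENTICATED_TYPES):
        return None
    return hmac.new(shared_secret, request.payload, hashlib.sha256).digest()


def verify(request, shared_secret: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = maybe_sign(request, shared_secret)
    return expected is not None and hmac.compare_digest(expected, tag)
```

In this toy model, leaving a type out of `AUTHENTICATED_TYPES` means its requests go out unsigned, which is the situation the added `GetFabRequest` entry would avoid.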
Original file line number Diff line number Diff line change
@@ -30,6 +30,7 @@
    generate_shared_key,
    verify_hmac,
)
from flwr.proto.fab_pb2 import GetFabRequest  # pylint: disable=E0611
from flwr.proto.fleet_pb2 import (  # pylint: disable=E0611
    CreateNodeRequest,
    CreateNodeResponse,
@@ -173,6 +174,7 @@ def _verify_node_id(
            PushTaskResRequest,
            GetRunRequest,
            PingRequest,
            GetFabRequest,
        ],
    ) -> bool:
        if node_id is None:
@@ -183,6 +185,8 @@ def _verify_node_id(
            return request.task_res_list[0].task.producer.node_id == node_id
        if isinstance(request, GetRunRequest):
            return node_id in self.state.get_nodes(request.run_id)
        if isinstance(request, GetFabRequest):
            return True
        return request.node.node_id == node_id

    def _verify_hmac(
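
The commit "Skip node_id check for GetFabRequest" corresponds to the early `return True` in the last hunk. A minimal, self-contained sketch of that dispatch follows. The dataclasses are hypothetical stand-ins for the protobuf messages, the `hash_str` field is illustrative, and the `return False` for a missing `node_id` is an assumption because the diff does not show that branch's body.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the protobuf messages; only the fields used here.
@dataclass
class Node:
    node_id: int


@dataclass
class PingRequest:
    node: Node


@dataclass
class GetFabRequest:
    hash_str: str  # illustrative field; carries no node identity


def verify_node_id(node_id: Optional[int], request) -> bool:
    """Mirror the shape of the node_id check in the hunk above."""
    if node_id is None:
        return False  # assumption: the diff elides this branch's body
    if isinstance(request, GetFabRequest):
        return True  # nothing to compare, so the check is skipped
    return request.node.node_id == node_id
```

Without the `GetFabRequest` branch, this sketch would fall through to `request.node.node_id` and raise `AttributeError`, since the stand-in message has no `node` field.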