feat: add support for custom topology (#237)
* feat: add support for custom topology

Signed-off-by: vsoch <[email protected]>
vsoch authored Nov 5, 2024
1 parent a7fce05 commit 1f686fe
Showing 70 changed files with 3,205 additions and 6,142 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -115,7 +115,7 @@ api: generate api
go run hack/python-sdk/main.go ${API_VERSION} > ${SWAGGER_API_JSON}
rm -rf ./sdk/python/${API_VERSION}/fluxoperator/model/*
rm -rf ./sdk/python/${API_VERSION}/fluxoperator/test/test_*.py
java -jar ${SWAGGER_JAR} generate -i ${SWAGGER_API_JSON} -g python-legacy -o ./sdk/python/${API_VERSION} -c ./hack/python-sdk/swagger_config.json --git-repo-id flux-operator --git-user-id flux-framework
java -jar ${SWAGGER_JAR} generate -i ${SWAGGER_API_JSON} -g python -o ./sdk/python/${API_VERSION} -c ./hack/python-sdk/swagger_config.json --git-repo-id flux-operator --git-user-id flux-framework

# These were needed for the python (not python-legacy)
# cp ./hack/python-sdk/fluxoperator/* ./sdk/python/${API_VERSION}/fluxoperator/model/
8 changes: 8 additions & 0 deletions api/v1alpha2/minicluster_types.go
@@ -312,6 +312,14 @@ type FluxSpec struct {
// +optional
MinimalService bool `json:"minimalService"`

// Disable specifying the socket path
// +optional
DisableSocket bool `json:"disableSocket"`

// Specify a custom Topology
// +optional
Topology string `json:"topology"`

// Do not wait for the socket
// +optional
NoWaitSocket bool `json:"noWaitSocket"`
10 changes: 10 additions & 0 deletions api/v1alpha2/swagger.json
@@ -290,6 +290,11 @@
"type": "string",
"default": ""
},
"disableSocket": {
"description": "Disable specifying the socket path",
"type": "boolean",
"default": false
},
"logLevel": {
"description": "Log level to use for flux logging (only in non TestMode)",
"type": "integer",
@@ -325,6 +330,11 @@
"description": "Modify flux submit to be something else",
"type": "string"
},
"topology": {
"description": "Specify a custom Topology",
"type": "string",
"default": ""
},
"wrap": {
"description": "Commands for flux start --wrap",
"type": "string"
16 changes: 16 additions & 0 deletions api/v1alpha2/zz_generated.openapi.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 6 additions & 0 deletions chart/templates/minicluster-crd.yaml
@@ -425,6 +425,9 @@ spec:
This is not recommended in favor of providing the secret
name as curveCertSecret, below
type: string
disableSocket:
description: Disable specifying the socket path
type: boolean
logLevel:
default: 6
description: Log level to use for flux logging (only in non TestMode)
@@ -461,6 +464,9 @@ spec:
submitCommand:
description: Modify flux submit to be something else
type: string
topology:
description: Specify a custom Topology
type: string
wrap:
description: Commands for flux start --wrap
type: string
6 changes: 6 additions & 0 deletions config/crd/bases/flux-framework.org_miniclusters.yaml
@@ -425,6 +425,9 @@ spec:
This is not recommended in favor of providing the secret
name as curveCertSecret, below
type: string
disableSocket:
description: Disable specifying the socket path
type: boolean
logLevel:
default: 6
description: Log level to use for flux logging (only in non TestMode)
@@ -462,6 +465,9 @@ spec:
submitCommand:
description: Modify flux submit to be something else
type: string
topology:
description: Specify a custom Topology
type: string
wrap:
description: Commands for flux start --wrap
type: string
5 changes: 3 additions & 2 deletions docs/development/developer-guide.md
@@ -223,8 +223,9 @@ For older versions, we have a set of example containers [rse-ops/flux-hpc](https
that include Flux in the container. Our new bases are [rse-ops/hpc-apps](https://github.com/rse-ops/hpc-apps) that do not have flux.
For our flux view, we take the following steps:

- A sidecar (init container) is created to stage the flux view at /mnt/flux
- A file /mnt/flux/flux-view.sh is available to source for paths, python path, and a `$fluxsocket` variable
- A sidecar (init container) is created to stage the flux view at /mnt/flux.
- A file `/mnt/flux/flux-view.sh` is available to source for paths, python path, and a `$fluxsocket` variable.
- Execute `/mnt/flux/flux-connect.sh` to source the above and connect.
- All configuration files are under that root, and prepared by the init container.

### Testing
11 changes: 11 additions & 0 deletions docs/getting_started/custom-resource-definition.md
@@ -257,6 +257,17 @@ spec:

This can be useful for cases of autoscaling in the down direction when you need to drain a node, and then delete the pod.

#### topology

By default, Flux will have a flat topology with one lead broker (rank 0) and some number of children. You can customize this with the `topology` field:

```yaml
flux:
topology: kary:2
```

For example, you might choose `kary:1` (or another value) or `binomial`. After connecting to your cluster, run `flux overlay status` to inspect the resulting topology.
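
To build intuition for what a value like `kary:2` means, the sketch below computes the parent of each broker rank in a k-ary tree rooted at rank 0. It assumes the conventional breadth-first layout, where the parent of rank `r` is `(r - 1) // k`. The function name `kary_parents` is hypothetical, and this is an illustration rather than Flux's own implementation, so compare it against `flux overlay status` on a real cluster.

```python
from typing import Dict, Optional


def kary_parents(size: int, k: int) -> Dict[int, Optional[int]]:
    """Map each rank to its parent in a k-ary tree rooted at rank 0.

    Assumption: ranks are laid out breadth-first, so the parent of
    rank r > 0 is (r - 1) // k. Rank 0 (the lead broker) has no parent.
    """
    if size < 1 or k < 1:
        raise ValueError("size and k must be positive")
    parents: Dict[int, Optional[int]] = {0: None}
    for rank in range(1, size):
        parents[rank] = (rank - 1) // k
    return parents


if __name__ == "__main__":
    # Under the assumption above, 7 brokers with kary:2 form a binary tree.
    for rank, parent in kary_parents(7, 2).items():
        print(f"rank {rank}: parent {parent}")
```

Under the same assumption, `kary:1` degenerates into a chain in which each rank's parent is the rank before it.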

#### submitCommand

If you need to use a container with a different version of flux (or an older one)
6 changes: 6 additions & 0 deletions examples/dist/flux-operator-arm.yaml
@@ -431,6 +431,9 @@ spec:
This is not recommended in favor of providing the secret
name as curveCertSecret, below
type: string
disableSocket:
description: Disable specifying the socket path
type: boolean
logLevel:
default: 6
description: Log level to use for flux logging (only in non TestMode)
@@ -468,6 +471,9 @@ spec:
submitCommand:
description: Modify flux submit to be something else
type: string
topology:
description: Specify a custom Topology
type: string
wrap:
description: Commands for flux start --wrap
type: string
6 changes: 6 additions & 0 deletions examples/dist/flux-operator.yaml
@@ -431,6 +431,9 @@ spec:
This is not recommended in favor of providing the secret
name as curveCertSecret, below
type: string
disableSocket:
description: Disable specifying the socket path
type: boolean
logLevel:
default: 6
description: Log level to use for flux logging (only in non TestMode)
@@ -468,6 +471,9 @@ spec:
submitCommand:
description: Modify flux submit to be something else
type: string
topology:
description: Specify a custom Topology
type: string
wrap:
description: Commands for flux start --wrap
type: string
10 changes: 10 additions & 0 deletions pkg/flux/templates/components.sh
@@ -95,6 +95,16 @@ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$viewroot/lib
export fluxsocket=local://${viewroot}/run/flux/local
EOT
${SUDO} mv ./flux-view.sh ${viewbase}/flux-view.sh

# The same, but also connect
cat <<EOT >> ./flux-connect.sh
#!/bin/bash
. ${viewbase}/flux-view.sh
flux proxy ${fluxsocket} bash
EOT
${SUDO} mv ./flux-connect.sh ${viewbase}/flux-connect.sh


{{end}}
{{define "ensure-pip"}}
${SUDO} ${pythonversion} -m pip --version || ${SUDO} ${pythonversion} -m ensurepip || (${SUDO} wget https://bootstrap.pypa.io/get-pip.py && ${pythonversion} ./get-pip.py) {{ if .Spec.Logging.Quiet }}> /dev/null 2>&1{{ end }}
5 changes: 2 additions & 3 deletions pkg/flux/templates/wait.sh
@@ -96,9 +96,8 @@ ls .
brokerOptions="-Scron.directory=/etc/flux/system/cron.d \
-Stbon.fanout=256 \
-Srundir=${viewroot}/run/flux {{ if .Spec.Interactive }}-Sbroker.rc2_none {{ end }} \
-Sstatedir=${STATE_DIR} \
-Slocal-uri=local://$viewroot/run/flux/local \
{{ if .Spec.Flux.ConnectTimeout }}-Stbon.connect_timeout={{ .Spec.Flux.ConnectTimeout }}{{ end }} \
-Sstatedir=${STATE_DIR} {{ if .Spec.Flux.DisableSocket }}{{ else }}-Slocal-uri=local://$viewroot/run/flux/local \{{ end }}
{{ if .Spec.Flux.ConnectTimeout }}-Stbon.connect_timeout={{ .Spec.Flux.ConnectTimeout }}{{ end }} {{ if .Spec.Flux.Topology }}-Stbon.topo={{ .Spec.Flux.Topology }}{{ end }} \
{{ if .RequiredRanks }}-Sbroker.quorum={{ .RequiredRanks }}{{ end }} \
{{ if .Spec.Logging.Zeromq }}-Stbon.zmqdebug=1{{ end }} \
{{ if not .Spec.Logging.Quiet }} -Slog-stderr-level={{or .Spec.Flux.LogLevel 6}} {{ else }} -Slog-stderr-level=0 {{ end }} \
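
The two new conditionals in this hunk can be read as follows: `-Slocal-uri` is emitted unless `disableSocket` is set, and `-Stbon.topo` is emitted only when `topology` is non-empty. A minimal Python sketch of that logic follows. The helper `broker_options` and its parameter names are hypothetical and cover only these options; the operator itself renders them with Go templates, not Python.

```python
from typing import List


def broker_options(viewroot: str, disable_socket: bool = False, topology: str = "") -> List[str]:
    """Assemble a subset of broker options, mirroring the template conditionals."""
    opts = [f"-Srundir={viewroot}/run/flux"]
    # The local-uri option is skipped entirely when the socket is disabled.
    if not disable_socket:
        opts.append(f"-Slocal-uri=local://{viewroot}/run/flux/local")
    # An empty topology string means "leave the broker default in place".
    if topology:
        opts.append(f"-Stbon.topo={topology}")
    return opts


if __name__ == "__main__":
    print(" ".join(broker_options("/mnt/flux/view", topology="kary:2")))
```

Leaving `topology` empty omits the flag altogether, so the broker keeps whatever default it would otherwise use.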
38 changes: 18 additions & 20 deletions sdk/python/v1alpha2/.gitlab-ci.yml
@@ -1,33 +1,31 @@
# NOTE: This file is auto generated by OpenAPI Generator.
# URL: https://openapi-generator.tech
#
# ref: https://docs.gitlab.com/ee/ci/README.html
# ref: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Python.gitlab-ci.yml

stages:
- test

.nosetest:
.pytest:
stage: test
script:
- pip install -r requirements.txt
- pip install -r test-requirements.txt
- pytest --cov=fluxoperator

nosetest-2.7:
extends: .nosetest
image: python:2.7-alpine
nosetest-3.3:
extends: .nosetest
image: python:3.3-alpine
nosetest-3.4:
extends: .nosetest
image: python:3.4-alpine
nosetest-3.5:
extends: .nosetest
image: python:3.5-alpine
nosetest-3.6:
extends: .nosetest
image: python:3.6-alpine
nosetest-3.7:
extends: .nosetest
pytest-3.7:
extends: .pytest
image: python:3.7-alpine
nosetest-3.8:
extends: .nosetest
pytest-3.8:
extends: .pytest
image: python:3.8-alpine
pytest-3.9:
extends: .pytest
image: python:3.9-alpine
pytest-3.10:
extends: .pytest
image: python:3.10-alpine
pytest-3.11:
extends: .pytest
image: python:3.11-alpine
12 changes: 6 additions & 6 deletions sdk/python/v1alpha2/.travis.yml
@@ -1,14 +1,14 @@
# ref: https://docs.travis-ci.com/user/languages/python
language: python
python:
- "2.7"
- "3.2"
- "3.3"
- "3.4"
- "3.5"
- "3.6"
- "3.7"
- "3.8"
- "3.9"
- "3.10"
- "3.11"
# uncomment the following if needed
#- "3.11-dev" # 3.11 development branch
#- "nightly" # nightly build
# command to install dependencies
install:
- "pip install -r requirements.txt"
18 changes: 18 additions & 0 deletions sdk/python/v1alpha2/docs/BurstedCluster.md
@@ -2,11 +2,29 @@


## Properties

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**name** | **str** | The hostnames for the bursted clusters. If set, the user is responsible for ensuring uniqueness. The operator will set it to burst-N | [optional] [default to '']
**size** | **int** | Size of bursted cluster. Defaults to same size as local minicluster if not set | [optional]

## Example

```python
from fluxoperator.models.bursted_cluster import BurstedCluster

# TODO update the JSON string below
json = "{}"
# create an instance of BurstedCluster from a JSON string
bursted_cluster_instance = BurstedCluster.from_json(json)
# print the JSON string representation of the object
print(bursted_cluster_instance.to_json())

# convert the object into a dict
bursted_cluster_dict = bursted_cluster_instance.to_dict()
# create an instance of BurstedCluster from a dict
bursted_cluster_from_dict = BurstedCluster.from_dict(bursted_cluster_dict)
```
[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


20 changes: 19 additions & 1 deletion sdk/python/v1alpha2/docs/Bursting.md
@@ -3,12 +3,30 @@
Bursting Config. For simplicity, we internally handle the name of the job (hostnames).

## Properties

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**clusters** | [**list[BurstedCluster]**](BurstedCluster.md) | External clusters to burst to. Each external cluster must share the same listing to align ranks | [optional]
**clusters** | [**List[BurstedCluster]**](BurstedCluster.md) | External clusters to burst to. Each external cluster must share the same listing to align ranks | [optional]
**hostlist** | **str** | Hostlist is a custom hostlist for the broker.toml that includes the local plus bursted cluster. This is typically used for bursting to another resource type, where we can predict the hostnames but they don't follow the same convention as the Flux Operator | [optional] [default to '']
**lead_broker** | [**FluxBroker**](FluxBroker.md) | | [optional]

## Example

```python
from fluxoperator.models.bursting import Bursting

# TODO update the JSON string below
json = "{}"
# create an instance of Bursting from a JSON string
bursting_instance = Bursting.from_json(json)
# print the JSON string representation of the object
print(bursting_instance.to_json())

# convert the object into a dict
bursting_dict = bursting_instance.to_dict()
# create an instance of Bursting from a dict
bursting_from_dict = Bursting.from_dict(bursting_dict)
```
[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)


18 changes: 18 additions & 0 deletions sdk/python/v1alpha2/docs/Commands.md
@@ -2,6 +2,7 @@


## Properties

Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**broker_pre** | **str** | A single command for only the broker to run | [optional] [default to '']
@@ -13,6 +14,23 @@ Name | Type | Description | Notes
**service_pre** | **str** | A command only for service start.sh to run | [optional] [default to '']
**worker_pre** | **str** | A command only for workers to run | [optional] [default to '']

## Example

```python
from fluxoperator.models.commands import Commands

# TODO update the JSON string below
json = "{}"
# create an instance of Commands from a JSON string
commands_instance = Commands.from_json(json)
# print the JSON string representation of the object
print(commands_instance.to_json())

# convert the object into a dict
commands_dict = commands_instance.to_dict()
# create an instance of Commands from a dict
commands_from_dict = Commands.from_dict(commands_dict)
```
[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)

