Merge pull request #238 from grycap/change-doc
Changes in docs
Showing 6 changed files with 165 additions and 122 deletions.
## Exposed services

OSCAR supports the deployment and elasticity management of long-running services that must be directly reachable from outside the cluster. This functionality answers the need to support the fast inference of pre-trained AI models that require close to real-time processing with high throughput. In a traditional serverless approach, the AI model weights would be loaded in memory for each service invocation. Exposed services are also helpful when stateless services created out of large containers require too much time to start processing a service invocation.

Instead, by exposing an OSCAR service, the AI model weights can be loaded just once, and the service performs the AI model inference for each subsequent request. An auto-scaled, load-balanced approach for these stateless services is supported: when the average CPU usage exceeds a certain user-defined threshold, additional service instances (i.e. pods) are dynamically created (and removed when no longer necessary) within user-defined boundaries. The user can also define the minimum and maximum number of instances of the service present in the cluster (see the parameters `min_scale` and `max_scale` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)).
### Prerequisites in the container image

The container image needs to have an HTTP server that binds to a specific port (see the parameter `port` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)). If developing a service from scratch, in Python you can use [FastAPI](https://fastapi.tiangolo.com/) or [Flask](https://flask.palletsprojects.com/en/2.3.x/) to create an API; in Go you can use [Gin](https://gin-gonic.com/), and in Ruby, [Sinatra](https://sinatrarb.com/).

Notice that if the service exposes a web-based UI, you must ensure that the content is not only served from the root document ('/'), since the service will be exposed under a certain subpath.
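To illustrate this prerequisite, here is a minimal sketch using only the Python standard library: an HTTP server that binds to a port and answers on any subpath, not just the root document. The handler and the port value are illustrative, not part of OSCAR.

``` python
# Minimal sketch of an HTTP server suitable for an exposed service:
# it binds to a port and answers on any subpath, not only '/'.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path contains whatever subpath the ingress forwards,
        # e.g. /predict or /v2/models/.../predict/
        body = f"served path: {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port: int = 0) -> ThreadingHTTPServer:
    # port=0 lets the OS pick a free port; a real service would use the
    # port declared under `expose` in the FDL (e.g. 5000).
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)

if __name__ == "__main__":
    make_server(5000).serve_forever()
```

A real service would of course run its own application logic in the handler; the point is only that any request path is served.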
### How to define an exposed OSCAR service

The minimum definition to expose an OSCAR service is to indicate in the corresponding [FDL](https://docs.oscar.grycap.net/fdl/) the port inside the container where the service will be listening.
``` yaml
expose:
  api_port: 5000
```
Once the service is deployed, you can check whether it was created correctly by making an HTTP request to the exposed endpoint, which follows this pattern:

``` text
https://{oscar_endpoint}/system/services/{service_name}/exposed/{path_resource}
```

Notice that if you get a `502 Bad Gateway` error, it is most likely because the specified port on the service does not match the port the API actually listens on.
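For scripting invocations, the endpoint pattern can be captured in a small helper function (illustrative only; `exposed_url` is a hypothetical name, not part of any OSCAR client library):

``` python
def exposed_url(oscar_endpoint: str, service_name: str, path_resource: str = "") -> str:
    """Build the URL of an exposed OSCAR service following the pattern above."""
    return (
        f"https://{oscar_endpoint}/system/services/"
        f"{service_name}/exposed/{path_resource}"
    )
```

For example, `exposed_url("localhost", "nginx")` yields `https://localhost/system/services/nginx/exposed/`.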
Additional options can be defined in the `expose` section of the FDL (some previously mentioned), such as:

- `min_scale`: The minimum number of active pods (default: 1).
- `max_scale`: The maximum number of active pods (default: 10).
- `cpu_threshold`: The CPU usage percentage that, once exceeded, triggers the creation of additional pods (default: 80).
- `rewrite_target`: Rewrites the request URI to the target path before the traffic reaches the service (default: false).
- `NodePort`: Exposes the service through `<cluster_ip>:<NodePort>` instead of through the cluster's domain name.
- `default_command`: Selects between executing the container's default command and executing the script inside the container (default: false, i.e. it executes the script).
- `set_auth`: Protects the exposed service with credentials composed of the service name as the user and the service token as the password. Turn off this field if the container provides authentication itself. It does not work with `NodePort` (default: false, i.e. no authentication).
Below is an example of the `expose` section of the FDL, showing that there will be between 5 and 15 active pods and that the service will expose an API on port 4578. The number of active pods will grow when CPU usage rises above 50% and decrease again when CPU usage drops.
``` yaml
expose:
  min_scale: 5
  max_scale: 15
  api_port: 4578
  cpu_threshold: 50
  set_auth: true
  rewrite_target: true
  default_command: true
```
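Before deploying, it can be handy to sanity-check these values. The following is a hypothetical helper (not part of OSCAR, and the `NodePort` field name is an assumption) that validates an `expose` mapping against the defaults and constraints listed above:

``` python
# Hypothetical helper (not part of OSCAR): sanity-check an `expose`
# mapping from an FDL file before deploying it.
def validate_expose(expose: dict) -> list[str]:
    errors = []
    min_scale = expose.get("min_scale", 1)      # default: 1
    max_scale = expose.get("max_scale", 10)     # default: 10
    threshold = expose.get("cpu_threshold", 80) # default: 80 (%)
    if min_scale < 0 or max_scale < min_scale:
        errors.append("min_scale must be >= 0 and <= max_scale")
    if not 0 < threshold <= 100:
        errors.append("cpu_threshold must be a percentage in (0, 100]")
    # set_auth does not work together with NodePort (field name assumed)
    if expose.get("set_auth") and "NodePort" in expose:
        errors.append("set_auth does not work with NodePort")
    return errors
```

An empty list means the section looks consistent; otherwise each string describes one violated constraint.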
In addition, below is a full example of a recipe to expose a service from the [AI4EOSC/DEEP Open Catalog](https://marketplace.deep-hybrid-datacloud.eu/):
``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: body-pose-detection
      memory: 2Gi
      cpu: '1.0'
      image: deephdc/deep-oc-posenet-tf
      script: script.sh
      environment:
        Variables:
          INPUT_TYPE: json
      expose:
        min_scale: 1
        max_scale: 10
        port: 5000
        cpu_threshold: 20
        set_auth: true
      input:
      - storage_provider: minio.default
        path: body-pose-detection/input
      output:
      - storage_provider: minio.default
        path: body-pose-detection/output
```
So, to invoke the API of this example, the request needs the following information:

1. OSCAR endpoint: `localhost` or `https://{OSCAR_endpoint}`.
2. Path resource: in this case, `v2/models/posenetclas/predict/`. Please do not forget the final `/`.
3. Use `-k` or `--insecure` if SSL is disabled.
4. Input image, with the name `people.jpeg`.
5. Output: the invocation will create a `.zip` file containing the outputs.

Putting it together, the command looks like this:
``` bash
curl {-k} -X POST https://{oscar_endpoint}/system/services/body-pose-detection-async/exposed/{path_resource} -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@{input_image};type=image/png" --output {output_file}
```
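If `curl` is not available, the same multipart POST can be sketched with only the Python standard library. The form field name `data` and the URL pattern come from the command above; `multipart_body` and `post_image` are hypothetical helper names:

``` python
# Sketch of the curl multipart upload using only the standard library.
import io
import urllib.request
import uuid

def multipart_body(field: str, filename: str, payload: bytes, content_type: str):
    """Build a multipart/form-data body with a single file part."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    buf.write(f"--{boundary}\r\n".encode())
    buf.write(
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode()
    )
    buf.write(f"Content-Type: {content_type}\r\n\r\n".encode())
    buf.write(payload)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def post_image(url: str, image_path: str):
    """POST an image to an exposed service, like the curl command above."""
    with open(image_path, "rb") as f:
        boundary, body = multipart_body("data", image_path, f.read(), "image/png")
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "accept": "*/*",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```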
Finally, here is the complete command that works in [Local Testing](https://docs.oscar.grycap.net/local-testing/), with an image called `people.jpeg` as input and `output_posenet.zip` as output:

``` bash
curl -X POST https://localhost/system/services/body-pose-detection-async/exposed/v3/models/posenetclas/predict/ -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@people.jpeg;type=image/png" --output output_posenet.zip
```
Another FDL example shows how to expose a simple NGINX server as an OSCAR service:
``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: nginx
      memory: 2Gi
      cpu: '1.0'
      image: nginx
      script: script.sh
      expose:
        min_scale: 2
        max_scale: 10
        port: 80
        cpu_threshold: 50
```
If you use the NGINX example above in your [local OSCAR cluster](https://docs.oscar.grycap.net/local-testing/), you will see the NGINX welcome page at `http://localhost/system/services/nginx/exposed/`. The two active pods of the deployment can be listed with the command `kubectl get pods -n oscar-svc`:

``` text
oscar-svc nginx-dlp-6b9ddddbd7-cm6c9 1/1 Running 0 2m1s
oscar-svc nginx-dlp-6b9ddddbd7-f4ml6 1/1 Running 0 2m1s
```
# Interlink

InterLink aims to provide an abstraction for executing a Kubernetes pod on any remote resource capable of managing a container execution lifecycle.

OSCAR uses a Kubernetes virtual node to translate a job request from the Kubernetes pod into a remote call. We have been using InterLink to interact with an HPC cluster. For more information, check the [InterLink landing page](https://intertwin-eu.github.io/interLink).

![Diagram](just_interlink2.png)
## Installation and use of an InterLink node in an OSCAR cluster

The Kubernetes cluster must have at least one virtual kubelet node, and those nodes must be tagged with `type=virtual-kubelet`. Follow these steps to [add the virtual node](https://intertwin-eu.github.io/interLink/docs/tutorial-admins/deploy-interlink) to the Kubernetes cluster; OSCAR detects these nodes by itself.

Once the virtual node and OSCAR are installed correctly, you can use this node by setting the name of the virtual node in the `InterLinkNodeName` variable. Otherwise, to use a normal node of the Kubernetes cluster, leave it blank (`""`).
### Annotations, restrictions, and other things to keep in mind

The [OSCAR service annotations](https://docs.oscar.grycap.net/fdl/#service) persist in the virtual node and affect the behavior of the offloaded jobs.

The memory and CPU defined in the OSCAR service fields do not affect the offloaded job. To request resources for the offloaded job, use the [Slurm resource flags](https://curc.readthedocs.io/en/latest/running-jobs/job-resources.html#slurm-resource-flags) through the `slurm-job.vk.io/flags` annotation (e.g. `--job-name`, `--time=02:30:00`, `--cpus-per-task`, `--nodes`, `--mem`).
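As an illustration, such flags could be attached to a service definition like this (a hedged sketch: the exact placement of the annotation in the FDL and the flag values are assumptions, adapt them to your setup):

``` yaml
# Hypothetical sketch: Slurm resource flags for the offloaded job,
# passed through the interLink annotation named above.
annotations:
  slurm-job.vk.io/flags: "--job-name=oscar-offload --time=02:30:00 --cpus-per-task=4 --mem=8G"
```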
For example, you can mount a file system folder of the HPC cluster with the annotation key `job.vk.io/singularity-mounts` and a value following the pattern `"--bind <outside-container>:<inside-container>"`. The offloaded jobs are executed in a remote HPC cluster, so a persistent volume claim cannot be mounted.

Another example is the annotation `job.vk.io/pre-exec`, which will execute a command before each execution.

Any environment variable with a special character could create an error in the translation between the virtual node and the remote job. As a good practice, pass environment variables encoded in base64 and decode them inside the script.
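Following that advice, here is a minimal sketch of the round trip in Python: encode on the submitting side, decode inside the job script (the example value is arbitrary).

``` python
import base64

# Encode on the submitting side: special characters survive the
# virtual-node-to-remote-job translation as plain base64 text.
raw = 'value with $pecial "chars" & spaces'
encoded = base64.b64encode(raw.encode("utf-8")).decode("ascii")

# Decode inside the job script before using the variable.
decoded = base64.b64decode(encoded).decode("utf-8")
assert decoded == raw
```

The base64 alphabet contains only letters, digits, `+`, `/` and `=`, so the encoded value passes through untouched.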
As a reminder, InterLink uses Singularity to run the container, which has these implications:

- You must reference the container image with the Singularity pattern, e.g. `docker://ghcr.io/intertwin-eu/itwinai:0.0.1-3dgan-0.2`. Once the image is pulled, it can be referenced by its path, e.g. `<path-of-container>/itwinaiv6.sif`.
- You are not a superuser, so you cannot write to the regular file system. Use the `/tmp` folder instead.
- The working directory is not the same as in the container, so work with absolute paths.
Although the use of the Knative Serverless Backend for synchronous invocations provides elasticity similar to that offered by its counterparts in public clouds, such as AWS Lambda, synchronous invocations are still not the best option for running long-running, resource-demanding applications, like deep learning inference or video processing.

The synchronous invocation of long-running, resource-demanding applications may lead to timeouts on Knative pods. Therefore, we consider Kubernetes job generation the optimal approach to handle event-driven file processing through asynchronous invocations in OSCAR, while the execution of synchronous services remains a convenient way to support general lightweight container-based applications.