Merge pull request #238 from grycap/change-doc
Changes in docs
catttam authored May 23, 2024
2 parents 9f5e509 + 53ce863 commit a50bb09
Showing 6 changed files with 165 additions and 122 deletions.
125 changes: 125 additions & 0 deletions docs/expose_services.md
@@ -0,0 +1,125 @@
## Exposed services

OSCAR supports the deployment and elasticity management of long-running services that must be directly reachable from outside the cluster. This functionality answers the need to support the fast inference of pre-trained AI models that require close to real-time processing with high throughput. In a traditional serverless approach, the AI model weights would be loaded in memory for each service invocation. Exposed services are also helpful when stateless services created out of large containers require too much time to start before processing a service invocation.

Instead, by exposing an OSCAR service, the AI model weights could be loaded just once, and the service would perform the AI model inference for each subsequent request. An auto-scaled, load-balanced approach is supported for these stateless services: when the average CPU usage exceeds a certain user-defined threshold, additional service instances (i.e. pods) will be dynamically created (and removed when no longer necessary) within user-defined boundaries. The user can define the minimum and maximum number of instances of the service to be present on the cluster (see the parameters `min_scale` and `max_scale` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)).


### Prerequisites in the container image
The container image needs to include an HTTP server that binds to a specific port (see the parameter `api_port` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)). If developing a service from scratch, in Python you can use [FastAPI](https://fastapi.tiangolo.com/) or [Flask](https://flask.palletsprojects.com/en/2.3.x/) to create an API; in Go you can use [Gin](https://gin-gonic.com/); in Ruby, [Sinatra](https://sinatrarb.com/).

Notice that if the service exposes a web-based UI, you must ensure that its content is not served exclusively from the root document ('/'), since the service will be exposed under a certain subpath.
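As an illustration, here is a minimal sketch of such a server in Python with FastAPI (the port, route, and names are illustrative, not prescribed by OSCAR):

``` python
# Minimal HTTP server sketch for an exposed OSCAR service.
# Assumptions: FastAPI and uvicorn are installed in the container image,
# and the FDL declares the same port in its expose section.
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health():
    # Use relative routes: the service is reachable under a subpath
    # (/system/services/<name>/exposed/), not only at the root document.
    return {"status": "ok"}

if __name__ == "__main__":
    import uvicorn
    # Bind to 0.0.0.0 on the port declared in the FDL expose section.
    uvicorn.run(app, host="0.0.0.0", port=5000)
```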

### How to define an exposed OSCAR service

The minimum definition to expose an OSCAR service is to indicate in the corresponding [FDL](https://docs.oscar.grycap.net/fdl/) the port inside the container where the service will be listening.

``` yaml
expose:
  api_port: 5000
```
Once the service is deployed, you can check whether it was created correctly by making an HTTP request to the exposed endpoint, which follows this pattern:
``` text
https://{oscar_endpoint}/system/services/{service_name}/exposed/{path_resource}
```

Notice that if you get a `502 Bad Gateway` error, it is most likely because the port specified in the service definition does not match the port the HTTP server is listening on.
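For instance, a quick reachability check with `curl` could look like this (the endpoint and service name are placeholders):

``` bash
# Placeholders: replace the endpoint and service name with your own values.
# -k skips certificate validation for clusters without a valid certificate.
curl -k -i https://{oscar_endpoint}/system/services/{service_name}/exposed/
# A response from your HTTP server means the port mapping is correct;
# a 502 Bad Gateway usually means the declared port does not match the server's.
```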

Additional options can be defined in the `expose` section of the FDL (some previously mentioned), such as:
- `min_scale`: The minimum number of active pods (default: 1).
- `max_scale`: The maximum number of active pods (default: 10).
- `cpu_threshold`: The CPU usage percentage that, once exceeded, will trigger the creation of additional pods (default: 80%).
- `rewrite_target`: Whether to rewrite the request URI to the target path when redirecting the traffic (default: false).
- `NodePort`: Exposes the service through a Kubernetes NodePort, so it is accessed directly at `<cluster_ip>:<NodePort>` instead of through the domain name.
- `default_command`: Selects between executing the container's default command and executing the script inside the container (default: false, i.e. the script is executed).
- `set_auth`: Enables basic authentication; the credentials are composed of the service name as the user and the service token as the password. Turn this field off if the container provides its own authentication. It does not work with `NodePort` (default: false, i.e. no authentication).


Below is an example of the `expose` section of an FDL, where there will be between 5 and 15 active pods and the service will expose an API on port 4578. The number of active pods will grow when average CPU usage exceeds 50%, and will decrease when CPU usage drops again.

``` yaml
expose:
  min_scale: 5
  max_scale: 15
  api_port: 4578
  cpu_threshold: 50
  set_auth: true
  rewrite_target: true
  default_command: true
```
In addition, here is a full example of a recipe to expose a service from the [AI4EOSC/DEEP Open Catalog](https://marketplace.deep-hybrid-datacloud.eu/):
``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: body-pose-detection
      memory: 2Gi
      cpu: '1.0'
      image: deephdc/deep-oc-posenet-tf
      script: script.sh
      environment:
        Variables:
          INPUT_TYPE: json
      expose:
        min_scale: 1
        max_scale: 10
        api_port: 5000
        cpu_threshold: 20
        set_auth: true
      input:
      - storage_provider: minio.default
        path: body-pose-detection/input
      output:
      - storage_provider: minio.default
        path: body-pose-detection/output
```
So, to invoke the API of this example, the request needs the following information:
1. The OSCAR endpoint: `localhost` or `https://{OSCAR_endpoint}`.
2. The path of the resource. In this case, it is `v2/models/posenetclas/predict/`. Please do not forget the trailing `/`.
3. Use `-k` or `--insecure` if the cluster does not have a valid SSL certificate.
4. The input image, named `people.jpeg` in this example.
5. The output. The request will create a `.zip` file containing the outputs.

The command will end up looking like this:

``` bash
curl {-k} -X POST https://{oscar_endpoint}/system/services/body-pose-detection/exposed/{path_resource} -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@{input image};type=image/png" --output {output file}
```

Finally, here is the complete command that works in [Local Testing](https://docs.oscar.grycap.net/local-testing/) with an image called `people.jpeg` as input and `output_posenet.zip` as output:

``` bash
curl -X POST https://localhost/system/services/body-pose-detection/exposed/v2/models/posenetclas/predict/ -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@people.jpeg;type=image/png" --output output_posenet.zip
```

Another FDL example shows how to expose a simple NGINX server as an OSCAR service:

``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: nginx
      memory: 2Gi
      cpu: '1.0'
      image: nginx
      script: script.sh
      expose:
        min_scale: 2
        max_scale: 10
        api_port: 80
        cpu_threshold: 50
```

If you use the NGINX example above in your [local OSCAR cluster](https://docs.oscar.grycap.net/local-testing/), you will see the NGINX welcome page at `http://localhost/system/services/nginx/exposed/`.
The two active pods of the deployment can be listed with the command `kubectl get pods -n oscar-svc`:

``` text
oscar-svc nginx-dlp-6b9ddddbd7-cm6c9 1/1 Running 0 2m1s
oscar-svc nginx-dlp-6b9ddddbd7-f4ml6 1/1 Running 0 2m1s
```
33 changes: 33 additions & 0 deletions docs/interlink_integration.md
@@ -0,0 +1,33 @@
# interLink

interLink aims to provide an abstraction for executing a Kubernetes pod on any remote resource capable of managing a container execution lifecycle.

OSCAR uses a Kubernetes virtual node to translate a job request from a Kubernetes pod into a remote call. We have been using interLink to interact with an HPC cluster. For more information, check the [interLink landing page](https://intertwin-eu.github.io/interLink).

![Diagram](just_interlink2.png)

## Installation and use of an interLink node in an OSCAR cluster

The Kubernetes cluster must have at least one virtual kubelet node, tagged with `type=virtual-kubelet`. Follow these steps to [add the virtual node](https://intertwin-eu.github.io/interLink/docs/tutorial-admins/deploy-interlink) to the Kubernetes cluster; OSCAR detects these nodes by itself.

Once the virtual node and OSCAR are installed correctly, you can use this node by setting the name of the virtual node in the `InterLinkNodeName` variable.
Otherwise, to use a regular node of the Kubernetes cluster, leave it blank (`""`).
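As an illustration, a service definition could select the virtual node like this (a hypothetical FDL fragment: it assumes the variable is set at the service level and serialized as `interlink_node_name`; the node and service names are made up):

``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: offloaded-service          # illustrative name
      image: ghcr.io/intertwin-eu/itwinai:0.0.1-3dgan-0.2
      script: script.sh
      # Name of the virtual kubelet node; leave "" to run on a regular node.
      interlink_node_name: vk-node-1
```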


### Annotations, Restrictions, and other things to keep in mind

The [OSCAR service annotations](https://docs.oscar.grycap.net/fdl/#service) persist in the virtual node and affect the behavior of the offloaded jobs.

The memory and CPU defined in the OSCAR service do not affect the offloaded job. To request resources for the offloaded job, use the [Slurm resource flags](https://curc.readthedocs.io/en/latest/running-jobs/job-resources.html#slurm-resource-flags) in the `slurm-job.vk.io/flags` annotation (e.g. `--job-name`, `--time=02:30:00`, `--cpus-per-task`, `--nodes`, `--mem`).

For example, you can mount a file system folder of the HPC cluster with the annotation key `job.vk.io/singularity-mounts` and a value following the pattern `"--bind <outside-container>:<inside-container>"`. Since the offloaded jobs are executed in a remote HPC cluster, a persistent volume claim cannot be mounted.

Another example is the annotation `job.vk.io/pre-exec`, which executes a command before each job execution.
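Putting these together, a hypothetical service fragment might look as follows (the annotation keys are the ones described above; all values are illustrative):

``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: offload-example            # illustrative name
      script: script.sh
      annotations:
        # Slurm resource flags for the remote job.
        slurm-job.vk.io/flags: "--job-name=offload-example --time=02:30:00 --cpus-per-task=4 --nodes=1 --mem=8G"
        # Bind-mount an HPC folder inside the Singularity container.
        job.vk.io/singularity-mounts: "--bind /scratch/mydata:/data"
        # Command executed before each job execution.
        job.vk.io/pre-exec: "mkdir -p /tmp/work"
```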

Any environment variable with a special character could create an error in the translation between the virtual node and the remote job. As a good practice, pass the environment variable encoded in base64 and decode it inside the script.
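For example, a sketch of that practice (the variable names are illustrative):

``` bash
# Before deployment: encode the value and set the encoded string as the
# environment variable (e.g. MY_VAR_B64 in the FDL's environment section).
echo -n 'value with $pecial characters' | base64

# Inside the service script: decode it before use.
MY_VAR=$(echo "$MY_VAR_B64" | base64 -d)
echo "$MY_VAR"
```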

As a reminder, interLink uses Singularity to run containers, which has the following characteristics:

- You must reference the container image using the Singularity pattern, e.g. `docker://ghcr.io/intertwin-eu/itwinai:0.0.1-3dgan-0.2`. Once the image is pulled, it can be referenced by its path, e.g. `<path-of-container>/itwinaiv6.sif`.
- You are not a superuser, so you cannot write to the regular file system. Use the `/tmp` folder.
- The working directory is not the same as in the container, so work with absolute paths.
118 changes: 0 additions & 118 deletions docs/invoking.md
@@ -128,121 +128,3 @@ base64 input.png | curl -X POST -H "Authorization: Bearer <TOKEN>" \
Although the use of the Knative Serverless Backend for synchronous invocations provides elasticity similar to that of its counterparts in public clouds, such as AWS Lambda, synchronous invocations are still not the best option for running long-running, resource-demanding applications, like deep learning inference or video processing.

The synchronous invocation of long-running resource-demanding applications may lead to timeouts on Knative pods. Therefore, we consider Kubernetes job generation as the optimal approach to handle event-driven file processing through asynchronous invocations in OSCAR, with the execution of synchronous services being a convenient way to support general lightweight container-based applications.

## Exposed services

OSCAR also supports the deployment and elasticity management of long-running services that need to be directly reachable from outside the cluster (i.e. exposed services). This is useful when stateless services created out of large containers require too much time to be started to process a service invocation. This is the case when supporting the fast inference of pre-trained AI models that require close to real-time processing with high throughput. In a traditional serverless approach, the AI model weights would be loaded in memory for each service invocation (thus creating a new container).

Instead, by exposing an OSCAR service, the AI model weights could be loaded just once and the service would perform the AI model inference for each subsequent request. An auto-scaled load-balanced approach for these stateless services is supported. When the average CPU exceeds a certain user-defined threshold, additional service instances (i.e. pods) will be dynamically created (and removed when no longer necessary), within the user-defined boundaries (see the parameters `min_scale` and `max_scale` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)).


### Prerequisites in the container image
The container image needs to have an HTTP server that binds to a certain port (see the parameter `port` in [ExposeSettings](https://docs.oscar.grycap.net/fdl/#exposesettings)). If developing a service from scratch, in Python you can use [FastAPI](https://fastapi.tiangolo.com/) or [Flask](https://flask.palletsprojects.com/en/2.3.x/) to create an API. In Go you can use [Gin](https://gin-gonic.com/), or [Sinatra](https://sinatrarb.com/) in Ruby.

Notice that if the service exposes a web-based UI you must ensure that the content is not served exclusively from the root document ('/'), since the service will be exposed in a certain subpath.

### How to define an exposed OSCAR service

The minimum definition to expose an OSCAR service is to indicate in the corresponding [FDL](https://docs.oscar.grycap.net/fdl/) file the port inside the container where the service will be listening.

``` yaml
expose:
  port: 5000
```
Once the service is deployed, if you invoke the service and it returns a `502 Bad Gateway` error, the port is wrong.


Additional options can be defined in the "expose" section, such as the minimum number of active pods (default: 1).
The maximum number of active pods (default: 10) or the CPU threshold which, once exceeded, will trigger the creation of additional pods (default: 80%).

Below is a specification with more details where there will be between 5 and 15 active pods and the service exposes an API in port 4578. The number of active pods will grow when the use of CPU increases by more than 50%.
The active pods will decrease when the use of CPU decreases.

``` yaml
expose:
  min_scale: 5
  max_scale: 15
  port: 4578
  cpu_threshold: 50
```

Below there is an example of a recipe to expose a service from the [AI4EOSC/DEEP Open Catalog](https://marketplace.deep-hybrid-datacloud.eu/)

``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: body-pose-detection-async
      memory: 2Gi
      cpu: '1.0'
      image: deephdc/deep-oc-posenet-tf
      script: script.sh
      environment:
        Variables:
          INPUT_TYPE: json
      expose:
        min_scale: 1
        max_scale: 10
        port: 5000
        cpu_threshold: 20
      input:
      - storage_provider: minio.default
        path: body-pose-detection-async/input
      output:
      - storage_provider: minio.default
        path: body-pose-detection-async/output
```


The service will be listening at a URL that follows this pattern:

``` text
https://{oscar_endpoint}/system/services/{name of service}/exposed/
```

Now, let's show an example of executing the [Body pose detection](https://marketplace.deep-hybrid-datacloud.eu/modules/deep-oc-posenet-tf.html) ML model of [AI4EOSC/DEEP Open Catalog](https://marketplace.deep-hybrid-datacloud.eu/). We need to keep in mind several factors:

1. OSCAR endpoint. `localhost` or `https://{OSCAR_endpoint}`
2. Path resource. In this case, it is `v2/models/posenetclas/predict/`. Please do not forget the final `/`
3. Use `-k` or `--insecure` if the SSL is false.
4. Input image with the name `people.jpeg`
5. Output. It will create a `.zip` file that has the output

The following code section represents a schema of the command:

``` bash
curl {-k} -X POST https://{oscar_endpoint}/system/services/body-pose-detection-async/exposed/{path resource} -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@{input image};type=image/png" --output {output file}
```

Finally, the complete command that works in [Local Testing](https://docs.oscar.grycap.net/local-testing/) with an image called `people.jpeg` as input and `output_posenet.zip` as output.

``` bash
curl -X POST https://localhost/system/services/body-pose-detection-async/exposed/v2/models/posenetclas/predict/ -H "accept: */*" -H "Content-Type: multipart/form-data" -F "data=@people.jpeg;type=image/png" --output output_posenet.zip
```

Another FDL example shows how to expose a simple NGINX server as an OSCAR service:

``` yaml
functions:
  oscar:
  - oscar-cluster:
      name: nginx
      memory: 2Gi
      cpu: '1.0'
      image: nginx
      script: script.sh
      expose:
        min_scale: 2
        max_scale: 10
        port: 80
        cpu_threshold: 50
```

In case you use the NGINX example above in your [local OSCAR cluster](https://docs.oscar.grycap.net/local-testing/), you will see the nginx welcome page in: `http://localhost/system/services/nginx/exposed/`.
Two active pods of the deployment will be shown with the command `kubectl get pods -n oscar-svc`

``` text
oscar-svc nginx-dlp-6b9ddddbd7-cm6c9 1/1 Running 0 2m1s
oscar-svc nginx-dlp-6b9ddddbd7-f4ml6 1/1 Running 0 2m1s
```
8 changes: 4 additions & 4 deletions docs/multitenancy.md
@@ -8,12 +8,13 @@ In the context of OSCAR, multi-tenancy support refers to the platform's ability

- **Users who want to create new services need to know the UID of the users who will have access to the service.**

Each service has a list of "allowed users," so a service can be accessed not only by one but by multiple users chosen by the service creator. This way, users have the ability to decide who can operate over its services. It is important to note that at this moment, a user with access to a service has full access; this means he can edit/delete the service besides making executions.
Each service has a list of "allowed users," so a service can be accessed not only by one but by multiple users chosen by the service creator. This way, users have the ability to decide who can operate over its services. It is important to note that only the service creator or "owner" can update it; this means that allowed users will only be able to see the service (and its buckets) and make invocations.

This allowed users list is defined on the FDL on the service creation (more info in link FDL doc). The following is an example of an FDL that creates a service that gives access to two EGI users. If the allowed_users field is empty, the service is treated as "public," so every user within the cluster will have access to it.
This allowed users list is defined on the FDL on the service creation (more info in [FDL docs](fdl.md)). The following is an example of an FDL that creates a service that gives access to two EGI users. If the allowed_users field is empty, the service is treated as "public," so every user within the cluster will have access to it.

At the moment of creation, the UID of the user creating the service doesn't need to be present on the list; however, when a service is updated, the user's UID has to be on the list for them to retain access.


``` yaml
functions:
  oscar:
@@ -38,5 +39,4 @@ functions:
> **_NOTE:_** A user can obtain their EGI User ID by logging into https://aai.egi.eu/ (for the production instance of EGI Check-In) or https://aai-demo.egi.eu (for the demo instance of EGI Check-In).
Since OSCAR uses MinIO as the main storage provider, MinIO users are created on the fly for each EGI UID so that users only have access to their designated service buckets. Consequently, each user accessing the cluster will have a MinIO user with their UID as AccessKey and an autogenerated SecretKey.
1 change: 1 addition & 0 deletions docs/usage.md
@@ -44,6 +44,7 @@ processing of files. This script must use the environment variables
`INPUT_FILE_PATH` and `TMP_OUTPUT_DIR` to refer to the input file and the
folder where to save the results respectively:


```
#!/bin/bash
