diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000..7abfdec
Binary files /dev/null and b/.DS_Store differ
diff --git a/README.md b/README.md
index 22f87da..f35f202 100644
--- a/README.md
+++ b/README.md
@@ -1,23 +1,81 @@
-# Red Hat OpenShift Data Science Quick Course
+# Serving LLM Models on OpenShift AI
-This is the starter template for creating a new quick course in the **RedHatQuickCourses** GitHub organization.
+This is an advanced course on serving a Large Language Model (LLM) using OpenShift AI. The course is a lab walkthrough that starts from an OpenShift Container Platform cluster. You will install the Operators needed to configure OpenShift AI, then add the Ollama model serving runtime, create a data science project, deploy S3-compatible storage, set up data connections, create a workbench, use the single-model serving platform to host the Ollama framework, configure a Mistral model, and finally work through a Jupyter notebook to test your model's performance.
-After you create a new repository based on this template, you need to edit and change several files and replace placeholder text and links in them.
+## Creating Course Content
-1. Add a _README_ at the root of your repository and leave instructions for people contributing to your quick course. Make sure you provide the link to the GitHub issues page for your course so that contributors and users can report issues and provide feedback about your course.
+We use a system called Antora (https://antora.org) to publish courses. Antora expects the files and folders in a source repository to be arranged in a certain opinionated way to simplify the process of writing course content using asciidoc, and then converting the asciidoc source to HTML.
-1. Edit the **antora.yml** file in the repository root.
- * Change the _name_, _title_ and _version_ attributes
- * Edit the list of items under the _nav_ attribute to add or remove new chapters/modules to the course.
+Refer to the quick courses [contributor guide](https://redhatquickcourses.github.io/welcome/1/guide/overview.html) for a detailed guide on how to work with Antora tooling and publish courses.
-1. Edit the antora-playbook.yml file in the repository root.
- * Edit only the _title_ and _start_page_ attributes in this file. You may not be required to change the other attributes unless the need arises.
+## TL;DR Quickstart
-1. Edit the _supplemental-ui/partials/header-content.hbs_ file and change the link in the _navbar-item_ div to point to the GitHub issues page for your repository.
+This section is intended as a quick start guide for technically experienced contributors. The contributor guide remains the canonical reference for the course content creation process, with detailed explanations, commands, video demonstrations, and screenshots.
-1. Edit the files and folders under the _modules_ folder to structure your course content into chapters/modules and sections.
+### Pre-requisites
-1. Take a brief look at the GitHub actions configuration in the _.github_ folder. It contains basic configuration to auto-generate HTML from the asciidoc source and render it using GitHub pages. Unless you know what you are doing with this, and have prior experience with GitHub actions workflows, do not change these files.
+* You have a macOS or Linux workstation. Windows has not been tested and is not supported. You can try using a WSL2 based environment to run these steps - YMMV!
+* You have a somewhat recent version of the Git client installed on your workstation +* You have a somewhat new Node.js LTS release (Node.js 16+) installed locally. +* Install a recent version of Visual Studio Code. Other editors with asciidoc editing support may work - YMMV, and you are on your own... -## Problems and Feedback -If you run into any issues, report bugs/suggestions/improvements about this template here - https://github.com/RedHatQuickCourses/course-starter/issues +### Antora Files and Folder Structure + +The *antora.yml* file lists the chapters/modules/units that make up the course. + +Each chapter entry points to a *nav.adoc* file that lists the sections in that chapter. The home page of the course is rendered from *modules/ROOT/pages/index.adoc*. + +Each chapter lives in a separate folder under the *modules* directory. All asciidoc source files live under the *modules/CHAPTER/pages* folder. + +To create a new chapter in the course, create a new folder under *modules*. + +To add a new section under a chapter create an entry in the *modules/CHAPTER/nav.adoc* file and then create the asciidoc file in the *modules/CHAPTER/pages* folder. + +### Steps + +1. Clone or fork the course repository. +``` + $ git clone git@github.com:RedHatQuickCourses/llm-model-serving.git +``` + +2. Install the npm dependencies for the course tooling. +``` + $ cd llm-model-serving + $ npm install +``` + +3. Start the asciidoc to HTML compiler in the background. This command watches for changes to the asciidoc source content in the **modules** folder and automatically re-generates the HTML content. +``` + $ npm run watch:adoc +``` +4. Start a local web server to serve the generated HTML files. Navigate to the URL printed by this command to preview the generated HTML content in a web browser. +``` + $ npm run serve +``` + +5. Before you make any content changes, create a local Git branch based on the **main** branch. As a good practice, prefix the branch name with your GitHub ID. Use a suitable branch naming scheme that reflects the content you are creating or changing. +``` + $ git checkout -b your_GH_ID/ch01s01 +``` + +6. Make your changes to the asciidoc files. Preview the generated HTML and verify that there are no rendering errors.Commit your changes to the local Git branch and push the branch to GitHub. +``` + $ git add . + $ git commit -m "Added lecture content for chapter 1 section 1" + $ git push -u origin your_GH_ID/ch01s01 +``` + +7. Create a GitHub pull request (PR) for your changes using the GitHub web UI. For forks, create a PR that merges your forked changes into the `main` branch of this repository. + +8. Request a review of the PR from your technical peers and/or a member of the PTL team. + +9. Make any changes requested by the reviewer in the **same** branch as the PR, and then commit and push your changes to GitHub. If other team members have made changes to the PR, then do not forget to do a **git pull** before committing your changes. + +10. Once reviewer(s) approve your PR, you should merge it into the **main** branch. + +11. Wait for a few minutes while the automated GitHub action publishes your changes ot the production GitHub pages website. + +12. 
Verify that your changes have been published to the production GitHub pages website at https://redhatquickcourses.github.io/rhods-deploy + +# Problems and Feedback +If you run into any issues, report bugs/suggestions/improvements about this course here - https://github.com/RedHatQuickCourses/llm-model-serving/issues \ No newline at end of file diff --git a/antora-playbook.yml b/antora-playbook.yml index 7c1aa70..87dfd4c 100644 --- a/antora-playbook.yml +++ b/antora-playbook.yml @@ -1,6 +1,6 @@ site: - title: Placeholder Course Title - start_page: placeholder-course-name::index.adoc + title: Serving LLM Models on OpenShift AI + start_page: llm-model-serving::index.adoc content: sources: diff --git a/antora.yml b/antora.yml index 983a8be..280f425 100644 --- a/antora.yml +++ b/antora.yml @@ -1,6 +1,6 @@ -name: placeholder-course-name -title: Placeholder Course Title -version: 1 +name: llm-model-serving +title: Serving LLM Models on OpenShift AI +version: 1.01 nav: - modules/ROOT/nav.adoc - modules/chapter1/nav.adoc diff --git a/devfile.yaml b/devfile.yaml index 7ba192c..7bb004b 100644 --- a/devfile.yaml +++ b/devfile.yaml @@ -1,8 +1,8 @@ schemaVersion: 2.1.0 metadata: - name: rhods-quick-course - displayName: RHODS Quick Course - description: RHODS Quick Course published using Antora + name: llm-model-serving-quick-course + displayName: Serving LLM Models on OpenShift AI + description: LLM Model Serving Quick Course published using Antora icon: https://nodejs.org/static/images/logos/nodejs-new-pantone-black.svg tags: - Node.js @@ -12,10 +12,10 @@ metadata: language: JavaScript version: 2.1.1 starterProjects: - - name: rhods-quick-course + - name: llm-model-serving-quick-course git: remotes: - origin: 'https://github.com/RedHatTraining/rhods-quick-course.git' + origin: 'https://github.com/RedHatTraining/llm-model-serving-quick-course.git' components: - name: runtime container: diff --git a/modules/.DS_Store b/modules/.DS_Store new file mode 100644 index 0000000..c305f39 Binary files /dev/null and b/modules/.DS_Store differ diff --git a/modules/ROOT/pages/index copy.adoc b/modules/ROOT/pages/index copy.adoc new file mode 100644 index 0000000..b672fba --- /dev/null +++ b/modules/ROOT/pages/index copy.adoc @@ -0,0 +1,10 @@ += Serving LLM Models on OpenShift AI +:navtitle: Home + +== Introduction + +Welcome to this quick course on Serving LLM Models on Red Hat OpenShift AI: + +The objective is to experience the entire process of Serving the Mistral 7B Large Language Model, starting with a Openshift Container Cluster version 4.15. + +From this point, you will need to install the Operators to successfully configuire OpenShift AI. Once operational, you will to add the Ollama Model Serving Runtime, create a Data Science Project, Deploy S3 compatible Storage, Setup Data Connections, create a workbench, Use the Single Serving Model Platform to host the Ollama framework, configuire a Mistral Model, then work through a Jupyter Notebook to test your models Serving Performance. \ No newline at end of file diff --git a/modules/ROOT/pages/index.adoc b/modules/ROOT/pages/index.adoc index d4f67ea..4f28605 100644 --- a/modules/ROOT/pages/index.adoc +++ b/modules/ROOT/pages/index.adoc @@ -1,6 +1,61 @@ -= An Example Quick Course += Serving LLM Models on OpenShift AI :navtitle: Home -== Introduction +Welcome to this Quick course on _Deploying an LLM using OpenShift AI_. 
This is the first of a set of advanced courses about Red Hat OpenShift AI.
-This is an example quick course demonstrating the usage of Antora for authoring and publishing quick courses.
\ No newline at end of file
+IMPORTANT: The hands-on labs in this course were created and tested with RHOAI v2.9.1. Labs should mostly work without any changes in minor dot release upgrades of the product. Please open an issue in this repository if you run into any problems.
+
+== Authors
+
+The PTL team acknowledges the valuable contributions of the following Red Hat associates:
+
+* Christopher Nuland
+* Vijay Chebolu & Team
+* Karlos Knox
+
+== Classroom Environment
+
+This introductory course has a few simple hands-on labs. You will use the Base RHOAI on AWS catalog item in the Red Hat Demo Platform (RHDP) to run the hands-on exercises in this course.
+
+This course uses a *Red Hat OpenShift Container Platform* cluster.
+
+When ordering this catalog item in RHDP:
+
+ * Select Practice/Enablement for the Activity field
+ * Select Learning about the Product for the Purpose field
+ * Enter Learning RHOAI in the Salesforce ID field
+ * Scroll to the bottom, check the box to confirm acceptance of terms and conditions
+ * Click Order
+
+For Red Hat partners who do not have access to RHDP, provision an environment using the Red Hat Hybrid Cloud Console. Unfortunately, the labs will NOT work on the trial sandbox environment. You need to provision an OpenShift AI cluster on-premises, or in a supported cloud environment, by following the product documentation (Product Documentation for Red Hat OpenShift AI 2024).
+
+== Prerequisites
+
+For this course, basic experience with Red Hat OpenShift is recommended but not mandatory.
+
+You will encounter and modify code segments, deploy resources using YAML files, and modify launch configurations, but you will not have to write code.
+
+== Objectives
+
+The overall objectives of this introductory course include:
+
+ * Become familiar with using Red Hat OpenShift AI to serve and interact with an LLM
+ * Install the Red Hat OpenShift AI operator and its dependencies
+ * Add a custom model serving runtime
+ * Create a data science project, workbench, and data connections
+ * Load an LLM model into the Ollama runtime framework
+ * Import a Jupyter notebook from a Git repository and use it to interact with the LLM
+ * Experiment with the Mistral LLM
\ No newline at end of file
diff --git a/modules/appendix/pages/appendix.adoc b/modules/appendix/pages/appendix.adoc
index ee38df2..94d4c08 100644
--- a/modules/appendix/pages/appendix.adoc
+++ b/modules/appendix/pages/appendix.adoc
@@ -1,3 +1,3 @@
 = Appendix A
-Content for Appendix A...
\ No newline at end of file
+Content for Appendix A...
\ No newline at end of file
diff --git a/modules/chapter1/images/redhatllm.gif b/modules/chapter1/images/redhatllm.gif
new file mode 100644
index 0000000..0fa6ad0
Binary files /dev/null and b/modules/chapter1/images/redhatllm.gif differ
diff --git a/modules/chapter1/nav.adoc b/modules/chapter1/nav.adoc
index 8dcc0c4..f74eadc 100644
--- a/modules/chapter1/nav.adoc
+++ b/modules/chapter1/nav.adoc
@@ -1,4 +1 @@
-* xref:index.adoc[]
-** xref:section1.adoc[]
-** xref:section2.adoc[]
-** xref:section3.adoc[]
\ No newline at end of file
+* xref:index.adoc[]
\ No newline at end of file
diff --git a/modules/chapter1/pages/index.adoc b/modules/chapter1/pages/index.adoc
index 476fc0f..fdf86d9 100644
--- a/modules/chapter1/pages/index.adoc
+++ b/modules/chapter1/pages/index.adoc
@@ -1,3 +1,42 @@
-= Chapter 1
+= Technical Side of LLMs
+
+[NOTE]
+This segment of the course provides context and analogies that help explain the purpose of the guided lab in the next section. Feel free to skip ahead if you just want to get started.
+
+=== Why this technical course?
+
+A post I read on LinkedIn summed up the "why" quite nicely.
+
+It described the basic idea that a Formula One driver doesn't need to know how to build an engine to be an F1 champion. However, they do need *mechanical sympathy*: an understanding of the car's mechanics that lets them drive it effectively and get the best out of it.
+
+The same applies to AI. We don't need to be AI experts to harness the power of large language models, but we do need to develop a certain level of "mechanical sympathy" for how these models are selected, operationalized, served, inferred from, and kept up to date. Not just as users, but as collaborators who understand the underlying mechanics well enough to communicate with clients, partners, and co-workers effectively.
+
+It's not just about the model itself, it's about the platform that empowers us to create trustworthy AI applications and guides us in making informed choices.
+
+The true power lies in the platform that enables us to harness a diverse range of AI models, tools, and infrastructure, and to operationalize our ML projects.
+
+That platform, *OpenShift AI*, is what we learn to create, configure, and use to serve LLM models in this quick course.
+
+=== The Ollama Model Framework
+
+Large Language Models (LLMs) can generate new stories, summarize texts, and even perform advanced tasks like reasoning and problem solving, which is not only impressive but also remarkable because of their accessibility and easy integration into applications.
+
+There are many popular LLMs. Nonetheless, they all operate the same way: users provide instructions or tasks in natural language, and the LLM generates a response based on what the model "thinks" could be the continuation of the prompt.
+
+Ollama is not an LLM itself. Ollama is a relatively new but powerful open-source framework designed for serving machine learning models. It is designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production.
+
+==== How does Ollama work?
+
+At its core, Ollama simplifies the process of downloading, installing, and interacting with a wide range of LLMs, empowering users to explore their capabilities without the need for extensive technical expertise or reliance on cloud-based platforms.
+
+In this course, we will focus on a single LLM, Mistral.
However, with an understanding of the Ollama framework, we will be able to work with a variety of large language models using the exact same configuration.
+
+You will be able to switch models in minutes, all running on the same platform. This will enable you to test, compare, and evaluate multiple models with the skills gained in the course.
+
+*Experimentation and Learning*
+
+Ollama provides a powerful platform for experimentation and learning, allowing users to explore the capabilities and limitations of different LLMs, understand their strengths and weaknesses, and develop skills in prompt engineering and LLM interaction. This hands-on approach fosters a deeper understanding of AI technology and empowers users to push the boundaries of what's possible.
-This is the home page of _Chapter_ 1 in the *hello* quick course...
\ No newline at end of file
diff --git a/modules/chapter1/pages/section1.adoc b/modules/chapter1/pages/section1.adoc
index 6b6277c..2795151 100644
--- a/modules/chapter1/pages/section1.adoc
+++ b/modules/chapter1/pages/section1.adoc
@@ -1,3 +1,2 @@
-= Section 1
+= Follow-up Story
-This is _Section 1_ of _Chapter 1_ in the *hello* quick course....
\ No newline at end of file
diff --git a/modules/chapter1/pages/section2.adoc b/modules/chapter1/pages/section2.adoc
index 83a976c..2795151 100644
--- a/modules/chapter1/pages/section2.adoc
+++ b/modules/chapter1/pages/section2.adoc
@@ -1,3 +1,2 @@
-= Section 2
+= Follow-up Story
-This is _Section 2_ of _Chapter 1_ in the *hello* quick course....
\ No newline at end of file
diff --git a/modules/chapter1/pages/section3.adoc b/modules/chapter1/pages/section3.adoc
deleted file mode 100644
index 17a0eda..0000000
--- a/modules/chapter1/pages/section3.adoc
+++ /dev/null
@@ -1,4 +0,0 @@
-= Section 3
-
-This is _Section 3_ of _Chapter 1_ in the *hello* quick course....
-
diff --git a/modules/chapter2/.DS_Store b/modules/chapter2/.DS_Store
new file mode 100644
index 0000000..5c0b0cd
Binary files /dev/null and b/modules/chapter2/.DS_Store differ
diff --git a/modules/chapter2/images/redhatllm.gif b/modules/chapter2/images/redhatllm.gif
new file mode 100644
index 0000000..0fa6ad0
Binary files /dev/null and b/modules/chapter2/images/redhatllm.gif differ
diff --git a/modules/chapter2/nav.adoc b/modules/chapter2/nav.adoc
index 62b8058..d6da93d 100644
--- a/modules/chapter2/nav.adoc
+++ b/modules/chapter2/nav.adoc
@@ -1,2 +1,3 @@
 * xref:index.adoc[]
-** xref:section1.adoc[]
\ No newline at end of file
+** xref:section1.adoc[]
+** xref:section2.adoc[]
\ No newline at end of file
diff --git a/modules/chapter2/pages/index.adoc b/modules/chapter2/pages/index.adoc
index a5dd78f..3837e68 100644
--- a/modules/chapter2/pages/index.adoc
+++ b/modules/chapter2/pages/index.adoc
@@ -1,3 +1,50 @@
-= Chapter 2
+= OpenShift AI Initialization
-This is the home page of _Chapter 2_ in the *hello* quick course....
\ No newline at end of file
+== Supported configurations
+OpenShift AI is supported in two configurations:
+
+ * A managed cloud service add-on for *Red Hat OpenShift Dedicated* (with a Customer Cloud Subscription for AWS or GCP) or for Red Hat OpenShift Service on Amazon Web Services (ROSA).
+For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1]
+
+* Self-managed software that you can install on-premises or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*.
+For information about OpenShift AI as self-managed software on your OpenShift cluster in a connected or a disconnected environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8[Product Documentation for Red Hat OpenShift AI Self-Managed 2.8]
+
+In this course, we cover the installation of *Red Hat OpenShift AI Self-Managed* using the OpenShift web console.
+
+== General Information about Installation
+
+[NOTE]
+====
+The product name was recently changed to *Red{nbsp}Hat OpenShift AI (RHOAI)* (the old name was *Red{nbsp}Hat OpenShift Data Science*). In this course, most references to the product use the new name. However, references to some UI elements might still use the previous name.
+====
+
+In addition to the *Red{nbsp}Hat OpenShift AI* Operator, there are other operators that you may need to install, depending on which features and components of *Red{nbsp}Hat OpenShift AI* you want to install and use.
+
+https://www.redhat.com/en/technologies/cloud-computing/openshift/pipelines[Red{nbsp}Hat OpenShift Pipelines Operator]::
+The *Red{nbsp}Hat OpenShift Pipelines Operator* is required if you want to install the *Red{nbsp}Hat OpenShift AI Pipelines* component.
+
+[NOTE]
+====
+To support the KServe component, which is used by the single-model serving platform to serve large models, install the Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh.
+====
+
+https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[OpenShift Serverless Operator]::
+The *OpenShift Serverless Operator* is a prerequisite for the *Single Model Serving Platform*.
+
+https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[OpenShift Service Mesh Operator]::
+The *OpenShift Service Mesh Operator* is a prerequisite for the *Single Model Serving Platform*.
+
+[NOTE]
+====
+The following Operators are required to support the use of NVIDIA GPUs (accelerators) with OpenShift AI.
+====
+
+https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[Node Feature Discovery Operator]::
+The *Node Feature Discovery Operator* is a prerequisite for the *NVIDIA GPU Operator*.
+
+https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html[NVIDIA GPU Operator]::
+The *NVIDIA GPU Operator* is required for GPU support in Red Hat OpenShift AI.
\ No newline at end of file
diff --git a/modules/chapter2/pages/section1.adoc b/modules/chapter2/pages/section1.adoc
index 8d7d234..612c790 100644
--- a/modules/chapter2/pages/section1.adoc
+++ b/modules/chapter2/pages/section1.adoc
@@ -1,3 +1,43 @@
-= Section 1
+= Installing Red{nbsp}Hat OpenShift AI Using the Web Console
-This is _Section 1_ of _Chapter 2_ in the *hello* quick course....
\ No newline at end of file
+*Red{nbsp}Hat OpenShift AI* is available as an operator via the OpenShift OperatorHub. You will install the *Red{nbsp}Hat OpenShift AI operator* and its dependencies using the OpenShift web console in this section.
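+If you prefer to double-check things from a terminal as you go, the following is a minimal optional sketch (not required by the lab). It assumes the `oc` CLI is installed and logged in with cluster-admin rights; operator names and versions will vary by cluster.
+
+```bash
+# List the ClusterServiceVersions (installed operators) visible cluster-wide.
+oc get csv -n openshift-operators
+
+# Watch the Subscriptions created by the OperatorHub installs in this lab.
+oc get subscriptions -n openshift-operators
+```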
+
+== Lab: Installation of Red{nbsp}Hat OpenShift AI
+
+IMPORTANT: The installation requires a user with the _cluster-admin_ role.
+
+. Log in to Red Hat OpenShift as a user who has the _cluster-admin_ role assigned.
+
+. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop-up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view.
+
+[NOTE]
+You do not have to wait for the previous Operator installation to complete before installing the next. For this lab, you can skip the optional GPU operators because the cluster has no GPU.
+
+ * Web Terminal
+
+ * Red Hat OpenShift Serverless
+
+ * Red Hat OpenShift Service Mesh
+
+ * Red Hat OpenShift Pipelines
+
+ * GPU Support
+
+ ** Node Feature Discovery Operator (optional)
+
+ ** NVIDIA GPU Operator (optional)
+
+[TIP]
+In my experience, installing these Operators before the OpenShift AI Operator helps OpenShift AI recognize that these components are available and adjust its initial configuration to hand their management over to OpenShift AI.
+
+. Navigate to **Operators** -> **OperatorHub** and search for *OpenShift AI*.
+
+. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop-up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view.
+
+. In the `Install Operator` page, leave all of the options as default and click on the *Install* button to start the installation.
+
+. The operator installation progress window will pop up. The installation may take a couple of minutes.
+
+WARNING: Do not proceed with the installation past this point. In order to access the LLM remotely, there will be some modifications to the Data Science Cluster YAML file prior to completing the installation of Red Hat OpenShift AI.
\ No newline at end of file
diff --git a/modules/chapter2/pages/section2.adoc b/modules/chapter2/pages/section2.adoc
new file mode 100644
index 0000000..2bfabb8
--- /dev/null
+++ b/modules/chapter2/pages/section2.adoc
@@ -0,0 +1,101 @@
+= Modifying the OpenShift AI TLS Certificate
+
+[NOTE]
+An SSL/TLS certificate is a digital object that allows systems to verify identity and subsequently establish an encrypted network connection to another system using the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol.
+
+By default, the Single Model Serving Platform in OpenShift AI uses a self-signed certificate, generated at installation, for the endpoints that are created when deploying a model server.
+
+This can be counter-intuitive because the OCP cluster already has certificates configured, and those are used by default for endpoints such as Routes.
+
+The following procedure explains how to use the same certificate from the OpenShift Container Platform cluster for OpenShift AI.
+
+== Use OpenShift Certificates for Ingress Routes
+
+[NOTE]
+Most customers will not use the self-signed certificates, opting instead to use certificates generated by their own authority. Therefore, this step of adding secrets to OpenShift and OpenShift AI is a common process during installation.
+
+=== Navigate to the OpenShift Container Cluster Dashboard
+
+The content of the Secret (data) should contain two items, *tls.crt* and *tls.key*.
They are the certificate and key that are used for all the OpenShift Routes.
+
+*Collect the Secret YAML Text:*
+
+ . In the Navigation pane on the left, click on the *Workloads* section, then *Secrets* under Workloads.
+ . From the Project dropdown, toggle the *show default projects* radio button to on.
+ . Select the *openshift-ingress* project from the list.
+ . Locate the secret named *ingress-certs-(XX-XX-2024)*; its type should be *Opaque*.
+ . Click on the secret name to open it, then select the *YAML* tab.
+ . Copy all the text from the window; ensure you scroll down (Ctrl+A should work).
+
+*Clean & Deploy the Secret YAML Text:*
+
+ * Click on the Project dropdown again and select the *istio-system* project.
+ * Select the *Create* button on the right, then select the *From YAML* option.
+ * Delete the text in the window, and *paste the copied Secret text*.
+ * Clean up the YAML text to keep just the relevant content. It should look like the YAML below (the name of the secret will be different; it is normally tied to the date the RHDP cluster was deployed). Change the type to *kubernetes.io/tls*.
+
+```yaml
+kind: Secret
+apiVersion: v1
+metadata:
+  name: ingress-certs-05-28-2024
+data:
+  tls.crt: >-
+    LS0tLS1CRUd...
+  tls.key: >-
+    LS0tLS1CRUd...
+type: kubernetes.io/tls
+```
+
+* Copy the name of the secret (optional, but helpful in the next step).
+* Click *Create* to apply this YAML in the istio-system project (namespace).
+
+*We have copied the Secret used by OCP and made it available to be used by OpenShift AI.*
+
+== Create OpenShift AI Data Science Cluster
+
+With our secret in place, the next step is to create the OpenShift AI *Data Science Cluster*.
+
+Return to the OpenShift navigation menu, select Installed Operators, and click on the OpenShift AI Operator name to open the operator.
+
+ . *Select the option to create a Data Science Cluster.*
+ . *Select the radio button to switch to the YAML view.*
+ . In the kserve section of the YAML file, find the serving/certificate area. Add the line *secretName:* followed by the name of the secret that we deployed in the istio-system project. In addition, change the type from SelfSigned to *Provided*. See the example below.
+
+```yaml
+kserve:
+  devFlags: {}
+  managementState: Managed
+  serving:
+    ingressGateway:
+      certificate:
+        secretName: ingress-certs-XX-XX-2024
+        type: Provided
+    managementState: Managed
+    name: knative-serving
+```
+
+Once you have made those changes to the YAML file, *click Create* to deploy the Data Science Cluster.
+
+The Single Model Serving Platform will now be deployed and will expose ingress connections with the same certificate as OpenShift Routes. Endpoints will be accessible using TLS without having to ignore error messages or create special configurations.
+
+== Epilogue
+
+Congratulations, you have successfully completed the installation of OpenShift AI on an OpenShift Container Platform cluster. OpenShift AI is now running with its own dashboard!
+
+ * We installed the required OpenShift AI Operators
+ ** Serverless, Service Mesh, & Pipelines Operators
+ ** OpenShift AI Operator
+ ** Web Terminal Operator
+
+Additionally, we took this installation a step further by sharing TLS certificates from the OpenShift cluster with OpenShift AI.
+
+We pick up working in the OpenShift AI UI in the next chapter.
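+For reference, the same certificate copy can be performed from a terminal instead of the web console. This is an optional sketch, assuming the `oc` CLI is logged in with cluster-admin rights; substitute the secret name you found in the openshift-ingress namespace.
+
+```bash
+# Find the ingress certificate secret (the name varies per cluster).
+oc get secrets -n openshift-ingress | grep cert
+
+# Extract the certificate and key, then recreate them as a TLS-typed
+# secret in the istio-system namespace (replace the placeholder name).
+mkdir -p /tmp/ingress-cert
+oc extract secret/ingress-certs-XX-XX-2024 -n openshift-ingress \
+  --keys=tls.crt --keys=tls.key --to=/tmp/ingress-cert --confirm
+oc create secret tls ingress-certs-XX-XX-2024 -n istio-system \
+  --cert=/tmp/ingress-cert/tls.crt --key=/tmp/ingress-cert/tls.key
+```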
\ No newline at end of file
diff --git a/modules/chapter3/pages/index.adoc b/modules/chapter3/pages/index.adoc
index 785a3fe..346aa27 100644
--- a/modules/chapter3/pages/index.adoc
+++ b/modules/chapter3/pages/index.adoc
@@ -1,3 +1,10 @@
-= Chapter 3
+= OpenShift AI Configuration
-This is the home page of _Chapter 3_ in the *hello* quick course....
\ No newline at end of file
+This chapter begins with a running and configured OpenShift AI environment. If you don't already have your environment running, head over to Chapter 2.
+
+There is a lot to cover in section 1: we add the Ollama custom runtime, create a data science project, set up storage, create a workbench, and finally serve the Ollama framework, utilizing the Single Model Serving Platform to deliver our model to our notebook application.
+
+In section 2, we explore using the Jupyter notebook from our workbench to infer data from the Mistral 7B LLM. While less technical than the previous section of this hands-on course, there are still a few steps: downloading the Mistral model, updating our notebook with the inference endpoint, and evaluating our model's performance.
+
+Let's get started!
\ No newline at end of file
diff --git a/modules/chapter3/pages/section1.adoc b/modules/chapter3/pages/section1.adoc
index 49a1e14..708596c 100644
--- a/modules/chapter3/pages/section1.adoc
+++ b/modules/chapter3/pages/section1.adoc
@@ -1,3 +1,370 @@
-= Section 1
+= OpenShift AI Customization
+
+== Model Serving Runtimes
+
+A model-serving runtime provides integration with a specified model server and the model frameworks that it supports. By default, Red Hat OpenShift AI includes the following model-serving runtimes:
+
+ * OpenVINO Model Server runtime
+ * Caikit TGIS for KServe
+ * TGIS Standalone for KServe
+
+However, if these runtimes do not meet your needs (for example, if they do not support a particular model framework), you might want to add your own custom runtimes.
+
+As an administrator, you can use the OpenShift AI interface to add and enable custom model-serving runtimes. You can then choose from your enabled runtimes when you create a new model server.
+
+This exercise will guide you through the broad steps necessary to deploy a custom serving runtime in order to serve a model using the Ollama model serving framework.
+
+[NOTE]
+====
+While RHOAI supports the ability to add your own runtime, it is up to you to configure, adjust and maintain your custom runtimes.
+====
+
+== Add The Ollama Custom Runtime
+
+. Log in to RHOAI with a user who is part of the RHOAI admin group; for this lab, we will use the admin account.
+
+. In the RHOAI console, navigate to the Settings menu, then Serving runtimes.
+
+. Select the Add serving runtime button.
+
+. For the model serving platform this runtime supports, select *Single-model serving platform*.
+
+. For the API protocol this runtime supports, select *REST*.
+
+.
Click on Start from scratch in the window that opens up, paste the following YAML: ++ +```yaml +apiVersion: serving.kserve.io/v1alpha1 +kind: ServingRuntime +labels: + opendatahub.io/dashboard: "true" +metadata: + annotations: + openshift.io/display-name: Ollama + name: ollama +spec: + builtInAdapter: + modelLoadingTimeoutMillis: 90000 + containers: + - image: quay.io/rh-aiservices-bu/ollama-ubi9:0.1.30 + env: + - name: OLLAMA_MODELS + value: /.ollama/models + - name: OLLAMA_HOST + value: 0.0.0.0 + - name: OLLAMA_KEEP_ALIVE + value: '-1m' + name: kserve-container + ports: + - containerPort: 11434 + name: http1 + protocol: TCP + multiModel: false + supportedModelFormats: + - autoSelect: true + name: any +``` + +. After clicking the **Add** button at the bottom of the input area, we are see the new Ollama Runtime in the list. We can re-order the list as needed (the order chosen here is the order in which the users will see these choices) + + +== Create a Data Science Project + +Navigate to & select the Data Science Projects section. + + . Select the create data science project button + + . Enter a name for your project, such as *ollama-model*. + + . The resource name should be populated automatically + + . Optionally add a description to the data science project + + . Select Create + + +== Deploy MinIO as S3 Compatible Storage + +=== MinIO overview + +*MinIO* is a high-performance, S3 compatible object store. It can be deployed on a wide variety of platforms, and it comes in multiple flavors. + +This segment describes a very quick way of deploying the community version of MinIO in order to quickly setup a fully standalone Object Store, in an OpenShift Cluster. This can then be used for various prototyping tasks that require Object Storage. + +[WARNING] +This version of MinIO should not be used in production-grade environments. Also, MinIO is not included in RHOAI, and Red Hat does not provide support for MinIO. + +=== MinIO Deployment +To Deploy MinIO, we will utilize the OpenShift Dashboard. + + . Click on the Project Selection list dropdown, Select the Ollama-Model project or the data science project you created in the previous step. + + . Then Select the + (plus) icon from the top right of the dashboard. + + . In the new window, we will paste the following YAML file. In the YAML below its recommended to change the default user name & password. + + +```yaml +--- +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: minio-pvc +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 40Gi + volumeMode: Filesystem +--- +kind: Secret +apiVersion: v1 +metadata: + name: minio-secret +stringData: + # change the username and password to your own values. 
+ # ensure that the user is at least 3 characters long and the password at least 8 + minio_root_user: minio + minio_root_password: minio123 +--- +kind: Deployment +apiVersion: apps/v1 +metadata: + name: minio +spec: + replicas: 1 + selector: + matchLabels: + app: minio + template: + metadata: + creationTimestamp: null + labels: + app: minio + spec: + volumes: + - name: data + persistentVolumeClaim: + claimName: minio-pvc + containers: + - resources: + limits: + cpu: 250m + memory: 1Gi + requests: + cpu: 20m + memory: 100Mi + readinessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 5 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + terminationMessagePath: /dev/termination-log + name: minio + livenessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 30 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + env: + - name: MINIO_ROOT_USER + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_user + - name: MINIO_ROOT_PASSWORD + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_password + ports: + - containerPort: 9000 + protocol: TCP + - containerPort: 9090 + protocol: TCP + imagePullPolicy: IfNotPresent + volumeMounts: + - name: data + mountPath: /data + subPath: minio + terminationMessagePolicy: File + image: >- + quay.io/minio/minio:RELEASE.2023-06-19T19-52-50Z + args: + - server + - /data + - --console-address + - :9090 + restartPolicy: Always + terminationGracePeriodSeconds: 30 + dnsPolicy: ClusterFirst + securityContext: {} + schedulerName: default-scheduler + strategy: + type: Recreate + revisionHistoryLimit: 10 + progressDeadlineSeconds: 600 +--- +kind: Service +apiVersion: v1 +metadata: + name: minio-service +spec: + ipFamilies: + - IPv4 + ports: + - name: api + protocol: TCP + port: 9000 + targetPort: 9000 + - name: ui + protocol: TCP + port: 9090 + targetPort: 9090 + internalTrafficPolicy: Cluster + type: ClusterIP + ipFamilyPolicy: SingleStack + sessionAffinity: None + selector: + app: minio +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-api +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: api + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-ui +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: ui + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +``` + +*This should finish in a few seconds. Now it's time to deploy our storage buckets.* + + +=== MinIO Storage Bucket Creation + +From the OCP Dashboard: + + . Select Networking / Routes from the navigation menu. + + . This will display two routes, one for the UI & another for the API. + + . For the first step select the UI route, and paste it in a browser Window. + + . This window opens the MinIO Dashboard, login with user/password combination you set, or the default listed in yaml file above. + +Once logged into the MinIO Console: + + . Click Create Bucket to get started. + + . Create two Buckets: + + .. *models* + + .. *storage* + +[NOTE] + When serving a LLM or other model Openshift AI looks within a Folder, therefore we need at least one subdirectory under the Models Folder. + + . Via the Navigation menu, *Select object browser*, Click on the Model Bucket. + . From the models bucket page, click add path, and type *ollama* as the name of the sub-Folder or path. 
+[IMPORTANT]
+In most cases, to serve a model, the trained model would be uploaded into this sub-directory. *Ollama is a special case, as it can download and manage several LLM models as part of the runtime.*
+
+ . We still need a file available in this folder for the model deployment workflow to succeed.
+
+ . So we will copy an emptyfile.txt file to the ollama subdirectory. You can download the file from https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes/ollama_runtime[*this location*], or you can create your own file called emptyfile.txt and upload it.
+
+ . Once you have this file ready, upload it into the ollama path in the models bucket by clicking the upload button and selecting the file from your local desktop.
+
+=== Create Data Connection
+
+Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the Ollama-model project.
+
+. Select the Data Connections menu, followed by Create data connection.
+. Provide the following values:
+.. Name: *models*
+.. Access Key: the minio_root_user value from the YAML file
+.. Secret Key: the minio_root_password value from the YAML file
+.. Endpoint: the MinIO API URL from the Routes page in the OpenShift dashboard
+.. Region: required for AWS storage and cannot be blank (use no-region-minio)
+.. Bucket: the MinIO storage bucket name: *models*
+
+Repeat for the storage bucket, using *storage* for the name & bucket.
+
+== Creating a WorkBench
+
+Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the Ollama-model project.
+
+ . Select the Workbenches button, then Create workbench
+
+ .. Name: `ollama-model`
+
+ .. Notebook Image: `Minimal Python`
+
+ .. Leave the remaining options at their defaults
+
+ .. Optionally, scroll to the bottom and check the `Use a data connection` box
+
+ .. Select *storage* from the dropdown to attach the storage bucket to the workbench
+
+ . Select the Create Workbench option.
+
+[NOTE]
+Depending on the notebook image selected, it can take between 2-20 minutes for the container image to be fully deployed. The Open link will be available when the container is fully deployed.
+
+== Creating The Model Server
+
+From the ollama-model workbench dashboard in the ollama-model project, select the **Models** section, and select Deploy Model under the **Single Model Serving Platform** button.
+
+*Create the model server with the following values:*
+
+--
+ .. Model name: `Ollama-Mistral`
+ .. Serving Runtime: `Ollama`
+ .. Model framework: `Any`
+ .. Model Server Size: `Medium`
+ .. Model location data connection: `models`
+ .. Model location path: `/ollama`
+
+After clicking the **Deploy** button at the bottom of the form, the model is added to our **Models & Model Servers list**. When the model is available, the inference endpoint will populate and the status will show a green checkmark.
+
+We are now ready to interact with our newly deployed LLM model; if you want a quick check that the endpoint responds, see the optional sketch at the end of this page. Join me in Section 2 to explore Mistral running on OpenShift AI using Jupyter Notebooks.
+
-This is _Section 1_ of _Chapter 3_ in the *hello* quick course....
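+== Optional: Quick Endpoint Check
+
+As referenced above, you can optionally confirm from a terminal that the Ollama runtime answers requests before moving on. This is a hedged sketch; replace the placeholder with the inference endpoint shown on the Models page. The model list will be empty until a model is pulled in the next section, and `-k` is only needed if you did not configure the shared TLS certificate.
+
+```bash
+# List the models the Ollama runtime currently has available (expect an empty list for now).
+curl -k https://your-inference-endpoint/api/tags
+```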
\ No newline at end of file
diff --git a/modules/chapter3/pages/section2.adoc b/modules/chapter3/pages/section2.adoc
index 24f5686..c2603fb 100644
--- a/modules/chapter3/pages/section2.adoc
+++ b/modules/chapter3/pages/section2.adoc
@@ -1,3 +1,126 @@
-= Section 2
+= Jupyter Notebooks & Mistral LLM Model Setup
+
+== Open the Jupyter Notebook
+
+From the OpenShift AI ollama-model workbench dashboard:
+
+* Select the Open link to the right of the status section. When the new window opens, use the OpenShift admin user and password to log in to the notebook.
+
+Click the *Allow selected permissions* button to complete the login to the notebook.
+
+[NOTE]
+If the *Open* link for the notebook is grayed out, the notebook container is still starting. This process can take a few minutes, and up to 20+ minutes depending on the notebook image we chose.
+
+== Inside the Jupyter Notebook
+
+Clone the notebook files used to interact with the Ollama framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git
+
+Navigate to the llm-on-openshift/examples/notebooks/langchain folder.
+
+Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_
+
+Explore the notebook, then continue.
+
+=== Update the Inference Endpoint
+
+Head back to the RHOAI workbench dashboard and copy the inference endpoint from our ollama-mistral model.
+
+Return to the Jupyter notebook environment.
+
+ . Paste the inference endpoint into the cell labeled inference_server_url = *"replace with your own inference address"*
+
+ . We can now start executing the code in the cells, starting with the cell that sets the inference server URL.
+
+ . Next we run the second cell: !pip install -q langchain==0.1.14 ; there may be a notice to update pip, just ignore it and continue.
+
+ . The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model.
+
+ . The fourth cell places our first call to the Ollama-Mistral framework served by OpenShift AI.
+
+[WARNING]
+Before we continue, we need to perform one additional step. As mentioned, the Ollama model runtime we launched in OpenShift AI is a framework that can host multiple LLM models. It is currently running, but it is waiting for a command instructing it to download a model to serve. The following command needs to be run from the OpenShift dashboard. We are going to use the Web Terminal operator to perform this next step.
+
+== Activating the Mistral Model in Ollama
+
+We will need to obtain the endpoint from the OpenShift AI model serving console. I usually paste the text below into a cell in the Jupyter notebook and update the URL in the code block from there.
+
+[source,bash]
+----
+curl https://your-endpoint/api/pull \
+  -k \
+  -H "Content-Type: application/json" \
+  -d '{"name": "mistral"}'
+----
+
+ . Next, copy the entire code snippet and open the OpenShift dashboard.
+ . At the top right of the dashboard, locate the ">_" icon and select it.
+ . This will open the terminal window at the bottom of the dashboard.
+ . Click on the Start button in the terminal window and wait for the bash prompt to appear.
+ . Paste the modified code block into the window and press Enter.
+
+The message *status: pulling manifest* should appear. This begins the model download process.
+
+Once the download completes, the *status: success* message appears. We can now return to the Jupyter notebook tab in the browser and proceed.
+
+=== Create the Prompt
+
+This cell sets the *system message* portion of the query to our model.
Normally we don't get to see this part of the query. This message details how the model should act, respond, and consider our questions. It adds checks to validate the information as well as possible, and asks the model to explain its answers in detail.
+
+== Memory for the conversation
+
+This keeps track of the conversation; this way the chat history is sent along with each new message, keeping the context for future questions.
+
+The next cell tracks the conversation and prints it to the notebook output window so we can see the full conversation list.
+
+=== First input to our LLM
+
+The notebook's first input to our model asks it to describe Paris in 100 words or less.
+
+The green text in the window is the setup message that is sent along with the single-sentence question, describing to the model how to consider and respond to the question.
+
+It takes ~12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen ~30 seconds after the request was started.
+
+The response answered the question in a well-considered and informative paragraph that is less than 100 words in length.
+
+=== Second Input
+
+Notice that the second input - "Is there a River?" - does not specify which location might have a river. Because the conversation history is passed with the second input, there is no need to specify any additional information.
+
+The time to first word took ~14 seconds this time, just a bit longer due to the original conversation history being sent as well. The entire response finished printing to the screen in just over 4 seconds.
+
+Overall our model is performing well without a GPU, in a container limited to 4 CPUs & 10GB of memory.
+
+== Second Example Prompt
+
+This is similar to the previous example, except we use the city of London, and we run a cell to remove the verbose text regarding what is sent to or received from the model.
+
+There is no change to the memory setting, but go ahead and evaluate whether the second input - "is there a river" - is answered correctly.
+
+== Experimentation with Model
+
+Add a few new cells to the notebook.
+
+Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model. (If you are curious what these calls look like outside LangChain, a raw API sketch is included at the end of this page.)
+
+Try not clearing the memory and asking a few questions.
+
+You have successfully deployed a Large Language Model; now test the information that it has available and find out what it doesn't know.
+
+== Delete the Environment
+
+Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the OpenShift Container Platform cluster.
+
+You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson.
+
+=== Leave Feedback
+
+If you enjoyed this walkthrough, please send the team a note.
+If you have suggestions to make it better or to clarify a point, please send the team a note.
+
+Until the next time, keep being awesome!
+
-This is _Section 2_ of _Chapter 3_ in the *hello* quick course....
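+== Optional: Raw API Call
+
+As mentioned in the experimentation section, the notebook's LangChain calls ultimately become plain HTTP requests to the Ollama API. The following is a hedged sketch of an equivalent raw call, reusing the same placeholder endpoint as the pull command earlier; the parameter values are illustrative only.
+
+```bash
+# Send a single prompt directly to the Ollama generate endpoint and
+# return the full response as one JSON object instead of a stream.
+curl -k https://your-endpoint/api/generate \
+  -H "Content-Type: application/json" \
+  -d '{"model": "mistral", "prompt": "Describe Paris in 100 words or less.", "stream": false}'
+```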
\ No newline at end of file diff --git a/modules/chapter4/chapter1.1/.DS_Store b/modules/chapter4/chapter1.1/.DS_Store new file mode 100644 index 0000000..3e9d4c1 Binary files /dev/null and b/modules/chapter4/chapter1.1/.DS_Store differ diff --git a/modules/chapter4/chapter1.1/images/redhatllm.gif b/modules/chapter4/chapter1.1/images/redhatllm.gif new file mode 100644 index 0000000..0fa6ad0 Binary files /dev/null and b/modules/chapter4/chapter1.1/images/redhatllm.gif differ diff --git a/modules/chapter4/chapter1.1/nav.adoc b/modules/chapter4/chapter1.1/nav.adoc new file mode 100644 index 0000000..d6da93d --- /dev/null +++ b/modules/chapter4/chapter1.1/nav.adoc @@ -0,0 +1,3 @@ +* xref:index.adoc[] +** xref:section1.adoc[] +** xref:section2.adoc[] \ No newline at end of file diff --git a/modules/chapter4/chapter1.1/pages/index.adoc b/modules/chapter4/chapter1.1/pages/index.adoc new file mode 100644 index 0000000..beb406f --- /dev/null +++ b/modules/chapter4/chapter1.1/pages/index.adoc @@ -0,0 +1,50 @@ += Chapter 1 + +== Supported configurations +OpenShift AI is supported in two configurations: + + * A managed cloud service add-on for *Red Hat OpenShift Dedicated* (with a Customer Cloud Subscription for AWS or GCP) or for Red Hat OpenShift Service on Amazon Web Services (ROSA). +For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1] + +* Self-managed software that you can install on-premise or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*. +For information about OpenShift AI as self-managed software on your OpenShift cluster in a connected or a disconnected environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8[Product Documentation for Red Hat OpenShift AI Self-Managed 2.8] + +In this course we cover installation of *Red Hat OpenShift AI self-managed* using the OpenShift Web Console. + +== General Information about Installation + + +[INFO] +==== +The product name has been recently changed to *Red{nbsp}Hat OpenShift AI (RHOAI)* (old name *Red{nbsp}Hat OpenShift Data Science*). In this course, most references to the product use the new name. However, references to some UI elements might still use the previous name. +==== + +In addition to the *Red{nbsp}Hat OpenShift AI* Operator there are some other operators that you may need to install depending on which features and components of *Red{nbsp}Hat OpenShift AI* you want to install and use. + + +https://www.redhat.com/en/technologies/cloud-computing/openshift/pipelines[Red{nbsp}Hat OpenShift Pipelines Operator]:: +The *Red{nbsp}Hat OpenShift Pipelines Operator* is required if you want to install the *Red{nbsp}Hat OpenShift AI Pipelines* component. + + +[NOTE] +==== +To support the KServe component, which is used by the single-model serving platform to serve large models, install the Operators for Red Hat OpenShift Serverless and Red Hat OpenShift Service Mesh. +==== + +https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[OpenShift Serveless Operator]:: +The *OpenShift Serveless Operator* is a prerequisite for the *Single Model Serving Platform*. 
+ +https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[OpenShift Service Mesh Operator]:: +The *OpenShift Service Mesh Operator* is a prerequisite for the *NSingle Model Serving Platform*. + + +[NOTE] +==== +The following Operators are required to support the use of Nvidia GPUs (accelerators) with OpenShift AI +==== + +https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html[Node Feature Discovery Operator]:: +The *Node Feature Discovery Operator* is a prerequisite for the *NVIDIA GPU Operator*. + +https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html[NVIDIA GPU Operator]:: +The *NVIDIA GPU Operator* is required for GPU support in Red Hat OpenShift AI. \ No newline at end of file diff --git a/modules/chapter4/chapter1.1/pages/section1.adoc b/modules/chapter4/chapter1.1/pages/section1.adoc new file mode 100644 index 0000000..612c790 --- /dev/null +++ b/modules/chapter4/chapter1.1/pages/section1.adoc @@ -0,0 +1,43 @@ += Installing Red{nbsp}Hat OpenShift AI Using the Web Console + +*Red{nbsp}Hat OpenShift AI* is available as an operator via the OpenShift Operator Hub. You will install the *Red{nbsp}Hat OpenShift AI operator* and dependencies using the OpenShift web console in this section. + +== Lab: Installation of Red{nbsp}Hat OpenShift AI + +IMPORTANT: The installation requires a user with the _cluster-admin_ role + +. Login to the Red Hat OpenShift using a user which has the _cluster-admin_ role assigned. + +. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators + +[*] You do not have to wait for the previous Operator to complete before installing the next. For this lab you can skip the installation of the optional operators as there is no GPU. + + * Web Terminal + + * Red Hat OpenShift Serverless + + * Red Hat OpenShift Service Mesh + + * Red Hat OpenShift Pipelines + + * GPU Support + + ** Node Feature Discovery Operator (optional) + + ** NVIDIA GPU Operator (optional) + +[TIP] + + Installing these Operators prior to the installation of the OpenShift AI Operator in my experience has made a difference in OpenShift AI acknowledging the availability of these components and adjusting the initial configuration to shift management of these components to OpenShift AI. + +. Navigate to **Operators** -> **OperatorHub** and search for *OpenShift AI*. + +. Click on the `Red{nbsp}Hat OpenShift AI` operator. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. ++ + +. In the `Install Operator` page, leave all of the options as default and click on the *Install* button to start the installation. + +. The operator Installation progress window will pop up. The installation may take a couple of minutes. + + +WARNING: Do proceed with the installation past this point. In order to access the LLM remotely; There will be some modifcations to the Data Science Cluster YAML file prior to completing the installation of Red Hat OpenShift AI. 
\ No newline at end of file diff --git a/modules/chapter4/chapter1.1/pages/section2.adoc b/modules/chapter4/chapter1.1/pages/section2.adoc new file mode 100644 index 0000000..21e4967 --- /dev/null +++ b/modules/chapter4/chapter1.1/pages/section2.adoc @@ -0,0 +1,89 @@ += Modifying the OpenShift AI TLS Certificate + +[NOTE] + +An SSL/TLS certificate is a digital object that allows systems to verify the identity & subsequently establish an encrypted network connection to another system using the Secure Sockets Layer/Transport Layer Security (SSL/TLS) protocol. + +By default, the Single Model Serving Platform in Openshift AI uses a self-signed certificate generated at installation for the endpoints that are created when deploying a Model server. This can be counter-intuitive because the OCP Cluster already has certificates configured which will be used by default for endpoints like Routes. + +This following procedure explains how to use the same certificate from the OpenShift Container cluster for OpenShift AI. + +== Configure OpenShift AI to use a valid certificate for Routes + +[NOTE] +Most customers will not use the self-signed certificates, opting instead to use certificates generated by their own authority. Therefore this step of adding secrets to OpenShift & OpenShift AI is common process during installation. + +=== Navigate to the OpenShift Container Cluster Dashboard + +From the *openshift-ingress* namespace, copy the content of a secret whose name includes "certs". For example *ingress-certs-*..... The content of the Secret (data) should contain two items, *tls.cert* and *tls.key*. They are the certificate and key that are used for all the OpenShift Routes. + +*Collect the Secret YAML Text:* + + * In the Navigation pane on the left, click on the *Workloads* section, then *Secrets* under Workloads. + * From the Project dropdown, toggle the *show default projects* radial button. + * Select the *openshift-ingress* project from the list. + * Locate the file named *ingress-certs-(05-28-2024)*, type should be *Opaque* + * Click on the filename to open the secret, Select the *YAML Tab* + * Copy all the text from the window, insure you scroll down. (CTL-A should work). + +*Clean & Deploy the Secret YAML Text:* + + * Click on the Project dropdown again, Select the *Istio-system* project + * Select the Create button on the right, then Select the from YAML option. + * Delete the text from the Window, and *paste the copied Secret text* + + * Cleanup the YAML Text to just keep the relevant content. It should look like the below YAML file (the name of the secret will be different, it's normally tied to the date the RDHP Cluster was deployed). Change the type to *kubernetes.io/tls*. + +```yaml +kind: Secret +apiVersion: v1 +metadata: +name: ingress-certs-05-28-2024 +data: +tls.crt: >- + LS0tLS1CRUd... +tls.key: >- + LS0tLS1CRUd... +type: kubernetes.io/tls +``` + +* Copy the Name portion of the text (optional, but helpful) +* Click create to apply this YAML into the istio-system proejct (namespace). + +*We have copied the Secret used by OCP made it available be used by OAI.* + + + +=== Navigate to Operator Hub & select the OpenShift AI Operator + +Time to Deploy the OpenShift AI Data Science Cluster configuration. 
Return to the navigation menu, select *Installed Operators*, and click on the OpenShift AI Operator.

 * *Select the option to create a Data Science Cluster.*

 * *Select the radio button to switch to the YAML view.*

 * Find the *kserve* section in the YAML file. In the serving/certificate area, add a *secretName:* line followed by the name of the secret that we deployed in the istio-system project. In addition, change the certificate type from *SelfSigned* to *Provided*. See the example below.

```yaml
kserve:
  devFlags: {}
  managementState: Managed
  serving:
    ingressGateway:
      certificate:
        secretName: ingress-certs-05-28-2024
        type: Provided
    managementState: Managed
    name: knative-serving
```

Once you have made those changes to the YAML file, click *Create* to deploy the Data Science Cluster. The cluster has now been configured to use the same certificates as the OpenShift cluster, which should eliminate the connection errors in our Jupyter notebook in the next chapter.


The Single Model Serving Platform will now be deployed and exposed with the same certificate as OpenShift Routes. Endpoints will be accessible using TLS without having to ignore error messages or create special configurations.

== OpenShift AI is now running

Congratulations! This is a fairly complicated subject. We pick up in OpenShift AI in the next chapter.
\ No newline at end of file diff --git a/modules/chapter4/chapter2.1/nav.adoc b/modules/chapter4/chapter2.1/nav.adoc new file mode 100644 index 0000000..d6da93d --- /dev/null +++ b/modules/chapter4/chapter2.1/nav.adoc @@ -0,0 +1,3 @@
* xref:index.adoc[]
** xref:section1.adoc[]
** xref:section2.adoc[]
\ No newline at end of file diff --git a/modules/chapter4/chapter2.1/pages/index.adoc b/modules/chapter4/chapter2.1/pages/index.adoc new file mode 100644 index 0000000..2c7dead --- /dev/null +++ b/modules/chapter4/chapter2.1/pages/index.adoc @@ -0,0 +1,6 @@
= Chapter 2

We have completed the installation of OpenShift AI on our OpenShift Container Platform cluster.

We now move on to using the features of OpenShift AI to set up the environment to host and interact with our LLM model.

diff --git a/modules/chapter4/chapter2.1/pages/section1.adoc b/modules/chapter4/chapter2.1/pages/section1.adoc new file mode 100644 index 0000000..b460552 --- /dev/null +++ b/modules/chapter4/chapter2.1/pages/section1.adoc @@ -0,0 +1,368 @@
= Custom Runtimes & Data Science Project Components

A model-serving runtime provides integration with a specified model server and the model frameworks that it supports. By default, Red Hat OpenShift AI includes the following model-serving runtimes:

 * OpenVINO Model Server runtime
 * Caikit TGIS for KServe
 * TGIS Standalone for KServe

However, if these runtimes do not meet your needs (for example, if they do not support a particular model framework), you might want to add your own custom runtime.

As an administrator, you can use the OpenShift AI interface to add and enable custom model-serving runtimes. You can then choose from your enabled runtimes when you create a new model server.


This exercise will guide you through the broad steps necessary to deploy a custom serving runtime in order to serve a model using the Ollama model serving framework.

[NOTE]
====
While RHOAI supports the ability to add your own runtime, it is up to you to configure, adjust and maintain your custom runtimes.
====

== Adding The Ollama Custom Runtime

.
Log in to RHOAI with a user who is part of the RHOAI admin group + +. Navigate to the Settings menu, then Serving Runtimes + +. Click on the Add Serving Runtime button: + +. Click on Start from scratch and in the window that opens up, paste the following YAML: ++ +```yaml +apiVersion: serving.kserve.io/v1alpha1 +kind: ServingRuntime +labels: + opendatahub.io/dashboard: "true" +metadata: + annotations: + openshift.io/display-name: Ollama + name: ollama +spec: + builtInAdapter: + modelLoadingTimeoutMillis: 90000 + containers: + - image: quay.io/rh-aiservices-bu/ollama-ubi9:0.1.30 + env: + - name: OLLAMA_MODELS + value: /.ollama/models + - name: OLLAMA_HOST + value: 0.0.0.0 + - name: OLLAMA_KEEP_ALIVE + value: '-1m' + name: kserve-container + ports: + - containerPort: 11434 + name: http1 + protocol: TCP + multiModel: false + supportedModelFormats: + - autoSelect: true + name: any +``` + +. After clicking the **Add** button at the bottom of the input area, we are able to see the new Ollama Runtime in the list. We can re-order the list as needed (the order chosen here is the order in which the users will see these choices) + + +== Create Data Science Project + +Navigate to the Data Science Projects + + * Select the create data science project button + + * Enter a name for your project, such as ollama-model. + + * The resource name should be populated automatically + + * Optionally add a description to the data science project + + * Click Create + + + + +== Deploy MinIO as S3 Compatible Storage + +=== MinIO overview + +*MinIO* is a high-performance, S3 compatible object store. It can be deployed on a wide variety of platforms, and it comes in multiple flavors. + +This segment describes a very quick way of deploying the community version of MinIO in order to quickly setup a fully standalone Object Store, in an OpenShift Cluster. This can then be used for various prototyping tasks that require Object Storage. + +[WARNING] +This version of MinIO should not be used in production-grade environments. Also, MinIO is not included in RHOAI, and Red Hat does not provide support for MinIO. + +=== MinIO Deployment +To Deploy MinIO, we will utilize the OpenShift Dashboard. + + * Click on the Project Selection list dropdown, Select the Ollama-Model project; the data science project you created in the previous step. + + * Then Select the + (plus) icon from the top right of the dashboard. + + * This will open a new window where you will paste the following YAML file. In the YAML below its recommended to change the default user name & password. + + +```yaml +--- +kind: PersistentVolumeClaim +apiVersion: v1 +metadata: + name: minio-pvc +spec: + accessModes: + - ReadWriteOnce + resources: + requests: + storage: 40Gi + volumeMode: Filesystem +--- +kind: Secret +apiVersion: v1 +metadata: + name: minio-secret +stringData: + # change the username and password to your own values. 
+ # ensure that the user is at least 3 characters long and the password at least 8 + minio_root_user: minio + minio_root_password: minio123 +--- +kind: Deployment +apiVersion: apps/v1 +metadata: + name: minio +spec: + replicas: 1 + selector: + matchLabels: + app: minio + template: + metadata: + creationTimestamp: null + labels: + app: minio + spec: + volumes: + - name: data + persistentVolumeClaim: + claimName: minio-pvc + containers: + - resources: + limits: + cpu: 250m + memory: 1Gi + requests: + cpu: 20m + memory: 100Mi + readinessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 5 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + terminationMessagePath: /dev/termination-log + name: minio + livenessProbe: + tcpSocket: + port: 9000 + initialDelaySeconds: 30 + timeoutSeconds: 1 + periodSeconds: 5 + successThreshold: 1 + failureThreshold: 3 + env: + - name: MINIO_ROOT_USER + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_user + - name: MINIO_ROOT_PASSWORD + valueFrom: + secretKeyRef: + name: minio-secret + key: minio_root_password + ports: + - containerPort: 9000 + protocol: TCP + - containerPort: 9090 + protocol: TCP + imagePullPolicy: IfNotPresent + volumeMounts: + - name: data + mountPath: /data + subPath: minio + terminationMessagePolicy: File + image: >- + quay.io/minio/minio:RELEASE.2023-06-19T19-52-50Z + args: + - server + - /data + - --console-address + - :9090 + restartPolicy: Always + terminationGracePeriodSeconds: 30 + dnsPolicy: ClusterFirst + securityContext: {} + schedulerName: default-scheduler + strategy: + type: Recreate + revisionHistoryLimit: 10 + progressDeadlineSeconds: 600 +--- +kind: Service +apiVersion: v1 +metadata: + name: minio-service +spec: + ipFamilies: + - IPv4 + ports: + - name: api + protocol: TCP + port: 9000 + targetPort: 9000 + - name: ui + protocol: TCP + port: 9090 + targetPort: 9090 + internalTrafficPolicy: Cluster + type: ClusterIP + ipFamilyPolicy: SingleStack + sessionAffinity: None + selector: + app: minio +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-api +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: api + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +--- +kind: Route +apiVersion: route.openshift.io/v1 +metadata: + name: minio-ui +spec: + to: + kind: Service + name: minio-service + weight: 100 + port: + targetPort: ui + wildcardPolicy: None + tls: + termination: edge + insecureEdgeTerminationPolicy: Redirect +``` + +This should finish in a few seconds. Now it's time to deploy our storage buckets. + +=== MinIO Storage Bucket Creation + +From the OCP Dashboard: + + * Select Workloads / Pods from the navigation menu. + * Click on the minio pod, which should be only pod running the *ollama-model project*. + ** ( ollama-model project or name used fort the Data Science Project created) + + * Then Select the Routes from the Navigation menu. + * This will display two routes, one for the UI & another for the API. + + * For the first step select the UI route, and paste it in a browser Window. + + * This window opens the MinIO Dashboard, login with user/password combination you set, or the default listed in yaml file above. + +Once logged into the MinIO Console: + + * Click Create Bucket to get started. 
 * Create two Buckets:

 ** one called *models*

 ** another called *storage*

[NOTE]
 When serving an LLM or other model, OpenShift AI looks inside a folder, so we need at least one subdirectory under the *models* bucket.

 * Via the navigation menu, select *Object Browser*, then click on the *models* bucket.
 * From the models bucket page, click *Add Path*, and type *ollama* as the name of the sub-folder (path).

[IMPORTANT]
In most cases, to serve a model, the trained model would be uploaded into this sub-directory. *Ollama is a special case, as it can download and manage several LLM models as part of the runtime.*

 * We still need a file available in this folder for the model deployment workflow to succeed.

 * So we will copy an emptyfile.txt file to the ollama subdirectory. You can download the file from this location, or you can create your own file called emptyfile.txt and upload it.

 * Once you have the file ready, upload it into the ollama path in the models bucket by clicking the upload button and selecting the file from your local desktop.

=== Create Data Connections using MinIO Storage Buckets

Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the *ollama-model* project.

* Select the Data Connections menu, followed by *Create data connection*
* Provide the following values:
** Name: *models*
** Access Key: the minio_root_user value from the YAML file
** Secret Key: the minio_root_password value from the YAML file
** Endpoint: the MinIO API URL from the Routes page in the OpenShift dashboard
** Region: required for AWS storage and cannot be blank (use no-region-minio)
** Bucket: the MinIO storage bucket name: *models*

Repeat for the storage bucket, using *storage* for both the name and the bucket.


== Creating a Workbench

Navigate to the Data Science Projects section of the OpenShift AI console/dashboard. Select the *ollama-model* project.

 * Select the Workbenches tab, then *Create workbench*

 * Name: ollama-model

 * Notebook Image: Minimal Python

 * Leave the remaining options at their defaults

 * Optionally, scroll to the bottom and check the *Use a data connection* box

 ** Select *storage* from the dropdown to attach the storage bucket to the workbench.

 * Select the *Create workbench* option.

[NOTE]
Depending on the notebook image selected, it can take between 2 and 20 minutes for the container image to be fully deployed. The Open link will be available when your container is fully deployed.


== Creating The Model Server

From the ollama-model project dashboard, go to the **Models** section, and select **Deploy model** under the **Single Model Serving Platform** option.


*Create the model server with the following values:*

--
 * Model name: `Ollama-Mistral`
 * Serving Runtime: `Ollama`
 * Model framework: `Any`
 * Model Server Size: `Medium`
 * Model location data connection: `models`
 * Model location path: `/ollama`
--

After clicking the **Deploy** button at the bottom of the form, we see the model added to our **Models and model servers** list. When the model is available, the inference endpoint will populate and the status will show a green checkmark.

We are now ready to interact with our newly deployed LLM model. Join me in Section 2 to explore Mistral running on OpenShift AI using Jupyter Notebooks.
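Before moving on, you can optionally sanity-check the Ollama server from a terminal; a sketch, assuming you export the inference endpoint shown in the Models section as `OLLAMA_ENDPOINT` (the `-k` flag is only needed while the endpoint still uses a self-signed certificate):

[source,console]
----
$ export OLLAMA_ENDPOINT=https://<your-inference-endpoint>

# Ask the Ollama runtime to pull the Mistral model
$ curl -k $OLLAMA_ENDPOINT/api/pull \
    -H "Content-Type: application/json" \
    -d '{"name": "mistral"}'

# Send a simple test prompt
$ curl -k $OLLAMA_ENDPOINT/api/generate \
    -H "Content-Type: application/json" \
    -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'
----

We walk through this same flow from a Jupyter notebook in the next section.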
diff --git a/modules/chapter4/chapter2.1/pages/section2.adoc b/modules/chapter4/chapter2.1/pages/section2.adoc new file mode 100644 index 0000000..c342a9c --- /dev/null +++ b/modules/chapter4/chapter2.1/pages/section2.adoc @@ -0,0 +1,2 @@
= Jupyter Notebooks & Mistral LLM Model Setup

diff --git a/modules/chapter4/index copy.adoc b/modules/chapter4/index copy.adoc new file mode 100644 index 0000000..c78097e --- /dev/null +++ b/modules/chapter4/index copy.adoc @@ -0,0 +1,22 @@
= Chapter 1

== Introduction

Modern LLMs can understand and use language in a way that was historically unthinkable for a personal computer. These machine learning models can generate text, summarize content, translate, rewrite, classify, categorize, analyze, and more. All of these abilities give humans a powerful toolset to automate and augment their work.

In this course, you will learn how to leverage Red Hat OpenShift AI to serve a Large Language Model.

How do we deliver a model to an inference engine, or server, so that when the server receives a request from any of the applications in the organization's portfolio, the inference engine can reply with a prediction that increases the speed, efficiency, and effectiveness of business problem solving?

Machine learning models must be deployed in a production environment to process real-time data and handle the problem they were designed to solve.

In this lab, we are going to deploy the Ollama model framework, which operates a bit differently than a standard machine learning model. Using the Ollama runtime, we can load multiple different models once the runtime is deployed. These models have been quantized, so they do not require a GPU. This makes the runtime engine flexible enough to accommodate the evaluation of multiple model types.

Why is this important? Because businesses aren't implementing a model just for the cool factor; they are looking to solve a business problem. However, they often won't know which model will work best to solve that problem, and many are still in the experimental phase. This makes the Ollama framework perfect for evaluating multiple models without needing to reinvent the wheel.

While this course touches each of the following bullets in the 5 Steps to Building an LLM Application graphic, we will primarily focus on the second step, selecting an LLM, by exploring the Ollama model runtime.

Ollama is a relatively new but powerful framework designed for serving machine learning models. It is designed to be efficient, scalable, and easy to use, making it an attractive option for developers and organizations looking to deploy their AI models into production.


image::redhatllm.gif[]
\ No newline at end of file diff --git a/modules/chapter4/nav.adoc b/modules/chapter4/nav.adoc new file mode 100644 index 0000000..d6da93d --- /dev/null +++ b/modules/chapter4/nav.adoc @@ -0,0 +1,3 @@
* xref:index.adoc[]
** xref:section1.adoc[]
** xref:section2.adoc[]
\ No newline at end of file diff --git a/modules/chapter4/pages/index.adoc b/modules/chapter4/pages/index.adoc new file mode 100644 index 0000000..785a3fe --- /dev/null +++ b/modules/chapter4/pages/index.adoc @@ -0,0 +1,3 @@
= Chapter 3

This is the home page of _Chapter 3_ in the *hello* quick course....
\ No newline at end of file diff --git a/modules/chapter4/pages/section2.adoc b/modules/chapter4/pages/section2.adoc new file mode 100644 index 0000000..1c5adef --- /dev/null +++ b/modules/chapter4/pages/section2.adoc @@ -0,0 +1,29 @@ += refer only + +*Red{nbsp}Hat OpenShift AI* is available as an operator via the OpenShift Operator Hub. You will install the *Red{nbsp}Hat OpenShift AI operator* and dependencies using the OpenShift web console in this section. + +== Lab: Installation of Red{nbsp}Hat OpenShift AI + +IMPORTANT: The installation requires a user with the _cluster-admin_ role + +. Login to the Red Hat OpenShift using a user which has the _cluster-admin_ role assigned. + +. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click on the button or tile for each. In the pop up window that opens, ensure you select the latest version in the *stable* channel and click on **Install** to open the operator's installation view. For this lab you can skip the installation of the optional operators + + * Web Terminal + + * Red Hat OpenShift Serverless + + * Red Hat OpenShift Service Mesh + + * Red Hat OpenShift Pipelines + + * GPU Support + + ** Node Feature Discovery Operator (optional) + + ** NVIDIA GPU Operator (optional) + +[TIP] + + Installing these Operators prior to the installation of OpenShift AI in my experience has made a difference in OpenShift AI acknowledging the availability of these components and adjusting the initial configuration to shift management of these components to OpenShift AI. \ No newline at end of file diff --git a/modules/chapter4/pages/section4.adoc b/modules/chapter4/pages/section4.adoc new file mode 100644 index 0000000..24c9ab2 --- /dev/null +++ b/modules/chapter4/pages/section4.adoc @@ -0,0 +1,240 @@ += Prepare MinIO & Data Connections + +https://min.io[MinIO] is a high-performance, S3 compatible object store. It is built for large scale AI/ML, data lake and database workloads. It is software-defined and runs on any cloud or on-premises infrastructure. + +We will need an S3 solution to share the model from training to deploy, in this exercise we will prepare MinIO to be such S3 solution. + +. In OpenShift, create a new namespace with the name **object-datastore**. ++ +[source,console] +---- +$ oc new-project object-datastore +---- + +. Run the following yaml to install MinIO: ++ +[source,console] +---- +$ oc apply -f https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml -n object-datastore +---- + +. Get the route to the MinIO dashboard. ++ +[source,console] +---- +$ oc get routes -n object-datastore | grep minio-ui | awk '{print $2}' +---- ++ +[INFO] +==== +Use this route to navigate to the S3 dashboard using a browser. With the browser, you will be able to create buckets, upload files, and navigate the S3 contents. +==== + +. Get the route to the MinIO API. ++ +[source,console] +---- +$ oc get routes -n object-datastore | grep minio-api | awk '{print $2}' +---- ++ +[INFO] +==== +Use this route as the S3 API endpoint. Basically, this is the URL that we will use when creating a data connection to the S3 in RHOAI. +==== + +[IMPORTANT] +==== +Make sure to create a new path in your bucket, and upload to such path, not to root. Later, when requesting to deploy a model to the **Model Server**, you will be required to provide a path inside your bucket. +==== + +== Create A Data Connection + +. In the RHOAI dashboard, create a project named **iris-project**. 
+ +. In the **Data Connections** section, create a Data Connection to your S3. ++ +image::add-minio-iris-data-connection.png[Add iris data connection from minio] ++ +[IMPORTANT] +==== +- The credentials (Access Key/Secret Key) are `minio`/`minio123`. +- Make sure to use the API route, not the UI route (`oc get routes -n object-datastore | grep minio-api | awk '{print $2}'`). +- The region is not important when using MinIO, this is a property that has effects when using AWS S3. +However, you must enter a non-empty value to prevent problems with model serving. +- Mind typos for the bucket name. +- You don't have to select a workbench to attach this data connection to. +==== + +== Create a Model Server + +. In the **Models and model servers** section, add a server. ++ +image::add-server-button.png[add server] + +. Fill the form with the following values: ++ +-- +* Server name: `iris-model-server`. +* Serving runtime: `OpenVINO Model Server`. +* Select the checkboxes to expose the models through an external route, and to enable token authentication. +Enter `iris-serviceaccount` as the service account name. +-- ++ +image::add-server-form-example.png[Add Server Form] ++ +[IMPORTANT] +==== +The model server you are creating works as a template for deploying models. As you can see, we have not specified the model that we will deploy, or the data connection from where that model will be retrieved, in this form we are specifying the resources, constraints, and engine that will define the engine where the model will be deployed later. +It is important to pay special attention to the following characteristics: + +- **Serving Runtime**: By default we have _OpenVINO_ and _OpenVINO with GPU_. The important aspects when defining these runtimes are: The framework that is capable of reading models in a given format, and weather such platform supports using GPUs. The use of GPUs allow for complex and lengthy computations to be delivered faster, as there are huge models that require a good amount of power to calculate, based on the given parameters a prediction. + +- **Number of replicas to deploy**: Planning for expected performance and number of expected requests is essential for this part of the form. Here we select if we will load balance a given request between multiple container replicas. + +- **Model Server Size**: In this part of the form we define the resources assigned to each model server container. You can create and select a pre-defined size from the dropdown, or you can select _custom_, in which case, new fields will be displayed to request the processing and memory power to be assigned to your containers. ++ +image::model-server-size.png[model server size] + +- **Model Route**: There are models that can be consumed only from other containers inside the same OpenShift cluster, here we have the ability to not make this server available to entities outside our cluster, or to instruct the model server configuration to assign an external route. When we don't expose the model externally through a route, click on the Internal Service link in the Inference endpoint section: ++ +image::figure14_0.png[Inference endpoint] ++ +A popup will display the address for the gRPC and the REST URLs: ++ +image::figure15_0.png[Endpoint URLs] + +- **Token authorization**: In this part of the form we have a helper checkmark to add authorization to a service account that will be created with access to our model server. 
Only API requests that present a token that has access to the given service account will be able to run the inference service. +==== + +. After clicking the **Add** button at the bottom of the form, you will be able to see a new **Model Server** configuration in your project, you can click the **Tokens** column, which will make visible the tokens that you can share with the applications that will consume the inference API. ++ +image::model-server-with-token.png[Model Server with token] + +== Deploy The Model + +. At the right side of the **Model Server**, we can find the **Deploy Model** button, let's click the **Deploy Model** button, to start filling the **Deploy Model** form: ++ +image::deploy-model-button.png[Deploy Model button] + +. Fill the **Deploy Model** form. ++ +-- +* Model name: `Ollama-Mistral` +* Serving Runtime: `Ollama` +* Model framework: `Any` +* Model Server Size: `Medium` +* Model location data connection: `models` +* Model location path: `/ollama` +-- ++ +image::deploy-model-form.png[Deploy Model form] + +. After clicking the **Add** button at the bottom of the form, you will be able to see a new entry at the **Deployed models** column for your **Model Server**, clicking in the column will eventually show a check mark under the **Status** column: ++ +image::deploy-model-success.png[Deploy model success] + +. Observe and monitor the assets created in your OpenShift **iris-project** namespace. ++ +[source,console] +---- +$ oc get routes -n iris-project +$ oc get secrets -n iris-project | grep iris-model +$ oc get events -n iris-project +---- ++ +image::iris-project-events.png[Iris project events] ++ +[TIP] +==== +Deploying a **Model Server** triggers a **ReplicaSet** with **ModelMesh**, which attach your model to the inference runtime, and exposes it through a route. Also, notice the creation of a secret with your token. +==== + +== Test The Model + +Now that the model is ready to use, we can make an inference using the REST API + +. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands. ++ +[source,console] +---- +$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}') +---- + +. Assign an authentication token to an environment variable in your local machine. ++ +[source,console] +---- +$ export TOKEN=$(oc whoami -t) +---- + +. Request an inference with the REST API. ++ +[source,console] +---- +$ curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer \ + -X POST \ + --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}' +---- + +The result of using the inference service looks like the following output: +```json +{"model_name":"iris-model__isvc-590b5324f9","model_version":"1","outputs":[{"name":"label","datatype":"INT64","shape":[1],"data":[1]},{"name":"scores","datatype":"FP32","shape":[1,3],"data":[4.851966,3.1275764,3.4580243]}]} +``` + +=== Model Serving Request Body + +As you tested with the preceding `curl` command, to make HTTP requests to a deployed model you must use a specific request body format. +The basic format of the input data is as follows: + +[subs=+quotes] +---- +{ + "inputs": [{ + "name" : "input", <1> + "shape" : [2,3], <2> + "datatype" : "INT64", <3> + "data" : [[34, 54, 65], [4, 12, 21]] <4> + }] +} +---- +<1> The name of the input tensor. +The data scientist that creates the model must provide you with this value. 
+<2> The shape of the input tensor. +<3> The https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#tensor-data-types[data type] of the input tensor. +<4> The tensor contents provided as a JSON array. + +The API supports additional parameters. +For a complete list, refer to the https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference-request-json-object[Kserve Predict Protocol docs]. + +To make a request in Python, you can use the `requests` library, as the following example shows: + +[source,python] +---- +import requests + +input_data = [-0.15384616, -0.9909186] + +# You must adjust this path or read it from an environment variable +INFERENCE_ENDPOINT = "https://my-model.apps.my-cluster.example.com/v2/models/my-model/infer" + +# Build the request body +payload = { + "inputs": [ + { + "name": "dense_input", + "shape": [1, 2], + "datatype": "FP32", + "data": input_data + } + ] +} + +# Send the POST request +response = requests.post(INFERENCE_ENDPOINT, json=payload) + +# Parse the JSON response +result = response.json() + +# Print predicted values +print(result['outputs'][0]['data']) +---- diff --git a/modules/chapter4/section-temp.adoc b/modules/chapter4/section-temp.adoc new file mode 100644 index 0000000..9c9d7c0 --- /dev/null +++ b/modules/chapter4/section-temp.adoc @@ -0,0 +1,376 @@ += Model Serving + +Why Ollama - It's unique value is that it makes installing and running LLMs very simple, even for non-technical users. Reduces the resources requirement for many models by >50%, and also the dependency of a GPU with excellent performance in my opinion. + + +== Learn by Doing + +In this quickcourse, there is one goal. Deploy an LLM in OpenShift AI, then utilize Jupyter Notebooks to query said LLM. + +Along the way, we'll discuss personas or roles that would perform these at our Customers. + +For example, you don't get to start with an OpenShift AI Platform, instead you will start with an OpenShift Container Cluster. In the Section Two of this course, together we will tackle the challenges of upgrading our OCP Cluster to host our OpenShift AI Platform. + +Why should this matter to you, it will provided an solid overview of the components needed, will allow you to explain the difficulty level of installing OpenShift AI, and will give you the experience to better understand what value each component adds to the Mix. + +There are several times that tasks must be performed in OCP related to the operations of the OPenShift AI Platform, so it's best to be familar with the both Dashboards. + +While the actuall installation can performed in a few minutes, there is an advanced setup that we need to perform to solve an issue with Cluster Security Certificates. Most organizations will run their own TLS certifcates, rather than use the self-generated certificates in the cluster. + +[note:] + The reason we need to perform this porition is that the Original OpenShift Container CLuster created Self Signed Certificates upon deployment. When we install OpenShift AI, it will also create a set of certificates for deploying resources. Well, when we expose resources externally, they use the OCP cluster certificates, which causes a mismatch when we try to connect remotely. So instead of having two different sets of certificates, we are going to use the OCP cluster certificates for the OpenShift AI cluster, simipling connecting to the running model. + +1. 
Once we complete the OpenShift AI setup, which should take about 15-20 minutes, the next step is to Launch OpenShift AI. + +We will then add the Ollama Model Serving Runtime .Yaml file as an additional Single Model Serving Option. + +1. Moving onto the next step, we will create our first Data Science Project in the OpenShift AI platform. This will provide an isolated workspace for our resources. + +For Step 4 we need external storage, or remotely accessible storage via an API in order to retrive the LLM model file. We will use MinIO for this purpose. We will deploy another .YAML file in the Data Science Project we just created. + +Next we will create storage buckets, update the file needed by the Ollama model to our new bucket in a sub-folder + +Once that is complete we head back to our Project, and create a new workbench, will deploy or UI interface, which will be a jupyter notebook. + +Once that is complete, we can finally launch our Ollma Single Model Server. + +Then we will need to configuire the model to be hosted by our Ollama framework which will be Mistral 7B. + +Once that is complete, we can add git repository to the Jupyter notebook and interact with our model using the LangChain library. + +The last step is for you to interact with your new LLM model, hosted in OpenShift AI. You can query the model with your questions, to determine how good it is. + +Then if you up to the Challenge - Delete the model, and redeploy the Ollama fRamework and deploy a different model, perhaps Llama2, or Lava and compare the performance of the different models. You'll be on your own for this part, but I know you got this! + + ++ +```shell + pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter1/requirements.txt +``` ++ +image::terminal-install.png[terminal install] + +. Open the notebook **purchase-amount** from the **rhods-qc-apps/4.rhods-deploy/chapter1/purchase-amount.ipynb** directory: ++ +image::purchase-amount-notebook.png[purchase-amount notebook] + +. Run the notebook, and notice the creation of a new file in your environment, the `mymodel.pkl` ++ +image::mymodel-pkl.png[Model file export] + +[IMPORTANT] +==== +There are different formats and libraries to export the model, in this case we are using pickle. Other common formats are: + +* Protobuf + +* MLeap + +* H5 + +* ONNX + +* PMML + +* Torch + +The use of either of those formats depends on the target server runtime, some of them are proven to be more efficient than others for certain type of training algorithms and model sizes. +==== + +=== Use the Model in Another Notebook + +The model can be deserialized in another notebook, and used to generate a prediction: + +. Open the notebook **use-purchase-amount** from the **rhods-qc-apps/4.rhods-deploy/chapter1/use-purchase-amount.ipynb** directory: ++ +image::use-purchase-amount-notebook.png[use-purchase-amount notebook create] + +. Run the **use-purchase-amount** notebook and notice the result: ++ +- You can get the same result without training the model again. +- You are not training the model in the **use-purchase-amount** notebook, you are re-using the output from the training notebook, and using the generated model to generate an inference. + +[TIP] +==== +At this moment the model can be exported and imported in other projects for its use. Normally there will be an S3 bucket or a model registry to store models and versions of such models, and instead of manually exporting the model, there would be pipelines making the model available. 
+==== + +== Use the Model in a Container + +For this section, you need Podman to create an image, and a registry to upload the resulting image. + +=== Web application that uses the model + +The pickle model that we previously exported can be used in a Flask application. In this section we present an example Flask application that uses the model. + +[IMPORTANT] +==== +Although we are actually serving a model with Flask in the exercise, Flask is not considered part of the Model Serving feature. This example represents one way in which some customers decide to embed their models in containers, although RHOAI provides for mechanisms that can make this process of serving a model a simpler process, when provided with the proper model formats. +==== + +. In your computer, create a new directory to save the source code of the web application. +Navigate to that directory. + +. Download the `mymodel.pkl` file from JupyterLab into this directory. + +. Open the directory with a python IDE, then create a python script named `app.py` with the following code: ++ +```python[app.py] +from flask import Flask, request +import pickle + +app = Flask(__name__) +# Load model +with open('mymodel.pkl', 'rb') as f: + model = pickle.load(f) + +model_name = "Time to purchase amount predictor" +model_file = 'model.plk' +version = "v1.0.0" + + +@app.route('/info', methods=['GET']) +def info(): + """Return model information, version how to call""" + result = {} + + result["name"] = model_name + result["version"] = version + + return result + + +@app.route('/health', methods=['GET']) +def health(): + """REturn service health""" + return 'ok' + + +@app.route('/predict', methods=['POST']) +def predict(): + feature_dict = request.get_json() + if not feature_dict: + return { + 'error': 'Body is empty.' + }, 500 + + try: + return { + 'status': 200, + 'prediction': int(model(feature_dict['time'])) + } + except ValueError as e: + return {'error': str(e).split('\n')[-1].strip()}, 500 + + +if __name__ == '__main__': + app.run(host='0.0.0.0') +``` + +. Create a `requirements.txt` to describe the python dependencies to install on container startup: ++ +```[requirements.txt] +click==8.0.3 +cycler==0.11.0 +Flask==2.0.2 +fonttools==4.28.5 +gunicorn==20.1.0 +itsdangerous==2.0.1 +Jinja2==3.0.3 +kiwisolver==1.3.2 +MarkupSafe==2.0.1 +matplotlib==3.5.1 +numpy==1.22.0 +packaging==21.3 +pandas==1.3.5 +Pillow==9.0.0 +pyparsing==3.0.6 +python-dateutil==2.8.2 +pytz==2021.3 +scikit-learn==1.0.2 +scipy==1.7.3 +six==1.16.0 +sklearn==0.0 +threadpoolctl==3.0.0 +Werkzeug==2.0.2 +``` + +. Create a `Containerfile` to build an image with the Flask application: ++ +```docker[containerfile] +# Base image +FROM python:3.9 + +# Set working directory +WORKDIR /app + +# Copy files +COPY app.py /app <1> +COPY requirements.txt /app <2> +COPY mymodel.pkl /app <3> + +# Install dependencies +RUN pip install -r requirements.txt + +# Run the application +EXPOSE 8000 +ENTRYPOINT ["gunicorn", "-b", "0.0.0.0:8000", "--access-logfile", "-", "--error-logfile", "-", "--timeout", "120"] +CMD ["app:app"] +``` +<1> The python application source code +<2> The list of packages to install +<3> The model + +. Build and push the image to an image registry ++ + +[source,console] +---- +$ podman login quay.io +$ podman build -t purchase-predictor:1.0 . 
+$ podman tag purchase-predictor:1.0 quay.io/user_name/purchase-predictor:1.0 +$ podman push quay.io/user_name/purchase-predictor:1.0 +---- ++ +[NOTE] +==== +If you are running macOS ARM versions, then run: + +podman build --platform linux/amd64 -t purchase-predictor:1.0 . + +==== ++ +After you push the image, open quay.io in your browser and make the image public. + +. Deploy the model image to **OpenShift**. Get the OCP_CLUSTER_URL value from your RHDP page for this classroom. ++ +[source,console] +---- +$ oc login :6443 +$ oc new-project model-deploy +$ oc new-app --name purchase-predictor quay.io/user_name/purchase-predictor:1.0 +$ oc expose service purchase-predictor +---- + +. Get the route for the deployed application ++ +[source,console] +---- +$ ROUTE_NAME=$(oc get route purchase-predictor -o jsonpath='{.spec.host}') +---- + +Now we can use the Flask application with some commands such as: +[source,console] +---- +$ curl http://$ROUTE_NAME/health +ok +$ curl http://$ROUTE_NAME/info +{"name":"Time to purchase amount predictor","version":"v1.0.0"} +$ curl -d '{"time":4}' -H "Content-Type: application/json" \ +> -X POST \ +> http://$ROUTE_NAME/predict +{"prediction":34,"status":200} +---- + +[IMPORTANT] +==== +In this section we have manually: + +. Developed an application that uses the model + +. Built an image with such application + +. Push the image to a registry + +. Deployed the containerized application in OpenShift + +. Exposed the application's endpoint in OpenShift by creating a route + +. Consumed the model through the application's REST API to request a prediction + +There are automated and faster ways to perform these steps. In the following sections, we will learn about runtimes that only require you to provide a model, and they automatically provision an inference service for you. +==== + +== RHOAI Model Serving Runtimes + +In the previous example, we manually created a Model Server by sending the model to an image that can interpret the model and expose it for consumption. In our example we used Flask. + +However, in Red Hat OpenShift AI, you do not need to manually create serving runtimes. +By default, Red Hat OpenShift AI includes a pre-configured model serving runtime, OpenVINO, which can load, execute, and expose models trained with TensorFlow and PyTorch. +OpenVINO supports various model formats, such as the following ones: + +https://onnx.ai[ONNX]:: +An open standard for machine learning interoperability. + +https://docs.openvino.ai/latest/openvino_ir.html[OpenVino IR]:: +The proprietary model format of OpenVINO, the model serving runtime used in OpenShift AI. + +In order to leverage the benefits of OpenVINO, you must: + +. Export the model in a format compatible with one of the available RHOAI runtimes. +. Upload the model to an S3 +. Create a Data Connection to the S3 containing the model +. Create or use one of the available serving runtimes in a Model Server configuration that specifies the size and resources to use while setting up an inference engine. +. Start a model server instance to publish your model for consumption + +While publishing this model server instance, the configurations will allow you to define how applications securely connect to your model server to request for predictions, and the resources that it can provide. + +=== Model Serving Resources + +When you use model serving, RHOAI uses the `ServingRuntime` and `InferenceService` custom resources. + +ServingRuntime:: +Defines a model server. 
+ +InferenceService:: +Defines a model deployed in a model server. + +For example, if you create a model server called `foo`, then RHOAI creates the following resources: + +* `modelmesh-serving` Service +* `foo` ServingRuntime +** `modelmesh-serving-foo` Deployment +*** `modelmesh-serving-foo-...` ReplicaSet +**** `modelmesh-serving-foo-...-...` Pod + +The `ServingRuntime` defines your model server and owns a `Deployment` that runs the server workload. +The name of this deployment is prefixed with the `modelmesh-serving-` prefix. +Initially, when no models are deployed, the deployment is scaled to zero, so no pod replicas are running. + +When creating the first model server in a data science project, RHOAI also creates a `Service` called `modelmesh-serving` to map HTTP, HTTPs and gRPC traffic into the model servers. + +[NOTE] +==== +The `modelmesh-serving` service maps traffic for all model servers. +No additional services are created when you create more than one model server. +==== + +After you create a model server, you are ready to deploy models. +When you deploy a model in a model server, RHOAI creates an `InferenceService` custom resource, which defines the deployed model properties, such as the name and location of the model file. +For example, If you deploy a model called `my-model`, then RHOAI creates the following resources. + +* `my-model` InferenceService +** `my-model` Route, which points to the `modelmesh-serving` Service. + +[NOTE] +==== +The route is only created if you have selected the `Make deployed models available through an external route` checkbox when creating the server. +The `InferenceService` owns the route. +==== + +At the same time, to be able to serve the model, RHOAI starts the model server by scaling the `model-serving-` deployment up to one pod replica. +This model serving pod runs the model serving containers: + +* `mm`: the ModelMesh model serving framework. +* The model serving runtime container, such as `ovms` for OpenVINO. +* The ModelMesh https://github.com/kserve/modelmesh-runtime-adapter[runtime adapter] for your specifc serving runtime. +For example, if you are using OpenVINO, then the container is `ovms-adapter`. +* `rest-proxy`: For HTTP traffic. +* `oauth-proxy`: For authenticating HTTP requests. + +[NOTE] +==== +The `modelmesh-serving` pod runs the model server, which handles one or more deployed models. +No additional pods are created when you deploy multiple models. +==== + diff --git a/modules/chapter4/section1 copy.adoc b/modules/chapter4/section1 copy.adoc new file mode 100644 index 0000000..d4c15bc --- /dev/null +++ b/modules/chapter4/section1 copy.adoc @@ -0,0 +1,50 @@ += Model Serving + +Why Ollama - It's unique value is that it makes installing and running LLMs very simple, even for non-technical users. Reduces the resources requirement for many models by >50%, and also the dependency of a GPU with excellent performance in my opinion. + + +== Learn by Doing + +In this quickcourse, there is one goal. Deploy an LLM in OpenShift AI, then utilize Jupyter Notebooks to query said LLM. + +Along the way, we'll discuss personas or roles that would perform these at our Customers. + +For example, you don't get to start with an OpenShift AI Platform, instead you will start with an OpenShift Container Cluster. In the Section Two of this course, together we will tackle the challenges of upgrading our OCP Cluster to host our OpenShift AI Platform. 
+ +Why should this matter to you, it will provided an solid overview of the components needed, will allow you to explain the difficulty level of installing OpenShift AI, and will give you the experience to better understand what value each component adds to the Mix. + +There are several times that tasks must be performed in OCP related to the operations of the OPenShift AI Platform, so it's best to be familar with the both Dashboards. + +While the actuall installation can performed in a few minutes, there is an advanced setup that we need to perform to solve an issue with Cluster Security Certificates. Most organizations will run their own TLS certifcates, rather than use the self-generated certificates in the cluster. + + +.... + The reason we need to perform this porition is that the Original OpenShift Container CLuster created Self Signed Certificates upon deployment. When we install OpenShift AI, it will also create a set of certificates for deploying resources. Well, when we expose resources externally, they use the OCP cluster certificates, which causes a mismatch when we try to connect remotely. So instead of having two different sets of certificates, we are going to use the OCP cluster certificates for the OpenShift AI cluster, simipling connecting to the running model. +.... + + + +1. Once we complete the OpenShift AI setup, which should take about 15-20 minutes, the next step is to Launch OpenShift AI. + +We will then add the Ollama Model Serving Runtime .Yaml file as an additional Single Model Serving Option. + +1. Moving onto the next step, we will create our first Data Science Project in the OpenShift AI platform. This will provide an isolated workspace for our resources. + +For Step 4 we need external storage, or remotely accessible storage via an API in order to retrive the LLM model file. We will use MinIO for this purpose. We will deploy another .YAML file in the Data Science Project we just created. + +Next we will create storage buckets, update the file needed by the Ollama model to our new bucket in a sub-folder + +Once that is complete we head back to our Project, and create a new workbench, will deploy or UI interface, which will be a jupyter notebook. + +Once that is complete, we can finally launch our Ollma Single Model Server. + +Then we will need to configuire the model to be hosted by our Ollama framework which will be Mistral 7B. + +Once that is complete, we can add git repository to the Jupyter notebook and interact with our model using the LangChain library. + +The last step is for you to interact with your new LLM model, hosted in OpenShift AI. You can query the model with your questions, to determine how good it is. + +Then if you up to the Challenge - Delete the model, and redeploy the Ollama fRamework and deploy a different model, perhaps Llama2, or Lava and compare the performance of the different models. You'll be on your own for this part, but I know you got this! + + + diff --git a/modules/chapter4/section1.adoc b/modules/chapter4/section1.adoc new file mode 100644 index 0000000..75e2bb9 --- /dev/null +++ b/modules/chapter4/section1.adoc @@ -0,0 +1,51 @@ += Section 1 +# Ollama Runtime + +The [Ollama](https://github.com/ollama/ollama) runtime can be used with Open Data Hub and OpenShift AI Single-Model Serving stack to serve Large Language Models (LLMs) as an alternative to Caikit+TGIS or standalone TGIS. Currently supported models are listed [here](https://ollama.com/library). 
Note that this runtime is built for CPU only; even with a GPU present, it will not use it.

## Installation

You must first make sure that you have properly installed the necessary components of the Single-Model Serving stack, as documented [here](https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2-latest/html/serving_models/serving-large-models_serving-large-models).

Once the stack is installed, adding the runtime is pretty straightforward:

- As an admin, in the OpenShift AI Dashboard, open the menu `Settings -> Serving runtimes`.
- Click on `Add serving runtime`.
- For the type of model serving platforms this runtime supports, select `Single model serving platform`.
- Upload the file `ollama-runtime.yaml` from the current folder, or click `Start from scratch` and copy/paste its content.

The runtime is now available when deploying a model.

## Model Deployment

This runtime can be used in almost the same way as the out-of-the-box ones. A small adjustment is necessary because Ollama downloads models directly when instructed to (populating the models on the object storage would be cumbersome and not really necessary). But as KServe will try to copy a model from object storage anyway, we have to trick it a little bit...

- Copy the file `emptyfile` to an object store bucket. In fact it can be any file, as long as the "folder" in the bucket is not empty...
- Deploy the "model" from the Dashboard. Make sure you have enough RAM/CPU to run the model(s) you want.
- At this stage, what is deployed is only the Ollama server itself, and you get an endpoint address.
- Download the model you want by querying the endpoint (replace the address with the one from your endpoint, as well as the model name):

    curl https://your-endpoint/api/pull \
      -k \
      -H "Content-Type: application/json" \
      -d '{"name": "mistral"}'

## Usage

You can now query the model using curl like this:

```bash
curl https://your-endpoint/api/generate \
  -k \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "prompt": "Why is the sky blue?"
  }'
```

You can also use the notebook example [here](../../examples/notebooks/langchain/Langchain-Ollama-Prompt-memory.ipynb). Beware, you will have some adaptations to make if you are using self-signed certificates, which is the default for Single Stack Serving. More information [here](https://ai-on-openshift.io/odh-rhoai/single-stack-serving-certificate/)
\ No newline at end of file diff --git a/modules/chapter4/section2.adoc b/modules/chapter4/section2.adoc new file mode 100644 index 0000000..5535495 --- /dev/null +++ b/modules/chapter4/section2.adoc @@ -0,0 +1,29 @@
= Installing Red{nbsp}Hat OpenShift AI Using the Web Console

*Red{nbsp}Hat OpenShift AI* is available as an operator via the OpenShift Operator Hub. In this section, you will install the *Red{nbsp}Hat OpenShift AI operator* and its dependencies using the OpenShift web console.

== Lab: Installation of Red{nbsp}Hat OpenShift AI

IMPORTANT: The installation requires a user with the _cluster-admin_ role.

. Log in to Red Hat OpenShift as a user with the _cluster-admin_ role.

. Navigate to **Operators** -> **OperatorHub** and search for each of the following Operators individually. Click the tile for each. In the pop-up window that opens, ensure you select the latest version in the *stable* channel and click **Install** to open the operator's installation view.
For this lab you can skip the installation of the optional operators + + * Web Terminal + + * Red Hat OpenShift Serverless + + * Red Hat OpenShift Service Mesh + + * Red Hat OpenShift Pipelines + + * GPU Support + + ** Node Feature Discovery Operator (optional) + + ** NVIDIA GPU Operator (optional) + +[TIP] + + Installing these Operators prior to the installation of OpenShift AI in my experience has made a difference in OpenShift AI acknowledging the availability of these components and adjusting the initial configuration to shift management of these components to OpenShift AI. \ No newline at end of file diff --git a/modules/chapter4/section3 copy.adoc b/modules/chapter4/section3 copy.adoc new file mode 100644 index 0000000..d95debe --- /dev/null +++ b/modules/chapter4/section3 copy.adoc @@ -0,0 +1,332 @@ += OpenVINO Model Serving + +In this section we will work in an exercise to deploy a model to an OpenVINO Serving Runtime. + +== Prepare MinIO + +https://min.io[MinIO] is a high-performance, S3 compatible object store. It is built for large scale AI/ML, data lake and database workloads. It is software-defined and runs on any cloud or on-premises infrastructure. + +We will need an S3 solution to share the model from training to deploy, in this exercise we will prepare MinIO to be such S3 solution. + +. In OpenShift, create a new namespace with the name **object-datastore**. ++ +[source,console] +---- +$ oc new-project object-datastore +---- + +. Run the following yaml to install MinIO: ++ +[source,console] +---- +$ oc apply -f https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml -n object-datastore +---- + +. Get the route to the MinIO dashboard. ++ +[source,console] +---- +$ oc get routes -n object-datastore | grep minio-ui | awk '{print $2}' +---- ++ +[INFO] +==== +Use this route to navigate to the S3 dashboard using a browser. With the browser, you will be able to create buckets, upload files, and navigate the S3 contents. +==== + +. Get the route to the MinIO API. ++ +[source,console] +---- +$ oc get routes -n object-datastore | grep minio-api | awk '{print $2}' +---- ++ +[INFO] +==== +Use this route as the S3 API endpoint. Basically, this is the URL that we will use when creating a data connection to the S3 in RHOAI. +==== + +== Training The Model +We will use the iris dataset model for this excercise. + +. Using a JupyterLab workbench at RHOAI, import the repository: https://github.com/RedHatQuickCourses/rhods-qc-apps.git ++ +[TIP] +==== +It is recommended to use a workbench that was created with the **Standard Data Science** Notebook image. +==== + +. Make sure that the workbench environment serves the required python packages for the notebook to run, for this to happen, open a terminal and run the following command to verify that the packages are already installed: ++ +[source,console] +---- +$ pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter2/requirements.txt +---- + +[TIP] +==== +You might also want to execute the preceding command in the notebook kernel by using the `%pip` syntax in the notebook. +Alternatively, you can create a custom notebook image that includes the `skl2onnx` package. +==== +. Open and run the notebook **iris_to_onnx** from **rhods-qc-apps/4.rhods-deploy/chapter2** directory ++ +image::iris_training_onnx.png[iris training to onnx format] ++ +[NOTE] +==== +Converting a model to ONNX format depends on the library that you use to create the model. 
+In this case, the model is created with Scikit-Learn, so you must use the https://onnx.ai/sklearn-onnx/[sklearn-onnx] library to perform the conversion. + +To convert from PyTorch, see https://pytorch.org/tutorials/beginner/onnx/intro_onnx.html[Introduction to ONNX in the PyTorch docs]. + +To convert from TensorFlow, use the https://github.com/onnx/tensorflow-onnx[tf2onnx] library. +==== + +. Observe that a file has been created: `rf_iris.onnx`, download this file to your computer, so that we can upload it to S3. ++ +image::iris-download.png[iris model download] + +. Upload the file `rf_iris.onnx` to a bucket named **models**, with a path **iris** in your S3. The username is *minio* and the password is *minio123*. ++ +image::iris-s3-upload.png[iris model s3 upload] ++ +[IMPORTANT] +==== +Make sure to create a new path in your bucket, and upload to such path, not to root. Later, when requesting to deploy a model to the **Model Server**, you will be required to provide a path inside your bucket. +==== + +== Create A Data Connection + +. In the RHOAI dashboard, create a project named **iris-project**. + +. In the **Data Connections** section, create a Data Connection to your S3. ++ +image::add-minio-iris-data-connection.png[Add iris data connection from minio] ++ +[IMPORTANT] +==== +- The credentials (Access Key/Secret Key) are `minio`/`minio123`. +- Make sure to use the API route, not the UI route (`oc get routes -n object-datastore | grep minio-api | awk '{print $2}'`). +- The region is not important when using MinIO, this is a property that has effects when using AWS S3. +However, you must enter a non-empty value to prevent problems with model serving. +- Mind typos for the bucket name. +- You don't have to select a workbench to attach this data connection to. +==== + + +== Using `boto3` + +Although the previous section indicates that you should manually download the `rf_iris.onnx` file to your computer and upload it to S3, you can also upload your model directly from your notebook or Python file, by using the `boto3` library. +To use this approach, you must: + +* Have the `boto3` library installed in your workbench (most of the RHOAI notebook images include this library). +* Attach your data connection to the workbench. + +After training the model, you can upload the file as the following example demostrates: + +[source,python] +---- +import os +import boto3 + +source_path = "model.onnx" +s3_destination_path = "models/model.onnx" + +key_id = os.getenv("AWS_ACCESS_KEY_ID") +secret_key = os.getenv("AWS_SECRET_ACCESS_KEY") +endpoint = os.getenv("AWS_S3_ENDPOINT") +bucket_name = os.getenv("AWS_S3_BUCKET") + +s3 = boto3.client( + "s3", + aws_access_key_id=key_id, + aws_secret_access_key=secret_key, + endpoint_url=endpoint, + use_ssl=True) + +s3.upload_file(source_path, bucket_name, Key=s3_destination_path) +---- + +[NOTE] +==== +You can also use the `boto3` library to download data. +This can be helpful in the data collection stage, for example for gathering data files from S3. + +[source,python] +---- +s3_data_path = "dataset.csv" +s3.download_file(bucket_name, s3_data_path, "my/local/path/dataset.csv") +---- +==== + +== Create a Model Server + +. In the **Models and model servers** section, add a server. ++ +image::add-server-button.png[add server] + +. Fill the form with the following values: ++ +-- +* Server name: `iris-model-server`. +* Serving runtime: `OpenVINO Model Server`. +* Select the checkboxes to expose the models through an external route, and to enable token authentication. 

== Create a Model Server

. In the **Models and model servers** section, add a server.
+
image::add-server-button.png[add server]

. Fill the form with the following values:
+
--
* Server name: `iris-model-server`.
* Serving runtime: `OpenVINO Model Server`.
* Select the checkboxes to expose the models through an external route, and to enable token authentication.
Enter `iris-serviceaccount` as the service account name.
--
+
image::add-server-form-example.png[Add Server Form]
+
[IMPORTANT]
====
The model server that you are creating works as a template for deploying models. Notice that we have not specified the model to deploy, or the data connection from which that model will be retrieved. In this form, we specify the resources, constraints, and serving engine where models will be deployed later.
Pay special attention to the following characteristics:

- **Serving Runtime**: By default, _OpenVINO_ and _OpenVINO with GPU_ are available. The important aspects of a serving runtime are the framework that can read models in a given format, and whether the runtime supports GPUs. GPUs allow complex and lengthy computations to complete faster, because large models require significant compute power to calculate a prediction from the given parameters.

- **Number of replicas to deploy**: Planning for the expected performance and number of expected requests is essential for this part of the form. Here we select whether requests are load balanced across multiple container replicas.

- **Model Server Size**: In this part of the form, we define the resources assigned to each model server container. You can select a predefined size from the dropdown, or select _custom_, in which case new fields are displayed to request the CPU and memory to assign to your containers.
+
image::model-server-size.png[model server size]

- **Model Route**: Some models are consumed only by other containers inside the same OpenShift cluster. Here we can keep the server unavailable to clients outside the cluster, or instruct the model server configuration to assign an external route. If you do not expose the model externally through a route, click the **Internal Service** link in the **Inference endpoint** section:
+
image::figure14_0.png[Inference endpoint]
+
A popup displays the gRPC and REST URLs:
+
image::figure15_0.png[Endpoint URLs]

- **Token authorization**: This part of the form provides a checkbox to create a service account with authorized access to the model server. Only API requests that present a token with access to that service account can call the inference service.
====

. After clicking the **Add** button at the bottom of the form, a new **Model Server** configuration appears in your project. Click the **Tokens** column to reveal the tokens that you can share with the applications that consume the inference API.
+
image::model-server-with-token.png[Model Server with token]
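
When token authentication is enabled, client applications must present one of these tokens as a bearer token on every inference request. The following is a minimal Python sketch of that pattern; the endpoint URL is a placeholder, `MODEL_SERVER_TOKEN` is an assumed environment variable holding a token copied from the **Tokens** column, and the input matches the iris model used in this exercise. The `curl` command in the _Test The Model_ section later in this chapter does the same thing from the command line.

[source,python]
----
import os

import requests

# Placeholder values: copy the real token from the Tokens column, and the
# inference endpoint from the dashboard after the model is deployed.
token = os.getenv("MODEL_SERVER_TOKEN")
url = "https://<inference-endpoint>/v2/models/iris-model/infer"

payload = {
    "inputs": [
        {"name": "X", "shape": [1, 4], "datatype": "FP32", "data": [3, 4, 3, 2]}
    ]
}

# The Authorization header carries the service account token.
response = requests.post(
    url, json=payload, headers={"Authorization": f"Bearer {token}"})
print(response.json())
----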

== Deploy The Model

. On the right side of the **Model Server**, click the **Deploy Model** button to open the **Deploy Model** form:
+
image::deploy-model-button.png[Deploy Model button]

. Fill the **Deploy Model** form.
+
--
* Model name: `iris-model`
* Model framework: `onnx - 1`
* Model location data connection: `iris-data-connection`
* Model location path: `iris`
--
+
image::deploy-model-form.png[Deploy Model form]

. After clicking the **Add** button at the bottom of the form, a new entry appears in the **Deployed models** column for your **Model Server**. Clicking the column eventually shows a check mark under the **Status** column:
+
image::deploy-model-success.png[Deploy model success]

. Observe and monitor the assets created in your OpenShift **iris-project** namespace.
+
[source,console]
----
$ oc get routes -n iris-project
$ oc get secrets -n iris-project | grep iris-model
$ oc get events -n iris-project
----
+
image::iris-project-events.png[Iris project events]
+
[TIP]
====
Deploying a **Model Server** triggers a **ReplicaSet** with **ModelMesh**, which attaches your model to the inference runtime and exposes it through a route. Also, notice the creation of a secret that contains your token.
====

== Test The Model

Now that the model is ready to use, we can make an inference request using the REST API.

. Assign the route to an environment variable on your local machine, so that we can use it in the curl commands.
+
[source,console]
----
$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}')
----

. Assign an authentication token to an environment variable on your local machine.
+
[source,console]
----
$ export TOKEN=$(oc whoami -t)
----

. Request an inference with the REST API.
+
[source,console]
----
$ curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer \
    -X POST \
    --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}'
----

The result of using the inference service looks like the following output:

[source,json]
----
{"model_name":"iris-model__isvc-590b5324f9","model_version":"1","outputs":[{"name":"label","datatype":"INT64","shape":[1],"data":[1]},{"name":"scores","datatype":"FP32","shape":[1,3],"data":[4.851966,3.1275764,3.4580243]}]}
----
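
The response contains two output tensors: `label`, the predicted class index, and `scores`, the per-class scores. The following is a small sketch of interpreting that output in Python, using the response shown above; the species names assume the usual scikit-learn iris target encoding (0 = setosa, 1 = versicolor, 2 = virginica).

[source,python]
----
import json

# Response body returned by the preceding curl command.
response_text = (
    '{"model_name":"iris-model__isvc-590b5324f9","model_version":"1",'
    '"outputs":[{"name":"label","datatype":"INT64","shape":[1],"data":[1]},'
    '{"name":"scores","datatype":"FP32","shape":[1,3],'
    '"data":[4.851966,3.1275764,3.4580243]}]}'
)
result = json.loads(response_text)

# Index each output tensor by name, then translate the label to a species name.
outputs = {output["name"]: output["data"] for output in result["outputs"]}
species = ["setosa", "versicolor", "virginica"]  # assumed target encoding
print("Predicted species:", species[outputs["label"][0]])
print("Scores:", outputs["scores"])
----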

=== Model Serving Request Body

As you tested with the preceding `curl` command, HTTP requests to a deployed model must use a specific request body format.
The basic format of the input data is as follows:

[subs=+quotes]
----
{
  "inputs": [{
    "name" : "input", <1>
    "shape" : [2,3], <2>
    "datatype" : "INT64", <3>
    "data" : [[34, 54, 65], [4, 12, 21]] <4>
  }]
}
----
<1> The name of the input tensor.
The data scientist who creates the model must provide you with this value.
<2> The shape of the input tensor.
<3> The https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#tensor-data-types[data type] of the input tensor.
<4> The tensor contents provided as a JSON array.

The API supports additional parameters.
For a complete list, refer to the https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/required_api.md#inference-request-json-object[KServe Predict Protocol docs].

To make a request in Python, you can use the `requests` library, as the following example shows:

[source,python]
----
import requests

input_data = [-0.15384616, -0.9909186]

# You must adjust this endpoint or read it from an environment variable
INFERENCE_ENDPOINT = "https://my-model.apps.my-cluster.example.com/v2/models/my-model/infer"

# Build the request body
payload = {
    "inputs": [
        {
            "name": "dense_input",
            "shape": [1, 2],
            "datatype": "FP32",
            "data": input_data
        }
    ]
}

# Send the POST request.
# If token authentication is enabled, also pass the token in an
# "Authorization: Bearer <token>" header, as in the earlier examples.
response = requests.post(INFERENCE_ENDPOINT, json=payload)

# Parse the JSON response
result = response.json()

# Print predicted values
print(result['outputs'][0]['data'])
----
diff --git a/package-lock.json b/package-lock.json
index 681986f..0d43cf4 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,5 +1,5 @@
 {
-  "name": "course-starter",
+  "name": "llm-model-serving",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {