Tech review updates part 2 #14

Merged 1 commit on Jun 24, 2024
3 changes: 1 addition & 2 deletions antora.yml
@@ -6,5 +6,4 @@ nav:
- modules/chapter1/nav.adoc
- modules/chapter2/nav.adoc
- modules/chapter3/nav.adoc
- modules/chapter4/nav.adoc
- modules/appendix/nav.adoc
- modules/chapter4/nav.adoc
4 changes: 2 additions & 2 deletions modules/chapter2/pages/index.adoc
@@ -1,10 +1,10 @@
= OpenShift AI Initilization
= OpenShift AI Initialization

== Supported configurations
OpenShift AI is supported in two configurations:

* A managed cloud service add-on for *Red Hat OpenShift Service on Amazon Web Services* (ROSA, with a Customer Cloud Subscription for AWS) or *Red Hat OpenShift Dedicated* (GCP).
For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service 1].
For information about OpenShift AI on a Red Hat managed environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_cloud_service/1[Product Documentation for Red Hat OpenShift AI Cloud Service].

* Self-managed software that you can install on-premise or on the public cloud in a self-managed environment, such as *OpenShift Container Platform*.
For information about OpenShift AI as self-managed software on your OpenShift cluster in a connected or a disconnected environment, see https://access.redhat.com/documentation/en-us/red_hat_openshift_ai_self-managed/2.8[Product Documentation for Red Hat OpenShift AI Self-Managed 2.8].
Empty file.
Empty file.
13 changes: 10 additions & 3 deletions modules/chapter3/pages/section2.adoc
@@ -47,7 +47,7 @@ stringData:
# change the username and password to your own values.
# ensure that the user is at least 3 characters long and the password at least 8
minio_root_user: minio
minio_root_password: minio123
minio_root_password: minio321!
---
kind: Deployment
apiVersion: apps/v1
@@ -203,6 +203,10 @@ From the OCP Dashboard:

. This window opens the MinIO Dashboard. Log in with the username/password combination you set, or the defaults listed in the YAML file above.

.. username = minio

.. password = minio321!

Once logged into the MinIO Console:

. Click Create Bucket to get started.
@@ -214,7 +218,7 @@ Once logged into the MinIO Console:
.. *storage*

[NOTE]
When serving an LLM or other model, Openshift AI looks within a Folder. Therefore, we need at least one subdirectory under the Models Folder.
When serving an LLM or other model, Openshift AI looks within a folder. Therefore, we need at least one subdirectory under the models folder.

. Via the Navigation menu, *select object browser*, then click on the Model Bucket.
. From the models bucket page, click add path, and type *ollama* as the name of the sub-folder or path.
@@ -224,7 +228,10 @@ In most cases, to serve a model, the trained model would be uploaded into this s

. We still need a file available in this folder for the model deployment workflow to succeed.

. So we will copy an *emptyfile.txt* file to the ollama subdirectory. You can download the file from https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes/ollama_runtime[*this location*]. Alternatively, you can create your own file called emptyfile.txt and upload it.
. So we will copy an *emptyfile.txt* file to the ollama subdirectory.


You can download the file from xref:attachment$emptyfile.txt[this location]. Alternatively, you can create your own file called emptyfile.txt and upload it.

. Once you have this file ready, upload it into the Ollama path in the model bucket by clicking the upload button and selecting the file from your local desktop.
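
If you prefer to script this upload instead of using the MinIO Console, the following is a minimal sketch using boto3 against the MinIO S3 API. The endpoint URL is hypothetical, and the credentials, bucket, and path are taken from the examples in this section; adjust them to your own environment.

[source,python]
----
import boto3

# Hypothetical MinIO API endpoint; use the route or service address from your
# own cluster. Credentials match the secret shown earlier in this section.
s3 = boto3.client(
    "s3",
    endpoint_url="https://minio-api.apps.example.com",
    aws_access_key_id="minio",
    aws_secret_access_key="minio321!",
    verify=False,
)

# Upload an empty placeholder file into the ollama path of the models bucket.
s3.put_object(Bucket="models", Key="ollama/emptyfile.txt", Body=b"")
----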

6 changes: 2 additions & 4 deletions modules/chapter3/pages/section3.adoc
@@ -1,4 +1,4 @@
= OpeonShift AI Resources - 2
= OpenShift AI Resources - 2

video::llm_dataconn_v3.mp4[width=640]

@@ -49,12 +49,10 @@ Depending on the notebook image selected, it can take between 2-20 minutes for t

From the ollama-model workbench dashboard in the ollama-model project, navigate to the **Models** section, and select Deploy Model using the **Single Model Serving Platform** button.

image::deploy_model_2.png[width=800]

*Create the model server with the following values:*


.. Model name: `Ollama-Mistral`
.. Model name: `ollama-mistral`
.. Serving Runtime: `Ollama`
.. Model framework: `Any`
.. Model Server Size: `Medium`
Binary file added modules/chapter4/images/add_a_cell.png
Binary file added modules/chapter4/images/clone_a_repo.png
Binary file added modules/chapter4/images/inference_endpoint.png
Binary file added modules/chapter4/images/llama3_url.png
Binary file added modules/chapter4/images/llama_llm.png
File renamed without changes
Binary file added modules/chapter4/images/replaced_endpoints.png
Binary file added modules/chapter4/images/replaced_endpoints2.png
3 changes: 2 additions & 1 deletion modules/chapter4/nav.adoc
@@ -1,3 +1,4 @@
* xref:index.adoc[]
** xref:section1.adoc[]
** xref:section2.adoc[]
** xref:section2.adoc[]
** xref:section3.adoc[]
2 changes: 1 addition & 1 deletion modules/chapter4/pages/index.adoc
@@ -1,4 +1,4 @@
= Jupyter Notebooks & Large Language Model Inference
= Jupyter Notebooks & LLMs

This chapter begins with a running and configured OpenShift AI environment. If you don't already have your environment running, head over to Chapter 2.

113 changes: 77 additions & 36 deletions modules/chapter4/pages/section1.adoc
@@ -2,72 +2,113 @@

video::llm_jupyter_v3.mp4[width=640]

== Open the Jupyter Notebook
== Open JupyterLab

From the OpenShift AI ollama-model workbench dashboard:
JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. For a demonstration of JupyterLab and its features, https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#what-will-happen-to-the-classic-notebook[you can view this video.]

* Select the Open link to the right of the status section. When the new window opens, use the OpenShift admin user & password to login to the Notebook.

Return to the ollama-model workbench dashboard in the OpenShift AI console.

* Select the *Open* link to the right of the status section. When the new window opens, use the OpenShift admin user & password to log in to JupyterLab.

* Click the *Allow selected permissions* button to complete the login to the notebook.

[NOTE]
If the *OPEN* link for the notebook is grayed out, the notebook container is still starting. This process can take from a few minutes up to 20+ minutes, depending on the notebook image selected.


== Inside the Jupyter Notebook

Clone the notebook file to interact with the Ollama Framework from this location: https://github.com/rh-aiservices-bu/llm-on-openshift.git
== Inside JupyterLab

Navigate to the llm-on-openshift/examples/notebooks/langchain folder:
This takes us to the JupyterLab screen, where we can select from multiple options and tools to begin our data science experimentation.

Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_
Our first action is to clone a Git repository that contains a collection of LLM projects, including the notebook we are going to use to interact with the LLM.

Explore the notebook, and then continue.
Clone the GitHub repository to interact with the Ollama framework from this location:
https://github.com/rh-aiservices-bu/llm-on-openshift.git

=== Update the Inference Endpoint
. Copy the URL above.

Head back to the RHOAI workbench dashboard & copy the interence endpoint from our ollama-mistral model.
// Should it be inference instead of interence?
. Click the Clone a Repo icon above the explorer section window.

Return the Jupyter Notebook Environment:
image::clone_a_repo.png[width=640]

. Paste the inference endpoint into the Cell labeled interfence_server_url = *"replace with your own inference address"*
. Paste the link into the *clone a repo* pop-up, make sure *included submodules* is checked, then click Clone.

. Navigate to the llm-on-openshift/examples/notebooks/langchain folder:

image::serverurl.png[width=800]
. Then open the file: _Langchain-Ollama-Prompt-memory.ipynb_

. We can now start executing the code in the cells, starting with the set the inference server URL cell.
. Explore the notebook, and then continue.

. Next we run the second cell: !pip install -q langchain==0.1.14 ; there is a notice to update pip, just ignore and continue.
=== Configure the Ollama Framework with a Large Language Model

. The third cell imports the langchain components that provide the libraries and programming files to interact with our LLM model.
. From the notebook page, add a new cell above the inference URL cell.

. In the fourth cell, place our first call to the Ollama-Mistral Framework Served by OpenShift AI.
image::add_a_cell.png[width=640]

[WARNING]
Before we continue, we need to perform the following additional step. As mentioned, The Ollama Model Runtime we launched in OpenShift AI is a Framework that can host multiple LLM Models. It is currently running but is waiting for the command to instruct it to download Model to Serve. The following command needs to run from the OpenShift Dashboard. We are going to use the web_terminal operator to perform this next step.

== Activating the Mistral Model in Ollama
The Ollama Model Runtime we deployed using the Single Model Serving Platform in OpenShift AI is a framework that can host various large language models. It is currently running, but it is waiting for a command to instruct the framework on which model to download and serve.

We will need to obtain the endpoint from the OpenShift AI model serving console. I usually just paste the text below into a cell in the Jupyter Notebook and paste the url in the code block from there.
. To load the Mistral model, we are going to use the following Python code to instruct the runtime to download and serve a quantized 4-bit version of the Mistral large language model.

image::mistral_config.png[width=640]
. Copy the code below and paste it into the new cell added to the notebook in the previous step.


[source,python]
----
curl https://your-endpoint/api/pull \
-k \
-H "Content-Type: application/json" \
-d '{"name": "mistral"}'
import requests

headers = {
    # Already added when you pass json=
    # 'Content-Type': 'application/json',
}

json_data = {
    'name': 'mistral',
}

response = requests.post('https://your-endpoint/api/pull', headers=headers, json=json_data, verify=False)
----
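
Optionally, you can inspect the runtime's reply in the same cell to confirm the pull was accepted. This is a small optional addition, not part of the original notebook:

[source,python]
----
# Inspect the reply from the Ollama runtime. A successful pull typically
# streams status messages such as "pulling manifest" and finally "success".
print(response.status_code)
print(response.text)
----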

. Next copy the entire code snippet, and open the OpenShift Dashboard.
. At the top right of the dashboard, locate the ">_" and select it.
. This will open the terminal window at the bottom of the dashboard.
. Click on the Start button in the terminal window, wait for the bash..$ prompt to appear
. Past the modified code block into the window and press enter.
We'll need to modify the URL in the bottom line beginning with *response =* in the next step.

=== Update the Inference Endpoints

Head back to the RHOAI ollama-model workbench dashboard. From the models tab, copy the inference endpoint for the ollama-mistral model.

image::inference_endpoint.png[width=640]

Return to the Jupyter notebook.

We will be updating two cells with the inference endpoint.

. Replace the https://your-endpoint section of the Python code we copied into the new cell. Ensure you leave the /api/pull portion appended to the URL.

. Replace the red text inside the quotation marks for the inference_server_url with the same inference endpoint URL.

image::replaced_endpoints2.png[width=640]
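
As a minimal sketch, the two updated cells might end up looking like the following. The endpoint URL shown here is hypothetical; substitute the inference endpoint copied from your own ollama-mistral model.

[source,python]
----
import requests

# Hypothetical endpoint shown for illustration only; substitute the inference
# endpoint copied from your own ollama-mistral model.
base_url = 'https://ollama-mistral-ollama-model.apps.example.com'

# The pull cell keeps /api/pull appended to the endpoint.
response = requests.post(f'{base_url}/api/pull', json={'name': 'mistral'}, verify=False)

# The inference server cell uses the same endpoint without /api/pull.
inference_server_url = base_url
----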

=== Execute cell code to assemble the langchain components

. We can now start executing the code in the cells, beginning with the new cell added at the top. Click on the cell to activate the blue indicator to the left of the cell.

.. You will receive a message about an unverified HTTPS request. This is because we didn't use authentication for this application. You can ignore this for this lab experience, but in production we would enable authentication using certificates, as suggested. https://developers.redhat.com/articles/2021/06/18/authorino-making-open-source-cloud-native-api-security-simple-and-flexible[To use authentication we need to install the Authorino Operator.]

.. The Mistral model files are now being downloaded to the Ollama framework.

. Continue executing through the cells.

. Next, run the cell *!pip install -q langchain==0.1.14*; there is a notice to update pip; ignore it and continue.

. The next cell imports the langchain components that provide the libraries and programming files to interact with our LLM.

. The *"Create the LLM instance"* cell sets the variables that determine how we are going to interact with our model and how it should respond, and stores them in the *llm* variable.

. Next, run the *"Create the prompt"* cell. Here we are setting the *template* variable with the details of how the model should operate, including constraints and boundaries when generating the response. We often do not see the system message when interacting with an LLM, but it is a standard field that is included along with the user prompt.

. Continue executing the cells. The *"memory for the conversation"* cell keeps the previous context and conversation history, so the full history of the chat conversation is sent as part of the prompt.

The message: *status: pulling manifest* should appear. This begins the model downloading process.
. The *create the chain* cell combines the previous variables: llm, prompt, and memory, and adds a verbose boolean to create the conversation variable, which will be sent to the model's inference endpoint running in OpenShift AI. With the verbose option set to true, the entire conversation sent to the model is displayed in the notebook before the model's (AI's) response. (See the sketch below.)
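
For reference, a consolidated sketch of those cells might look like the following. The template text, model parameters, and endpoint URL shown here are illustrative assumptions, not the notebook's exact values.

[source,python]
----
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# Illustrative endpoint; use the inference endpoint from your own cluster.
inference_server_url = "https://ollama-mistral-ollama-model.apps.example.com"

# Create the LLM instance: how we talk to the model and how it should respond.
llm = Ollama(
    base_url=inference_server_url,
    model="mistral",
    temperature=0.1,
)

# Create the prompt: the system message plus placeholders for history and input.
template = """<s>[INST]You are a helpful, respectful assistant.
Answer as accurately as you can and explain your reasoning.[/INST]

Current conversation:
{history}
Human: {input}
AI:"""
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

# Memory for the conversation: keeps the chat history so it is re-sent as context.
memory = ConversationBufferMemory()

# Create the chain: combine llm, prompt, and memory; verbose=True prints the
# full prompt sent to the model before each response.
conversation = ConversationChain(llm=llm, prompt=prompt, memory=memory, verbose=True)
----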

image::curl_command.png[width=800]

Once the download completes, the *status: success:* message appears. We can now return to the Jupyter Notebook Tab in the browser and proceed.
In the next section, we'll send our first input to the running Mistral large language model.
59 changes: 26 additions & 33 deletions modules/chapter4/pages/section2.adoc
@@ -2,66 +2,59 @@

video::llm_model_v3.mp4[width=640]

=== Create the Prompt
== Let's Talk with the LLM

This cell sets the *system message* portion of the query to our model. Normally, we don't get the see this part of the query. This message details how the model should act, respond, and consider our questions. It adds checks to valdiate the information is best as possible, and to explain answers in detail.
=== First Input

== Memory for the conversation
The first input cell sent via the notebook to the Mistral model asks it to describe Paris in 100 words or less.

This cell keeps track of the conversation, this way history of the chat are also sent along with new chat information, keeping the context for future questions.
In green text in the window is the setup message that is sent along with the single-sentence question, describing to the model how to consider and respond to the question. This is known as the system message. Also shown is the current conversation, which contains the human question or prompt sent to the model, followed by the AI answer.

The next cell tracks the conversation and prints it to the notebook output window so we can see the full conversation history.

=== First input to our LLM

The Notebooks first input to our model askes it to describe Paris in 100 words or less.

In green text is the window, there is the setup message that is sent along with the single sentence question to desctibe to the model how to consider and respond to the question.

It takes approximately 12 seconds for the model to respond with the first word of the reply, and the final word is printed to the screen approximately 30 seconds after the request was started.
It takes a few seconds for the OpenShift AI model to respond with the first words of the reply. The response answers the question in a well-considered, informative paragraph that is less than 100 words in length.

image::paris.png[width=800]
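
As a sketch, and continuing from the conversation chain assembled in the previous section, the first input cell might look like this (the exact wording in the notebook may differ):

[source,python]
----
# Send the first input to the model and print its reply.
first_input = "Describe Paris in 100 words or less."
print(conversation.predict(input=first_input))
----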

The responce answered the question in a well-considered and informated paragraph that is less than 100 words in length.

=== Second Input

Notice that the Second input - "Is there a River" - does not specify where the location is that might have a River. Because the conversation history is passed with the second input, there is not need to specify any additional informaiton.
Notice that the second input - "Is there a River" - does not specify the location. Because the conversation history is passed with the second input, there is no need to specify any additional information.

image::london.png[width=800]
== Second Example

The total time to first word took approximately 14 seconds this time, just a bit longer due the orginal information being sent. The time for the entire reponse to be printed to the screen just took over 4 seoncds.
Before we continue with the London example, we execute a cell to change the conversation mode to non-verbose. This eliminates the context of the prompt displayed in the notebook and instead shows just the model's reply.

Overall our Model is performing well without a GPU and in a container limited to 4 cpus & 10Gb of memory.
We also execute a cell to clear memory, that is, the conversation history regarding Paris.

== Second Example Prompt
We did not disable the memory or the verbosity of the conversation; we simply hid that section from being visible in the notebook.
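
A minimal sketch of what those two cells might contain, continuing from the conversation chain created earlier (the notebook's actual cells may differ):

[source,python]
----
# Illustrative only; the notebook's actual cells may differ.
conversation.verbose = False     # hide the prompt/context, show only the model's reply
conversation.memory.clear()      # forget the earlier Paris conversation history
----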

Similar to the previous example, except we use the City of London, and run a cell to remove the verbose text reguarding what is sent or recieved apart from the answer from the model.
Go ahead and run the second example cells and evaluate the responses from the model.

There is no change to memory setting, but go ahead and evalute where the second input; "Is there a river?" is answer correctly.
image::london.png[width=800]

== Experimentation with the Model

Add a few new cells to the Notebook.
There are many different types of large language models; while we can read about them, using them first-hand is the best way to experience how they perform.

image::experiment.png[width=800]
So now it's time to experiment on your own, or continue to follow along with this guide.

Experiment with clearing the memory statement, then asking the river question again. Or perhaps copy one of the input statements and add your own question for the model.

Try not clearing the memory and asking a few questions.
Add a few new cells to the bottom of the Notebook.

image::experiment.png[width=800]

**You have successfully deployed a Large Language Model, now test the information that it has available and find out what is doesn't know.**
Experiment by copying the clear memory cell text and pasting the contents into one of the new cells. Next, copy one of the input statements and add your own question for the model. Then run those cells to learn more about the model's capabilities.

I used the following examples:

== Delete the Environment
. Are you an AI model?
. Tell me a joke please?

Once you have finished experimenting with questions, make sure you head back to the Red Hat Demo Platform and delete the Openshift Container Platform Cluster.
Then I asked one of my standard questions across models to determine its knowledge of history:

You don't have to remove any of the resources; deleting the environment will remove any resources created during this lesson.
*Was George Washington Married?*

=== Leave Feedback
I ask this question because several models say George Washington was married twice. I believed the first one, and this had me thinking that several of the later models were wrong. It's critical that we evaluate models to determine their viability for business use cases.

If you enjoyed this walkthrough, please send the team a note.
If you have suggestions to make it better or clarify a point, please send the team a note.
Try clearing the memory and asking your own questions.
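
As a sketch, continuing from the conversation chain used earlier, your experimentation cells might look like this (the questions are just examples):

[source,python]
----
# Illustrative experimentation cells; adapt the questions as you like.
conversation.memory.clear()                                   # start a fresh conversation
print(conversation.predict(input="Are you an AI model?"))
print(conversation.predict(input="Tell me a joke please."))
print(conversation.predict(input="Was George Washington married?"))
----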

Until next time, Keep being Awesome!
Continue to experiment with the Mistral model, or move to the next section, where we evaluate a different large language model.