diff --git a/integrations/vllm/llama-3.1-8b-instruct/deploy_llama31-8b-instruct.ipynb b/integrations/vllm/llama-3.1-8b-instruct/deploy_llama31-8b-instruct.ipynb
new file mode 100644
index 00000000..dd07e5e4
--- /dev/null
+++ b/integrations/vllm/llama-3.1-8b-instruct/deploy_llama31-8b-instruct.ipynb
@@ -0,0 +1,739 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "df2ee018",
+   "metadata": {},
+   "source": [
+    "# A Guide for Llama3.1 8B-Instruct on Hopsworks\n",
+    "\n",
+    "For details about this Large Language Model (LLM), visit the model page in the HuggingFace repository ➡️ [link](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "191915f4",
+   "metadata": {},
+   "source": [
+    "## 1️⃣ Download Llama3.1 8B-Instruct using the huggingface_hub library\n",
+    "\n",
+    "First, we download the Llama3.1 model files (e.g., weights, configuration files) directly from the HuggingFace repository.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "174751e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install huggingface_hub --quiet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "c31b01df",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Place your HuggingFace token in the HF_TOKEN environment variable\n",
+    "\n",
+    "import os\n",
+    "os.environ[\"HF_TOKEN\"] = \"<INSERT_YOUR_HF_TOKEN>\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "8b78a085",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "6bedd0a8884e4f48887d2a3d10944592",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Fetching 14 files:   0%|          | 0/14 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from huggingface_hub import snapshot_download\n",
+    "\n",
+    "llama31_local_dir = snapshot_download(\"meta-llama/Llama-3.1-8B-Instruct\", ignore_patterns=\"original/*\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "865bbd91",
+   "metadata": {},
+   "source": [
+    "## 2️⃣ Register Llama3.1 8B-Instruct into Hopsworks Model Registry\n",
+    "\n",
+    "Once the model files are downloaded from the HuggingFace repository, we can register them in the Hopsworks Model Registry."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "7b7cba39",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 14:53:39,802 INFO: Python Engine initialized.\n",
+      "\n",
+      "Logged in to project, explore it here https://hopsworks.ai.local/p/119\n"
+     ]
+    }
+   ],
+   "source": [
+    "import hopsworks\n",
+    "\n",
+    "project = hopsworks.login()\n",
+    "mr = project.get_model_registry()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "e005f6b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The following instantiates a Hopsworks LLM model, not yet saved in the Model Registry\n",
+    "\n",
+    "llama31 = mr.llm.create_model(\n",
+    "    name=\"llama31_instruct\",\n",
+    "    description=\"Llama3.1 8B-Instruct model (via HF)\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "05ed4ee6",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "3596ef2b3b504ec8b7fcda36b4ddd48a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/6 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model created, explore it at https://hopsworks.ai.local/p/119/models/llama31_instruct/1\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Model(name: 'llama31_instruct', version: 1)"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Register the Llama model pointing to the local model files\n",
+    "\n",
+    "llama31.save(llama31_local_dir)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce98024e",
+   "metadata": {},
+   "source": [
+    "## 3️⃣ Deploy Llama3.1 8B-Instruct\n",
+    "\n",
+    "After registering the LLM in the Model Registry, we can create a deployment that serves it using the vLLM engine.\n",
+    "\n",
+    "Hopsworks provides two types of deployments to serve LLMs with the vLLM engine:\n",
+    "\n",
+    "- **Using the official vLLM OpenAI server**: an OpenAI API-compatible server implemented by the creators of vLLM, where the vLLM engine is configured with a user-provided configuration (YAML) file.\n",
+    "\n",
+    "- **Using the KServe built-in vLLM server**: a KServe-based implementation of an OpenAI API-compatible server for more advanced users who need to provide a predictor script for the initialization of the vLLM engine and (optionally) the implementation of the *completions* and *chat/completions* endpoints.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "a75117e1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 14:58:26,221 WARNING: VersionWarning: No version provided for getting model `llama31_instruct`, defaulting to `1`.\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Get a reference to the Llama model if not obtained yet\n",
+    "\n",
+    "llama31 = mr.get_model(\"llama31_instruct\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "a9b080ab",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "52bd52e2b6884ab180e43f5f8fc55496",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Uploading: 0.000%|          | 0/62 elapsed<00:00 remaining<?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Upload the vLLM engine config file used by the deployments\n",
+    "\n",
+    "ds_api = project.get_dataset_api()\n",
+    "\n",
+    "path_to_config_file = f\"/Projects/{project.name}/\" + ds_api.upload(\"llama_vllmconfig.yaml\", \"Resources\", overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54014afa",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using KServe vLLM server\n",
+    "\n",
+    "Create a model deployment by providing a predictor script and (optionally) a configuration file with the arguments for the vLLM engine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "fe74238b",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "bc20d19b7bba42089042d5b6aabfbc7e",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Uploading: 0.000%|          | 0/1714 elapsed<00:00 remaining<?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Deployment created, explore it at https://hopsworks.ai.local/p/119/deployments/38\n",
+      "Before making predictions, start the deployment by using `.start()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Upload the predictor script\n",
+    "path_to_predictor_script = f\"/Projects/{project.name}/\" + ds_api.upload(\"llama_predictor.py\", \"Resources\", overwrite=True)\n",
+    "\n",
+    "llama31_depl = llama31.deploy(\n",
+    "    name=\"llama31v1\",\n",
+    "    description=\"Llama3.1 8B-Instruct from HuggingFace\", \n",
+    "    script_file=path_to_predictor_script,\n",
+    "    config_file=path_to_config_file,  # optional\n",
+    "    resources={\"num_instances\": 1, \"requests\": {\"cores\": 2, \"memory\": 1024*16, \"gpus\": 1}},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c3e5133",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using vLLM OpenAI server\n",
+    "\n",
+    "Create a model deployment by providing a configuration file with the arguments for the vLLM engine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "d914f7b6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Deployment created, explore it at https://hopsworks.ai.local/p/119/deployments/39\n",
+      "Before making predictions, start the deployment by using `.start()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "llama31_depl = llama31.deploy(\n",
+    "    name=\"llama31v2\",\n",
+    "    description=\"Llama3.1 8B-Instruct from HuggingFace\",\n",
+    "    config_file=path_to_config_file,\n",
+    "    resources={\"num_instances\": 1, \"requests\": {\"cores\": 2, \"memory\": 1024*12, \"gpus\": 1}},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8315fe0",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "d23937ba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Retrieve one of the deployments created above\n",
+    "\n",
+    "ms = project.get_model_serving()\n",
+    "llama31_depl = ms.get_deployment(\"llama31v2\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "02df5c46",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "f76ff714cde74abeb5ea4b97c2f97615",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/5 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Start making predictions by using `.predict()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "llama31_depl.start(await_running=60*15) # wait for 15 minutes maximum"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "210b7d6f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# llama31_depl.stop()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "8c58b989",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "PredictorState(status: 'Running')"
+      ]
+     },
+     "execution_count": 14,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "llama31_depl.get_state()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d198f9e4",
+   "metadata": {},
+   "source": [
+    "## 4️⃣ Prompting Llama3.1 8B-Instruct\n",
+    "\n",
+    "Once the Llama3.1 deployment is up and running, we can start sending user prompts to the LLM. You can use either an OpenAI API-compatible client (e.g., the openai library) or any other HTTP client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "cd95dec9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Get the Istio endpoint from the Llama deployment page in the Hopsworks UI.\n",
+    "istio_endpoint = \"<ISTIO_ENDPOINT>\" # with format \"http://<ip-address>\"\n",
+    "\n",
+    "# Resolve the base URI. NOTE: KServe's vLLM server prefixes the URIs with /openai\n",
+    "base_uri = \"/openai\" if llama31_depl.predictor.script_file is not None else \"\"\n",
+    "\n",
+    "openai_v1_uri = istio_endpoint + base_uri + \"/v1\"\n",
+    "completions_url = openai_v1_uri + \"/completions\" \n",
+    "chat_completions_url = openai_v1_uri + \"/chat/completions\"\n",
+    "\n",
+    "# Resolve API key for request authentication\n",
+    "if \"SERVING_API_KEY\" in os.environ:\n",
+    "    # if running inside Hopsworks\n",
+    "    api_key_value = os.environ[\"SERVING_API_KEY\"]\n",
+    "else:\n",
+    "    # Create an API key using the Hopsworks UI and place the value below\n",
+    "    api_key_value = \"<API_KEY>\"\n",
+    "\n",
+    "# Prepare request headers\n",
+    "headers = {\n",
+    "    'Content-Type': 'application/json',\n",
+    "    'Authorization': 'ApiKey ' + api_key_value,\n",
+    "    'Host': f\"{llama31_depl.name}.{project.name.lower().replace('_', '-')}.hopsworks.ai\", # also provided in the Hopsworks UI\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "68d78906",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using httpx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "4d3b5073",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import httpx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "aad58347",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Completion request:  {'model': 'llama31v2', 'messages': [{'role': 'user', 'content': 'Who is the best French painter. Answer with detailed explanations.'}]}\n",
+      "2025-01-27 15:01:45,144 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "<Response [200 OK]>\n",
+      "Choosing the \"best\" French painter is subjective, as it depends on personal taste and historical context. However, I can provide you with some of the most renowned French painters and highlight their unique contributions to the world of art.\n",
+      "\n",
+      "1. **Claude Monet** (1840-1926)\n",
+      "Monet is often considered one of the greatest French painters. He was a founding member of the Impressionist movement, which emphasized capturing the fleeting effects of light and color in outdoor settings. Monet's brushstrokes were spontaneous and expressive, and he is famous for his series of water lily paintings (Nymphéas) and his iconic depictions of London's fog-shrouded streets.\n",
+      "\n",
+      "Monet's innovative techniques and his focus on light, color, and atmosphere paved the way for future generations of artists. His paintings continue to be celebrated for their beauty, simplicity, and emotional resonance.\n",
+      "\n",
+      "2. **Pierre-Auguste Renoir** (1841-1919)\n",
+      "Renoir was another key figure in the Impressionist movement. His paintings are characterized by their warmth, sensitivity, and technical mastery. He was particularly drawn to capturing the beauty of everyday life, from the bustling streets of Paris to the private moments of intimacy among friends and family.\n",
+      "\n",
+      "Renoir's colorful and expressive portraits, such as \"Dance at Le Moulin de la Galette\" and \"Girl with a Hoop,\" showcase his ability to convey the joy and vitality of life through his art. His influence can be seen in many subsequent artists, from Fauvism to Expressionism.\n",
+      "\n",
+      "3. **Jean-Honoré Fragonard** (1732-1806)\n",
+      "Fragonard was a Rococo painter known for his delicate and enigmatic depictions of love, nature, and everyday life. His paintings often feature elegant, ornate settings, and his use of pastel colors created a dreamy, ethereal atmosphere.\n",
+      "\n",
+      "Fragonard's most famous works, such as \"The Happy Accidents of the Swing\" and \"The Stolen Kiss,\" showcase his ability to convey the subtleties of human emotion and the fleeting nature of pleasure. His romantic and idyllic visions of the world have inspired artists for centuries.\n",
+      "\n",
+      "4. **Édouard Manet** (1832-1883)\n",
+      "Manet was a pioneer of modern art, and his influence can be seen in many subsequent movements, including Impressionism and Expressionism. His paintings often blurred the lines between fine art and popular culture, incorporating elements of everyday life, fashion, and celebrity.\n",
+      "\n",
+      "Manet's most famous works, such as \"Olympia\" and \"A Bar at the Folies-Bergère,\" showcase his ability to challenge traditional notions of beauty and representation. His innovative approach to composition and his use of bold, pure colors paved the way for the development of modern art.\n",
+      "\n",
+      "5. **Paul Cézanne** (1839-1906)\n",
+      "Cézanne was a Post-Impressionist painter who redefined the way artists approached representation and color. His paintings often feature complex, layered perspectives, which\n",
+      " challenge the viewer to engage with the artwork on multiple levels.\n",
+      "\n",
+      "Cézanne's most famous works, such as \"Still Life with Apples\" and \"The Bathers,\" showcase his ability to capture the essence of color and form through sheer painterly bravura. His influence can be seen in many subsequent artists, from Fauvism to Cubism.\n",
+      "\n",
+      "While it's difficult to identify a single \"best\" French painter, these individuals have greatly contributed to the world of art and continue to inspire and influence artists today.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a user message\n",
+    "#\n",
+    "\n",
+    "user_message = \"Who is the best French painter. Answer with detailed explanations.\"\n",
+    "\n",
+    "completion_request = {\n",
+    "    \"model\": llama31_depl.name,\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": user_message\n",
+    "        }\n",
+    "    ]\n",
+    "}\n",
+    "\n",
+    "print(\"Completion request: \", completion_request, end=\"\\n\")\n",
+    "\n",
+    "response = httpx.post(chat_completions_url, headers=headers, json=completion_request, timeout=45.0)\n",
+    "print(response)\n",
+    "print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "7923fe72",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Completion request:  {'model': 'llama31v2', 'messages': [{'role': 'user', 'content': 'Hi! How are you doing today?'}, {'role': 'assistant', 'content': \"I'm doing well! How can I help you?\"}, {'role': 'user', 'content': 'Can you tell me what the temperate will be in Dallas, in fahrenheit?'}]}\n",
+      "2025-01-27 15:01:50,060 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "However, I'm a large language model, I don't have real-time access to current weather information. But I can suggest some options to help you find the current temperature in Dallas, Texas:\n",
+      "\n",
+      "1. **Check online weather websites**: You can visit websites like weather.com, accuweather.com, or wunderground.com and enter \"Dallas, TX\" in the search bar to get the current temperature.\n",
+      "2. **Use a voice assistant**: If you have a smart speaker or virtual assistant like Siri, Google Assistant, or Alexa, you can ask them to give you the current temperature in Dallas.\n",
+      "3. **Check a mobile app**: Download a weather app like Dark Sky, Weather Underground, or The Weather Channel, and search for \"Dallas, TX\" to get the current temperature.\n",
+      "\n",
+      "If you want a general idea of the temperature in Dallas, I can tell you that Dallas has a humid subtropical climate, with hot summers and mild winters. The average high temperature in July (Dallas's hottest month) is around 95°F (35°C), while the average high temperature in January (Dallas's coldest month) is around 53°F (12°C).\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a list of messages\n",
+    "#\n",
+    "\n",
+    "messages = [{\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": \"Hi! How are you doing today?\"\n",
+    "}, {\n",
+    "    \"role\": \"assistant\",\n",
+    "    \"content\": \"I'm doing well! How can I help you?\",\n",
+    "}, {\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": \"Can you tell me what the temperate will be in Dallas, in fahrenheit?\"\n",
+    "}]\n",
+    "\n",
+    "\n",
+    "completion_request = {\n",
+    "    \"model\": llama31_depl.name,\n",
+    "    \"messages\": messages\n",
+    "}\n",
+    "\n",
+    "print(\"Completion request: \", completion_request, end=\"\\n\")\n",
+    "\n",
+    "response = httpx.post(chat_completions_url, headers=headers, json=completion_request, timeout=45.0)\n",
+    "\n",
+    "print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe5d4b7a",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using OpenAI client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "0e5ccec9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install openai --quiet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "c4b1afc9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "241e1a44",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = OpenAI(\n",
+    "    base_url=openai_v1_uri,\n",
+    "    api_key=\"X\",  # placeholder value; authentication is passed via the Authorization header in default_headers\n",
+    "    default_headers=headers\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "5a12fda6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 15:01:59,744 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "Determining the \"best\" French painter can be subjective as opinions vary based on personal taste and artistic preferences. However, here are some of the most renowned French painters:\n",
+      "\n",
+      "1. **Claude Monet** (1840-1926): A founder of Impressionism, Monet is famous for his captivating landscapes, water lilies, and sunsets. His soft, dreamy brushstrokes revolutionized the art world.\n",
+      "2. **Pierre-Auguste Renoir** (1841-1919): A leading figure in Impressionism, Renoir is celebrated for his vibrant depictions of everyday life, often focusing on the beauty of the human body.\n",
+      "3. **Henri Matisse** (1869-1954): A pioneer of Fauvism, Matisse is renowned for his bold, colorful works that blended elements of modern art and craftsmanship. His intricate cut-outs and paper sculptures are highly acclaimed.\n",
+      "4. **Paul Cézanne** (1839-1906): A Post-Impressionist master, Cézanne played a crucial role in the development of Cubism. His still-life paintings and landscapes feature innovative uses of color and form.\n",
+      "5. **Jean-Honoré Fragonard** (1732-1806): A Rococo painter, Fragonard is famous for his delicate, intimate works that capture the essence of 18th-century French life. His sensual landscapes and exquisite portraits are highly regarded.\n",
+      "\n",
+      "These artists have made significant contributions to the history of French art, and their works continue to inspire and awe audiences around the world.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a user message\n",
+    "#\n",
+    "\n",
+    "chat_response = client.chat.completions.create(\n",
+    "    model=llama31_depl.name,\n",
+    "    messages=[\n",
+    "        {\"role\": \"user\", \"content\": \"Who is the best French painter? Answer with a short explanation.\"},\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "print(chat_response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "2a26c42a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 15:02:02,872 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "However, I'm a large language model, I don't have real-time access to current weather conditions. Nevertheless, I can suggest some options to find the current temperature in Dallas, Texas:\n",
+      "\n",
+      "1. Check online weather websites: You can visit websites like weather.com, accuweather.com, or wunderground.com to get the current temperature in Dallas.\n",
+      "2. Use a virtual assistant: You can ask virtual assistants like Siri, Google Assistant, or Alexa to provide you with the current temperature in Dallas.\n",
+      "3. Check a weather app: You can download a weather app on your smartphone to get the current temperature in Dallas.\n",
+      "\n",
+      "If you'd like, I can provide you with the average temperature ranges for Dallas during different times of the year.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a list of messages\n",
+    "#\n",
+    "\n",
+    "chat_response = client.chat.completions.create(\n",
+    "    model=llama31_depl.name,\n",
+    "    messages=[{\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": \"Hi! How are you doing today?\"\n",
+    "    }, {\n",
+    "        \"role\": \"assistant\",\n",
+    "        \"content\": \"I'm doing well! How can I help you?\",\n",
+    "    }, {\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": \"Can you tell me what the temperate will be in Dallas, in fahrenheit?\"\n",
+    "    }]\n",
+    ")\n",
+    "\n",
+    "print(chat_response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "af680e7d",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/integrations/vllm/llama-3.1-8b-instruct/llama_predictor.py b/integrations/vllm/llama-3.1-8b-instruct/llama_predictor.py
new file mode 100644
index 00000000..ef9f860e
--- /dev/null
+++ b/integrations/vllm/llama-3.1-8b-instruct/llama_predictor.py
@@ -0,0 +1,45 @@
+import os
+import torch
+from vllm import __version__, AsyncEngineArgs, AsyncLLMEngine
+
+
+class Predictor:
+
+    def __init__(self):
+        print("Using vLLM version: " + str(__version__))
+        
+        # Load the configuration for the vLLM engine from the configuration file, if any
+        if "CONFIG_FILE_PATH" in os.environ and os.path.exists(os.environ["CONFIG_FILE_PATH"]):
+            print("Reading engine config from file...")
+            
+            import yaml
+            with open(os.environ["CONFIG_FILE_PATH"], 'r') as f:
+                config = yaml.safe_load(f) or {}  # an empty file yields None
+                self._disable_log_stats(config)
+        else:
+            print("Configuration file not found, defaulting to hard-coded engine config...")
+            config = {
+                # reduce resource requirements
+                "dtype": "half",
+                "max_model_len": 2048,
+                "gpu_memory_utilization": 0.96,
+                # disable logging stats and requests
+                "disable_log_stats": True,
+                "disable_log_requests": True,
+            }
+
+        print("Starting vLLM backend...")
+        engine_args = AsyncEngineArgs(
+            model=os.environ["MODEL_FILES_PATH"],
+            **config
+        )
+        if torch.cuda.is_available():
+            # adjust tensor parallel size
+            engine_args.tensor_parallel_size = torch.cuda.device_count()
+
+        # The vLLM engine handle must be stored in the "self.vllm_engine" attribute
+        self.vllm_engine = AsyncLLMEngine.from_engine_args(engine_args)
+        
+    def _disable_log_stats(self, config):
+        config["disable_log_stats"] = True
+        config["disable_log_requests"] = True
\ No newline at end of file
diff --git a/integrations/vllm/llama-3.1-8b-instruct/llama_vllmconfig.yaml b/integrations/vllm/llama-3.1-8b-instruct/llama_vllmconfig.yaml
new file mode 100644
index 00000000..3a6b011a
--- /dev/null
+++ b/integrations/vllm/llama-3.1-8b-instruct/llama_vllmconfig.yaml
@@ -0,0 +1,3 @@
+dtype: "half"
+max_model_len: 2048
+gpu_memory_utilization: 0.96
\ No newline at end of file
diff --git a/integrations/vllm/mistral-nemo-8b-2407/deploy_mistral-nemo-instruct-2407.ipynb b/integrations/vllm/mistral-nemo-8b-2407/deploy_mistral-nemo-instruct-2407.ipynb
new file mode 100644
index 00000000..3aa2e29c
--- /dev/null
+++ b/integrations/vllm/mistral-nemo-8b-2407/deploy_mistral-nemo-instruct-2407.ipynb
@@ -0,0 +1,981 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "03fa5b11",
+   "metadata": {},
+   "source": [
+    "# A Guide for Mistral NeMo Instruct-2407 on Hopsworks\n",
+    "\n",
+    "For details about this Large Language Model (LLM), visit the model page in the HuggingFace repository ➡️ [link](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ed1b67c8",
+   "metadata": {},
+   "source": [
+    "## 1️⃣ Download Mistral NeMo Instruct-2407 using the huggingface_hub library\n",
+    "\n",
+    "First, we download the Mistral model files (e.g., weights, configuration files) directly from the HuggingFace repository.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "05f39053",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install huggingface_hub --quiet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "bfac70ba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Place your HuggingFace token in the HF_TOKEN environment variable\n",
+    "\n",
+    "import os\n",
+    "os.environ[\"HF_TOKEN\"] = \"<INSERT_YOUR_HF_TOKEN>\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "4358db98",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a93d0686d2b34f46a215778fa0a216c1",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Fetching 17 files:   0%|          | 0/17 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "d12db3ba734144d5b88025e857ee29a9",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "762f4404133a46938cda21f6aa14caf8",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00001-of-00005.safetensors:   0%|          | 0.00/4.87G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "0ddefd61d34244c481277c797a9c5d66",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "merges.txt:   0%|          | 0.00/3.13M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "9471005d25eb4a43841675287fc44b14",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00003-of-00005.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "94764d50b368448a9d5895562410293c",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00002-of-00005.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "0bf9029db4ea431284b1217c50665e37",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00004-of-00005.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "8619c8ef71ce4c28a4eb84b278c6b02e",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "README.md:   0%|          | 0.00/10.5k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "67269b01d868469ca4ced195fad5ede4",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "config.json:   0%|          | 0.00/622 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "b3d9831af87c4e1e9c5499ba9a5c8e8f",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       ".gitattributes:   0%|          | 0.00/1.57k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "3cf1e6906da8480096a3253435f60411",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "params.json:   0%|          | 0.00/204 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "d4b156f17ea144229d95d15a47b34a43",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model.safetensors.index.json:   0%|          | 0.00/29.9k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "5985bbeb6dc24cb785a04fe0ab24a0d2",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00005-of-00005.safetensors:   0%|          | 0.00/4.91G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "da3959c0d82b4153929408f70481c67a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "11e110428ffe44218ab8dce87b615308",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer.json:   0%|          | 0.00/9.26M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "4e7d6f119c724bb488102404db0a4669",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer_config.json:   0%|          | 0.00/181k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "df05c7ea3afd403b897991800d8b1bcf",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "vocab.json:   0%|          | 0.00/2.47M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "be85325ed64a48ac87a74700d58faa7b",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tekken.json:   0%|          | 0.00/14.8M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "from huggingface_hub import snapshot_download\n",
+    "\n",
+    "mistral_nemo_local_dir = snapshot_download(\"mistralai/Mistral-NeMo-Instruct-2407\", ignore_patterns=[\"consolidated.safetensors\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "06633928",
+   "metadata": {},
+   "source": [
+    "## 2️⃣ Register Mistral NeMo Instruct-2407 into Hopsworks Model Registry\n",
+    "\n",
+    "Once the model files are downloaded from the HuggingFace repository, we can register them in the Hopsworks Model Registry."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "2784c9e6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 10:18:17,669 INFO: Python Engine initialized.\n",
+      "\n",
+      "Logged in to project, explore it here https://hopsworks.ai.local/p/119\n"
+     ]
+    }
+   ],
+   "source": [
+    "import hopsworks\n",
+    "\n",
+    "project = hopsworks.login()\n",
+    "mr = project.get_model_registry()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "9ce42ba8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The following instantiates a Hopsworks LLM model, not yet saved in the Model Registry\n",
+    "\n",
+    "mistral_nemo = mr.llm.create_model(\n",
+    "    name=\"mistral_nemo_instruct\",\n",
+    "    description=\"Mistral NeMo Instruct-2407 model (via HF)\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "57c8c907",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a66727a3269e4c7f8d6119013ca9c676",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/6 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model created, explore it at https://hopsworks.ai.local/p/119/models/mistral_nemo_instruct/1\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Model(name: 'mistral_nemo_instruct', version: 1)"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Register the Mistral model pointing to the local model files\n",
+    "\n",
+    "mistral_nemo.save(mistral_nemo_local_dir)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "55ee2d57",
+   "metadata": {},
+   "source": [
+    "## 3️⃣ Deploy Mistral NeMo Instruct-2407\n",
+    "\n",
+    "After registering the LLM model into the Model Registry, we can create a deployment that serves it using the vLLM engine.\n",
+    "\n",
+    "Hopsworks provides two types of deployments to serve LLMs with the vLLM engine:\n",
+    "\n",
+    "- **Using the official vLLM OpenAI server**: an OpenAI API-compatible server implemented by the creators of vLLM, where the vLLM engine is configured through a user-provided configuration (YAML) file.\n",
+    "\n",
+    "- **Using the KServe built-in vLLM server**: a KServe-based implementation of an OpenAI API-compatible server, intended for more advanced users who need to provide a predictor script that initializes the vLLM engine and (optionally) implements the *completions* and *chat/completions* endpoints.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "d63c3836",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 10:25:46,748 WARNING: VersionWarning: No version provided for getting model `mistral_nemo_instruct`, defaulting to `1`.\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Get a reference to the Mistral model if not obtained yet\n",
+    "\n",
+    "mistral_nemo = mr.get_model(\"mistral_nemo_instruct\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "5b1c0acc",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "51161f401401409da93d516e8e652d8a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Uploading: 0.000%|          | 0/184 elapsed<00:00 remaining<?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "8d0a99c8e6114e4fa515ff7e85cf8de8",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Uploading: 0.000%|          | 0/4960 elapsed<00:00 remaining<?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "# Upload the vLLM engine config file used by the deployments\n",
+    "\n",
+    "ds_api = project.get_dataset_api()\n",
+    "\n",
+    "path_to_config_file = f\"/Projects/{project.name}/\" + ds_api.upload(\"mistral_vllmconfig.yaml\", \"Resources\", overwrite=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d97e4f2a",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using KServe vLLM server\n",
+    "\n",
+    "Create a model deployment by providing a predictor script and (optionally) a configuration file with the arguments for the vLLM engine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "fdedf57e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "645e5a010a2f4b13914ea6bcfaf65557",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Uploading: 0.000%|          | 0/4960 elapsed<00:00 remaining<?"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Deployment created, explore it at https://hopsworks.ai.local/p/119/deployments/32\n",
+      "Before making predictions, start the deployment by using `.start()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Upload the predictor script\n",
+    "path_to_predictor_script = f\"/Projects/{project.name}/\" + ds_api.upload(\"mistral_predictor.py\", \"Resources\", overwrite=True)\n",
+    "\n",
+    "mistral_depl = mistral_nemo.deploy(\n",
+    "    name=\"mistralnemo1\",\n",
+    "    description=\"Mistral NeMo Instruct-2407 from HuggingFace\",\n",
+    "    script_file=path_to_predictor_script,\n",
+    "    config_file=path_to_config_file,\n",
+    "    resources={\"num_instances\": 1, \"requests\": {\"cores\": 2, \"memory\": 1024*16, \"gpus\": 1}},\n",
+    "    environment=\"vllm-inference-pipeline-066\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b072400",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using vLLM OpenAI server\n",
+    "\n",
+    "Create a model deployment by providing a configuration file with the arguments for the vLLM engine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "f138d2de",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Deployment created, explore it at https://hopsworks.ai.local/p/119/deployments/33\n",
+      "Before making predictions, start the deployment by using `.start()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "mistral_depl = mistral_nemo.deploy(\n",
+    "    name=\"mistralnemo2\",\n",
+    "    description=\"Mistral NeMo Instruct-2407 from HuggingFace\",\n",
+    "    config_file=path_to_config_file,\n",
+    "    resources={\"num_instances\": 1, \"requests\": {\"cores\": 2, \"memory\": 1024*12, \"gpus\": 1}},\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d050895c",
+   "metadata": {},
+   "source": [
+    "---"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "1dc98d04",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Retrieve one of the deployments created above\n",
+    "\n",
+    "ms = project.get_model_serving()\n",
+    "mistral_depl = ms.get_deployment(\"mistralnemo2\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "1b60d580",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "961838549937455f96422adf8b55eb2a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "  0%|          | 0/5 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Start making predictions by using `.predict()`\n"
+     ]
+    }
+   ],
+   "source": [
+    "mistral_depl.start(await_running=60*15) # wait for 15 minutes maximum"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c314ace9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# mistral_depl.stop()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "e76c0aaa",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "PredictorState(status: 'Running')"
+      ]
+     },
+     "execution_count": 17,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "mistral_depl.get_state()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "844c04f8",
+   "metadata": {},
+   "source": [
+    "## 4️⃣ Prompting Mistral NeMo Instruct-2407\n",
+    "\n",
+    "Once the Mistral deployment is up and running, we can start sending user prompts to the LLM. You can use either an OpenAI API-compatible client (e.g., the `openai` library) or any other HTTP client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "id": "0e4ffe49",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Get the Istio endpoint from the Mistral deployment page in the Hopsworks UI.\n",
+    "istio_endpoint = \"<ISTIO_ENDPOINT>\" # with format \"http://<ip-address>\"\n",
+    "\n",
+    "# Resolve the base URI. NOTE: KServe's vLLM server prefixes its URIs with /openai\n",
+    "base_uri = \"/openai\" if mistral_depl.predictor.script_file is not None else \"\"\n",
+    "\n",
+    "openai_v1_uri = istio_endpoint + base_uri + \"/v1\"\n",
+    "completions_url = openai_v1_uri + \"/completions\"\n",
+    "chat_completions_url = openai_v1_uri + \"/chat/completions\"\n",
+    "\n",
+    "# Resolve API key for request authentication\n",
+    "if \"SERVING_API_KEY\" in os.environ:\n",
+    "    # if running inside Hopsworks\n",
+    "    api_key_value = os.environ[\"SERVING_API_KEY\"]\n",
+    "else:\n",
+    "    # Create an API KEY using the Hopsworks UI and place the value below\n",
+    "    api_key_value = \"<API_KEY>\"\n",
+    "    \n",
+    "# Prepare request headers\n",
+    "headers = {\n",
+    "    'Content-Type': 'application/json',\n",
+    "    'Authorization': 'ApiKey ' + api_key_value,\n",
+    "    'Host': f\"{mistral_depl.name}.{project.name.lower().replace('_', '-')}.hopsworks.ai\", # also provided in the Hopsworks UI\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "23fd9d4d",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using httpx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "id": "4fb1e923",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import httpx"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "id": "fd7f4bf1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Completion request:  {'model': 'mistralnemo2', 'messages': [{'role': 'user', 'content': 'Who is the best French painter. Answer with detailed explanations.'}]}\n",
+      "2025-01-27 13:18:42,979 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "Choosing the \"best\" French painter can be subjective and depends on personal taste, as well as the specific criteria used for judgment. However, several French painters have made significant contributions to the art world and have left lasting impacts on Western art history. Here, I'll provide detailed explanations of three major French painters often considered among the best:\n",
+      "\n",
+      "1. **Claude Monet (1840-1926)** - A founding member of the Impressionist movement, Monet is renowned for his mastery of visible light, his ability to depict the changing effects of light, and his innovative techniques. Here are some reasons why he's considered one of the best:\n",
+      "\n",
+      "   - **Influence on Impressionism**: Monet is often considered the fathers of Impressionism, giving birth to the movement with his water lilies series. His groundbreaking techniques and dedication to capturing the transient effects of light and color paved the way for other Impressionists.\n",
+      "\n",
+      "   - **Mastery of Light and Color**: Monet was highly skilled in conveying the ever-changing effects of light and color on the world around him. His direct observation of nature and his ability to capture the ephemeral Quality of light and color realistically distinguish him as one of the greatest painters of his time.\n",
+      "\n",
+      "   - **Artistic Evolution**: Throughout his career, Monet continually challenged himself and pushed the boundaries of his artistic style. His later works, particularly his famed water lilies series, showcased an increased focus on abstraction, demonstrating his constant pursuit of artistic growth and innovation.\n",
+      "\n",
+      "2. **Pierre-Auguste Renoir (1841-1919)** - Another key figure in the Impressionist movement, Renoir is celebrated for his ability to evoke the warm, sensuous atmosphere of late 19th-century French life. Here's why he's considered one of the best:\n",
+      "\n",
+      "   - **Mastery of Color and Brushwork**: Renoir's work is characterized by its vibrant colors, visible brush strokes, and beautiful lighting. He had an unmatched ability to capture the essence of a moment, imbuing his paintings with life, movement, and emotion.\n",
+      "\n",
+      "   - **Celebration of Beauty**: Unlike many of his Impressionist contemporaries, Renoir often depicted joyous, idyllic scenes of middle-class life and leisure. His works celebrate the beauty and pleasures of everyday existence, making them both endearing and accessible.\n",
+      "\n",
+      "   - **Influence on Expressionism and Fauvism**: Renoir's emphasis on color and form had a significant impact on later artistic movements, most notably Expressionism and Fauvism. His loose, vibrant brushwork and bold use of color laid the groundwork for these crucial developments in modern art.\n",
+      "\n",
+      "3. **Eugene Delacroix (1798-1863)** - As a crucial figure in French Romanticism, Delacroix is recognized for his passionate, emotionally charged art and his pioneering role in the development of modern European painting. Here's why he stands out as one of the best French painters:\n",
+      "\n",
+      "   - **Revival of Color and Interest in the Orient**: Delacroix was instrumental in reviving color and interest in Eastern art within Western painting. He traveled to North Africa in 1832, and his exposure to Oriental culture and art had a profound impact on his work.\n",
+      "\n",
+      "   - **Emotional Intensity**: Delacroix's paintings are renowned for their emotional power and dramatic use of color and line. He often explored themes of passion, eroticism, and the exotic, creating works that were both captivating and controversial.\n",
+      "\n",
+      "   - **Influence on Modern Art**: Delacroix's innovative use of color, his expressive brushstrokes, and his interest in non-Western art made him a major influence on both the Impressionists and the Post-Impressionists, including Monet, Renoir, and Vincent van Gogh. His work laid the foundation for many of the artistic developments of the 19th and 20th centuries.\n",
+      "\n",
+      "Each of these painters made significant contributions to French art and left lasting legacies that continue to inspire artists today. Ultimately, the \"best\" French painter depends on personal preference, as each of these masters offers a distinct and valuable perspective on art history.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a user message\n",
+    "# \n",
+    "\n",
+    "user_message = \"Who is the best French painter. Answer with detailed explanations.\"\n",
+    "\n",
+    "completion_request = {\n",
+    "    \"model\": mistral_depl.name,\n",
+    "    \"messages\": [\n",
+    "        {\n",
+    "            \"role\": \"user\",\n",
+    "            \"content\": user_message\n",
+    "        }\n",
+    "    ]\n",
+    "}\n",
+    "\n",
+    "print(\"Completion request: \", completion_request, end=\"\\n\")\n",
+    "\n",
+    "response = httpx.post(chat_completions_url, headers=headers, json=completion_request, timeout=45.0)\n",
+    "\n",
+    "print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "05582a38",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Completion request:  {'model': 'mistralnemo2', 'messages': [{'role': 'user', 'content': 'Hi! How are you doing today?'}, {'role': 'assistant', 'content': \"I'm doing well! How can I help you?\"}, {'role': 'user', 'content': 'Can you tell me what the temperature will be in Dallas, in Fahrenheit?'}]}\n",
+      "2025-01-27 13:18:44,790 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "Sure! According to the latest forecast, the temperature in Dallas, TX today will be around 75°F (24°C) during the day, with a low of 57°F (14°C) overnight.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a list of messages\n",
+    "#\n",
+    "\n",
+    "messages = [\n",
+    "{\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": \"Hi! How are you doing today?\"\n",
+    "}, {\n",
+    "    \"role\": \"assistant\",\n",
+    "    \"content\": \"I'm doing well! How can I help you?\",\n",
+    "}, {\n",
+    "    \"role\": \"user\",\n",
+    "    \"content\": \"Can you tell me what the temperature will be in Dallas, in Fahrenheit?\"\n",
+    "}]\n",
+    "\n",
+    "completion_request = {\n",
+    "    \"model\": mistral_depl.name,\n",
+    "    \"messages\": messages\n",
+    "}\n",
+    "\n",
+    "print(\"Completion request: \", completion_request, end=\"\\n\")\n",
+    "\n",
+    "response = httpx.post(chat_completions_url, headers=headers, json=completion_request, timeout=45.0)\n",
+    "\n",
+    "print(response.json()[\"choices\"][0][\"message\"][\"content\"])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f0122214",
+   "metadata": {},
+   "source": [
+    "### 🟨 Using OpenAI client"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "4ef7366d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install openai --quiet"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "d46ab32e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from openai import OpenAI"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "0ed5ac1f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = OpenAI(\n",
+    "    base_url=openai_v1_uri,\n",
+    "    api_key=\"X\",  # placeholder value; the Authorization header in default_headers carries the real API key\n",
+    "    default_headers=headers\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "f7c774d1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 13:19:12,434 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "Choosing the \"best\" French painter can be subjective and depends on the criteria you value most: influence, technical skill, innovation, historical significance, or simply personal preference. However, several names frequently rise to the top of these discussions due to their profound impact on art history. Here are a few notable French painters, each with detailed explanations:\n",
+      "\n",
+      "1. **Leonardo da Vinci (1452-1519)**: Although not exclusively French, Leonardo spent the final 18 years of his life in France, working for King Francis I. His influence on French art was immense, and his artistic legacy is widely acknowledged. Leonardo's mastery of sfumato, his incredible anatomical understanding, and his groundbreaking compositions have inspired countless artists. His most famous works, such as the \"Mona Lisa\" and \"The Last Supper,\" are iconic symbols of Western art.\n",
+      "\n",
+      "2. **Jacques-Louis David (1748-1825)**: David was a key figure of the Neoclassical style, influential throughout Europe. He played a significant role in the French Revolution, creating powerful, dramatic images that inspired political change. David's works, like \"The Death of Marat\" and \"Oath of the Horatii,\" are renowned for their clarity, symmetry, and emotional impact. His inaccuracies in depiction have led to some controversies, but his influence on Romanticism and Realism is undeniable.\n",
+      "\n",
+      "3. **Eugène Delacroix (1798-1863)**: Delacroix is considered one of the founders of the French Romantic school. His expressive, emotional style marked a departure from the more rational Neoclassicism. Delacroix's bold, vibrant colors and dynamic brushwork can be seen in works like \"Liberty Leading the People\" and \"The Death of Sardanapalus.\" His influence can be seen in various 19th-century movements, from Realism to Impressionism and beyond.\n",
+      "\n",
+      "4. **Claude Monet (1840-1926)**: As a founding member of the Impressionist movement, Monet is celebrated for his innovative techniques and dedication to capturing fleeting moments in time and the ever-changing effects of light. Monet's \"Water Lilies\" series, made up of around 250 paintings, is one of the most iconic bodies of work in art history. His influence on modern art, particularly in the realm of landscape painting and abstraction, is profound.\n",
+      "\n",
+      "5. **Pablo Picasso (1881-1973)**: Though Spanish, Picasso spent most of his adult life in France, making him one of the most important French painters of the 20th century. He co-founded Cubism and played a crucial role in shaping numerous other artistic movements, including Expressionism, Surrealism, and abstract art. Picasso's prolific output, spanning 80 years, includes masterpieces like \"Les Demoiselles d'Avignon\" and \"Guernica,\" making him one of the most influential artists ever.\n",
+      "\n",
+      "Each of these artists has made an indelible mark on art history, and deciding who is the \"best\" largely depends on individual interpretation and personal taste.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for a user message\n",
+    "#\n",
+    "\n",
+    "chat_response = client.chat.completions.create(\n",
+    "    model=mistral_depl.name,\n",
+    "    messages=[\n",
+    "        {\"role\": \"user\", \"content\": \"Who is the best French painter. Answer with detailed explanations.\"},\n",
+    "    ]\n",
+    ")\n",
+    "\n",
+    "print(chat_response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "2b850098",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "2025-01-27 13:19:13,569 INFO: HTTP Request: POST http://51.89.4.22/v1/chat/completions \"HTTP/1.1 200 OK\"\n",
+      "Sure! According to the latest forecast, the temperature in Dallas, Texas will be around 75°F (24°C) today.\n"
+     ]
+    }
+   ],
+   "source": [
+    "#\n",
+    "# Chat Completion for list of messages\n",
+    "#\n",
+    "\n",
+    "chat_response = client.chat.completions.create(\n",
+    "    model=mistral_depl.name,\n",
+    "    messages=[{\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": \"Hi! How are you doing today?\"\n",
+    "    }, {\n",
+    "        \"role\": \"assistant\",\n",
+    "        \"content\": \"I'm doing well! How can I help you?\",\n",
+    "    }, {\n",
+    "        \"role\": \"user\",\n",
+    "        \"content\": \"Can you tell me what the temperature will be in Dallas, in Fahrenheit?\"\n",
+    "    }]\n",
+    ")\n",
+    "\n",
+    "print(chat_response.choices[0].message.content)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2b5dda5e",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/integrations/vllm/mistral-nemo-8b-2407/mistral_predictor.py b/integrations/vllm/mistral-nemo-8b-2407/mistral_predictor.py
new file mode 100644
index 00000000..ba3055e7
--- /dev/null
+++ b/integrations/vllm/mistral-nemo-8b-2407/mistral_predictor.py
@@ -0,0 +1,116 @@
+import os
+from typing import Iterable, Dict, Any, cast, Optional
+
+import torch
+
+from vllm import __version__, AsyncEngineArgs, AsyncLLMEngine
+from vllm.transformers_utils.tokenizers import maybe_serialize_tool_calls
+
+from kserve.protocol.rest.openai import (
+    ChatPrompt,
+    ChatCompletionRequestMessage,
+)
+from kserve.protocol.rest.openai.types.openapi import ChatCompletionTool
+
+from mistral_common.protocol.instruct.request import ChatCompletionRequest
+
+
+class Predictor:
+
+    def __init__(self):
+        print("Using vLLM version: " + str(__version__))
+        
+        # Load the configuration for the vLLM engine from the configuration file, if any
+        if "CONFIG_FILE_PATH" in os.environ and os.path.exists(os.environ["CONFIG_FILE_PATH"]):
+            print("Reading engine config from file...")
+            
+            import yaml
+            with open(os.environ["CONFIG_FILE_PATH"], 'r') as f:
+                config = yaml.load(f, Loader=yaml.SafeLoader)
+                self._drop_unsupported_engine_args(config)
+                self._disable_log_stats(config)
+        else:
+            print("Configuration file not found, defaulting to hard-coded engine config...")
+            config = {
+                "tokenizer_mode": "mistral",
+                # reduce resource requirements
+                "dtype": "half",
+                "max_model_len": 20720,
+                "gpu_memory_utilization": 0.96,
+                # disable logging stats and requests
+                "disable_log_stats": True,
+                "disable_log_requests": True,
+            }
+
+        print("Starting vLLM backend...")
+        engine_args = AsyncEngineArgs(
+            model=os.environ["MODEL_FILES_PATH"],
+            **config
+        )
+        if torch.cuda.is_available():
+            # adjust tensor parallel size
+            engine_args.tensor_parallel_size = torch.cuda.device_count()
+
+        # The vLLM engine handle must be stored in "self.vllm_engine" (required attribute name)
+        self.vllm_engine = AsyncLLMEngine.from_engine_args(engine_args)
+    
+    def apply_chat_template(
+        self,
+        messages: Iterable[ChatCompletionRequestMessage],
+        chat_template: Optional[str] = None,
+        tools: Optional[list[ChatCompletionTool]] = None,
+    ) -> ChatPrompt:
+        """Converts a prompt or list of messages into a single templated prompt string"""
+        
+        tool_dicts = [tool.model_dump() for tool in tools] if tools else None
+        parsed_messages = self._parse_messages(messages)
+        request = ChatCompletionRequest(messages=parsed_messages,
+                                        tools=tool_dicts)
+
+        encoded = self.tokenizer.encode_chat_completion(request)
+        
+        return ChatPrompt(prompt=encoded.text)
+    
+    def _parse_messages(self, messages):
+        # The Mistral tokenizer expects a slightly different format of messages.
+        # https://github.com/mistralai/mistral-common/blob/21ee9f6cee3441e9bb1e6ed2d10173f90bd9b94b/src/mistral_common/protocol/instruct/request.py#L21
+        # e.g., we need to remove the 'name' field from the messages.
+        parsed_messages = []
+        for msg in messages:
+            parsed_msg = dict(vars(msg))  # shallow copy, so the incoming message object is not mutated
+            parsed_msg.pop("name", None)  # name field is not accepted by mistral tokenizer
+            if "function_call" in parsed_msg:
+                del parsed_msg["function_call"]  # function_call field is not accepted by mistral tokenizer
+            if "tool_calls" in parsed_msg and parsed_msg["tool_calls"] is None:
+                del parsed_msg["tool_calls"]  # vllm mistral tokenizer wrapper doesn't allow None tool_calls
+            parsed_messages.append(parsed_msg)
+        
+        last_message = cast(Dict[str, Any], parsed_messages[-1])
+        if last_message["role"] == "assistant":
+            last_message["prefix"] = True
+            
+        # NOTE from vllm:
+        # because of issues with pydantic we need to potentially
+        # re-serialize the tool_calls field of the request
+        # for more info: see comment in `maybe_serialize_tool_calls`
+        class MessagesWrapper:
+            def __init__(self, messages):
+                self.messages = messages
+        messages_wrapper = MessagesWrapper(parsed_messages)
+        maybe_serialize_tool_calls(messages_wrapper)
+        parsed_messages = messages_wrapper.messages
+        
+        return parsed_messages
+    
+    def _drop_unsupported_engine_args(self, config):
+        # The following arguments are supported by the vllm-openai server, not the vllm engine itself.
+        if "enable_auto_tool_choice" in config:
+            del config["enable_auto_tool_choice"]
+        if "tool_call_parser" in config:
+            del config["tool_call_parser"]
+        if "chat_template" in config:
+            del config["chat_template"]
+            
+    def _disable_log_stats(self, config):
+        config["disable_log_stats"] = True
+        config["disable_log_requests"] = True
\ No newline at end of file
diff --git a/integrations/vllm/mistral-nemo-8b-2407/mistral_vllmconfig.yaml b/integrations/vllm/mistral-nemo-8b-2407/mistral_vllmconfig.yaml
new file mode 100644
index 00000000..2f286006
--- /dev/null
+++ b/integrations/vllm/mistral-nemo-8b-2407/mistral_vllmconfig.yaml
@@ -0,0 +1,7 @@
+tokenizer_mode: "mistral"
+dtype: "half"
+max_model_len: 20184
+gpu_memory_utilization: 0.96
+# only for vllm-openai model server
+enable_auto_tool_choice: true
+tool_call_parser: "mistral"