
Commit fa20197: includes and media to the release branch
Parent: 6b12588

96 files changed: +6812 additions, -39 deletions


.openpublishing.publish.config.json

Lines changed: 6 additions & 0 deletions

```diff
@@ -176,6 +176,12 @@
     "branch": "main",
     "branch_mapping": {}
   },
+  {
+    "path_to_root": "azureai-model-inference-bicep",
+    "url": "https://github.com/Azure-Samples/azureai-model-inference-bicep",
+    "branch": "main",
+    "branch_mapping": {}
+  },
   {
     "path_to_root": "azure-docs-pr-policy-includes",
     "url": "https://github.com/MicrosoftDocs/azure-docs-pr",
```
Lines changed: 11 additions & 0 deletions

```yaml
- name: Azure
  tocHref: /azure/
  topicHref: /azure/index
  items:
    - name: Azure AI services
      tocHref: /azure/ai-services/
      topicHref: /azure/ai-services/index
      items:
        - name: Azure AI models in Azure AI Services
          tocHref: /azure/ai-services/
          topicHref: /azure/ai-services/model-inference/index
```

articles/ai-foundry/model-inference/concepts/content-filter.md

Lines changed: 309 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 81 additions & 0 deletions

---
title: Default content safety policies for Azure AI Model Inference
titleSuffix: Azure AI Foundry
description: Learn about the default content safety policies that Azure AI Model Inference uses to flag content.
author: PatrickFarley
ms.author: fasantia
ms.service: azure-ai-model-inference
ms.topic: conceptual
ms.date: 07/15/2024
manager: nitinme
---
# Default content safety policies for Azure AI Model Inference

Azure AI model inference includes default safety policies applied to all models, excluding Azure OpenAI Whisper. These configurations provide you with a responsible experience by default.

Default safety policies aim to mitigate risks such as hate and fairness, sexual, violence, self-harm, protected material content, and user prompt injection attacks. To learn more about content filtering, read [our documentation describing categories and severity levels](content-filter.md).

This document describes the default configuration.

> [!TIP]
> By default, all model deployments use the default configuration. However, you can configure content filtering per model deployment as explained in [Configuring content filtering](../how-to/configure-content-filters.md).

## Text models

Text models in Azure AI model inference can take in and generate both text and code. These models apply Azure's text content filtering models to detect and prevent harmful content. This system works on both prompts and completions.

| Risk Category | Prompt/Completion | Severity Threshold |
|---------------|-------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Medium |
| Violence | Prompts and Completions | Medium |
| Sexual | Prompts and Completions | Medium |
| Self-Harm | Prompts and Completions | Medium |
| User prompt injection attack (Jailbreak) | Prompts | N/A |
| Protected Material – Text | Completions | N/A |
| Protected Material – Code | Completions | N/A |
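Read the severity thresholds as a floor: a category with a Medium threshold flags content detected at Medium or High severity, while Low severity content passes. A minimal sketch of that comparison, assuming a simple ordered severity scale (illustrative only, not the service's actual implementation):

```python
# Illustrative severity ordering; the real service defines its own scale.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]

def is_filtered(detected_severity: str, threshold: str) -> bool:
    """Content is flagged when its detected severity meets or exceeds the threshold."""
    return SEVERITY_ORDER.index(detected_severity) >= SEVERITY_ORDER.index(threshold)

print(is_filtered("high", "medium"))  # True: blocked at a Medium threshold
print(is_filtered("low", "medium"))   # False: allowed at a Medium threshold
```

Categories marked N/A in the tables (such as jailbreak detection) are binary detections rather than severity comparisons.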
## Vision and chat with vision models

Vision models can take both text and images at the same time as part of the input. Default content filtering capabilities vary per model and provider.

### Azure OpenAI: GPT-4o and GPT-4 Turbo

| Risk Category | Prompt/Completion | Severity Threshold |
|---------------|-------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Medium |
| Violence | Prompts and Completions | Medium |
| Sexual | Prompts and Completions | Medium |
| Self-Harm | Prompts and Completions | Medium |
| Identification of Individuals and Inference of Sensitive Attributes | Prompts | N/A |
| User prompt injection attack (Jailbreak) | Prompts | N/A |

### Azure OpenAI: DALL-E 3 and DALL-E 2

| Risk Category | Prompt/Completion | Severity Threshold |
|---------------|-------------------|--------------------|
| Hate and Fairness | Prompts and Completions | Low |
| Violence | Prompts and Completions | Low |
| Sexual | Prompts and Completions | Low |
| Self-Harm | Prompts and Completions | Low |
| Content Credentials | Completions | N/A |
| Deceptive Generation of Political Candidates | Prompts | N/A |
| Depictions of Public Figures | Prompts | N/A |
| User prompt injection attack (Jailbreak) | Prompts | N/A |
| Protected Material – Art and Studio Characters | Prompts | N/A |
| Profanity | Prompts | N/A |

In addition to the previous safety configurations, Azure OpenAI DALL-E also comes with [prompt transformation](../../../ai-services/openai/concepts/prompt-transformation.md) by default. This transformation occurs on all prompts to enhance the safety of your original prompt, specifically in the risk categories of diversity, deceptive generation of political candidates, depictions of public figures, protected material, and others.

### Meta: Llama-3.2-11B-Vision-Instruct and Llama-3.2-90B-Vision-Instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

### Microsoft: Phi-3.5-vision-instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

## Next steps

* [Configure content filters in Azure AI Model Inference](../how-to/configure-content-filters.md)
Lines changed: 55 additions & 0 deletions

---
title: Understanding deployment types in Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to use deployment types in Azure AI model deployments
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 10/11/2024
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---
# Deployment types in Azure AI model inference

Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.

All deployments can perform the exact same inference operations; however, the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:

- **Data residency needs**: global versus regional resources
- **Call volume**: standard versus provisioned

Support for deployment types varies by model and model provider. You can see which deployment type (SKU) each model supports in the [Models section](models.md).
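The two decisions above can be sketched as a simple selector. This is a hypothetical helper for reasoning about the choice, not an API of the service, and regional availability of each SKU still varies by model:

```python
def pick_deployment_type(regional_residency: bool, high_steady_volume: bool) -> str:
    """Map the two design decisions (data residency, call volume) to a deployment type."""
    if regional_residency:
        # Regional standard/provisioned availability varies by model and provider.
        return "provisioned" if high_steady_volume else "standard"
    return "global provisioned" if high_steady_volume else "global standard"

# Low-volume workload with no residency constraint: the recommended starting point.
print(pick_deployment_type(False, False))  # global standard
```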
## Global versus regional deployment types

For standard and provisioned deployments, you have an option of two types of configurations within your resource: **global** or **regional**. Global standard is the recommended starting point.

Global deployments leverage Azure's global infrastructure to dynamically route customer traffic to the data center with the best availability for each inference request. This means you get the highest initial throughput limits and best model availability with global deployments, while still providing an uptime SLA and low latency. For high-volume workloads above the specified usage tiers on standard and global standard, you might experience increased latency variation. For customers that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput.

Our global deployments are the first location for all new models and features. Customers with large throughput requirements should consider our provisioned deployment offering.

## Standard

Standard deployments provide a pay-per-call billing model on the chosen model, offering the fastest way to get started because you only pay for what you consume. Model availability in each region, and throughput, might be limited.

Standard deployments are optimized for low-to-medium-volume workloads with high burstiness. Customers with high, consistent volume might experience greater latency variability.

Only Azure OpenAI models support this deployment type.

## Global standard

Global standard deployments are available in the same Azure AI services resources as non-global deployment types, but they let you leverage Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.

Customers with high, consistent volume might experience greater latency variability. The threshold is set per model. For applications that require lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.

## Global provisioned

Global provisioned deployments are available in the same Azure AI services resources as non-global deployment types, but they let you leverage Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.

Only Azure OpenAI models support this deployment type.

## Next steps

- [Quotas & limits](../quotas-limits.md)
Lines changed: 87 additions & 0 deletions

---
title: Model inference endpoint in Azure AI services
titleSuffix: Azure AI Foundry
description: Learn about the model inference endpoint in Azure AI services
author: mrbullwinkle
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 10/11/2024
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---
# Model inference endpoint in Azure AI Services

Azure AI model inference in Azure AI services allows customers to consume the most powerful models from flagship model providers using a single endpoint and set of credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

This article explains how models are organized inside the service and how to use the inference endpoint to invoke them.

## Deployments

Azure AI model inference makes models available using the concept of a **deployment**. **Deployments** are a way to give a model a name under certain configurations. Then, you can invoke such a model configuration by indicating its name in your requests.

Deployments capture:

> [!div class="checklist"]
> * A model name
> * A model version
> * A provisioning/capacity type<sup>1</sup>
> * A content filtering configuration<sup>1</sup>
> * A rate limiting configuration<sup>1</sup>

<sup>1</sup> Configurations may vary depending on the selected model.

An Azure AI services resource can have as many model deployments as needed, and they don't incur cost unless inference is performed for those models. Deployments are Azure resources, and hence they're subject to Azure policies.

To learn more about how to create deployments, see [Add and configure model deployments](../how-to/create-model-deployments.md).
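The checklist above can be pictured as one record per deployment. The following dataclass is purely illustrative of what a deployment captures; it isn't a type from any Azure SDK, and the field values are made-up examples:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Deployment:
    """Illustrative shape of a model deployment's configuration."""
    name: str                             # the alias used to invoke the model
    model_name: str                       # the underlying model
    model_version: str
    sku: str = "GlobalStandard"           # provisioning/capacity type (varies by model)
    content_filter: str = "default"       # content filtering configuration
    rate_limit_tpm: Optional[int] = None  # rate limiting configuration

d = Deployment(name="Mistral-large", model_name="Mistral-Large", model_version="2407")
print(d.name)  # Mistral-large
```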
## Azure AI inference endpoint

The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md), which all the models in Azure AI model inference support.

You can see the endpoint URL and credentials in the **Overview** section:

:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="A screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::

### Routing

The inference endpoint routes requests to a given deployment by matching the parameter `model` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service, but under different configurations if needed.

:::image type="content" source="../media/endpoint/endpoint-routing.png" alt-text="An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model by indicating such name in the parameter 'model' inside of the payload request." lightbox="../media/endpoint/endpoint-routing.png":::

For example, if you create a deployment named `Mistral-large`, then such a deployment can be invoked as:

[!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

> [!TIP]
> Deployment routing isn't case sensitive.
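Conceptually, the routing step behaves like a case-insensitive lookup from the request's `model` parameter to a deployment. A minimal sketch of that behavior, with hypothetical deployment names (this is not the service's actual code):

```python
# Two hypothetical deployments of different models in the same resource,
# keyed by lowercase deployment name to make lookups case insensitive.
deployments = {
    "mistral-large": {"model": "Mistral-Large"},
    "llama-vision": {"model": "Llama-3.2-11B-Vision-Instruct"},
}

def route(model_param: str) -> dict:
    """Resolve the request's 'model' parameter to a deployment, ignoring case."""
    deployment = deployments.get(model_param.lower())
    if deployment is None:
        raise LookupError(f"No deployment named '{model_param}' in this resource")
    return deployment

print(route("Mistral-Large")["model"])  # Mistral-Large
```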
### SDKs

The Azure AI model inference endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**, which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.

## Azure OpenAI inference endpoint

Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.

Azure OpenAI inference endpoints work at the deployment level, and each deployment has its own associated URL. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).

:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::

Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

> [!IMPORTANT]
> There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive to each model deployment.
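The concatenation rule can be sketched as a one-line helper; the resource name below is a made-up example:

```python
def deployment_url(base_url: str, deployment_name: str) -> str:
    """Build the per-deployment Azure OpenAI URL: <base>/deployments/<name>."""
    return f"{base_url.rstrip('/')}/deployments/{deployment_name}"

print(deployment_url("https://my-resource.openai.azure.com", "my-gpt-4o"))
# https://my-resource.openai.azure.com/deployments/my-gpt-4o
```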
### SDKs

The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and the **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.

## Next steps

- [Models](models.md)
- [Deployment types](deployment-types.md)
