# Changelog

All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.7.0] - 2024-06-18
This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the [Nvidia API Catalog](https://build.nvidia.com) by default. It also deprecates support for deploying on-prem models with the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the [latest Nvidia NIM-LLM](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html).
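
For orientation, NIM-LLM services expose an OpenAI-compatible HTTP API, so an on-prem deployment can be exercised with the standard `openai` Python client. The sketch below is illustrative only: the `localhost:8000` endpoint and the `meta/llama3-8b-instruct` model name are assumptions about a local NIM container, not values mandated by this release.

```python
# Minimal sketch: query a locally deployed NIM-LLM microservice through its
# OpenAI-compatible API. Assumes a NIM container is already serving
# meta/llama3-8b-instruct at http://localhost:8000/v1 (both are assumptions).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local NIM endpoint, not api.openai.com
    api_key="not-used",                   # local NIM deployments do not check this
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",
    messages=[{"role": "user", "content": "What is retrieval-augmented generation?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Because the API surface is OpenAI-compatible, the same client code can point at either a local NIM container or a hosted API Catalog endpoint by changing `base_url`.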
### Added
- Added [model auto-download and caching support for `nemo-retriever-embedding-microservice` and `nemo-retriever-reranking-microservice`](./deploy/compose/docker-compose-nim-ms.yaml). Updated steps for deploying these services are available [here](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
- [Multimodal RAG Example enhancements](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html):
  - Moved to the [PDF Plumber library](https://pypi.org/project/pdfplumber/) for parsing text and images (see the sketch after this list).
  - Added `pgvector` vector DB support.
  - Added support to ingest files with the `.pptx` extension.
  - Improved accuracy of image parsing by using [tesseract-ocr](https://pypi.org/project/tesseract-ocr/).
- Added a [new notebook showcasing a RAG use case with accelerated, NIM-based on-prem deployed models](./notebooks/08_RAG_Langchain_with_Local_NIM.ipynb).
- Added a [new experimental example](./experimental/rag-developer-chatbot/) showcasing how to create a developer-focused RAG chatbot using RAPIDS cuDF source code and API documentation.
- Added a [new experimental example](./experimental/event-driven-rag-cve-analysis/) demonstrating how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines.
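
As a rough illustration of the multimodal parsing changes above (PDF Plumber for text, Tesseract OCR for images), here is a minimal sketch. It assumes the `pdfplumber` and `pytesseract` packages (the latter a common Python wrapper for the Tesseract engine) plus a Tesseract binary are installed; `sample.pdf` is a placeholder, and the example's real pipeline is more involved.

```python
# Minimal sketch of the multimodal parsing idea: pdfplumber extracts the
# native text layer, and Tesseract OCRs each embedded image region.
import pdfplumber
import pytesseract

with pdfplumber.open("sample.pdf") as pdf:
    for page in pdf.pages:
        # Native text layer, when the PDF has one.
        text = page.extract_text() or ""

        # page.images lists embedded images with their bounding boxes.
        # Real code should clamp boxes that extend past the page edges.
        for img in page.images:
            bbox = (img["x0"], img["top"], img["x1"], img["bottom"])
            crop = page.crop(bbox).to_image(resolution=300)
            text += "\n" + pytesseract.image_to_string(crop.original)

        print(text)
```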
### Changed
- All examples now use llama3 models from the [Nvidia API Catalog](https://build.nvidia.com/search?term=llama3) by default; the first sketch after this list shows the switch in code. A summary of the updated examples and the models they use is available [here](https://nvidia.github.io/GenerativeAIExamples/latest/index.html#developer-rag-examples).
- Switched the default embedding model of all examples to the [Snowflake arctic-embed-l model](https://build.nvidia.com/snowflake/arctic-embed-l).
- Added more verbose logs and support for configuring the [log level of the chain server using the LOG_LEVEL environment variable](https://nvidia.github.io/GenerativeAIExamples/latest/configuration.html#chain-server); the second sketch after this list shows the pattern.
- Bumped the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the `milvus` containers.
- Updated base containers to the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
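
A minimal sketch of what the new defaults look like through `langchain-nvidia-ai-endpoints`, assuming `NVIDIA_API_KEY` is exported and that the `meta/llama3-70b-instruct` and `snowflake/arctic-embed-l` identifiers match the API Catalog listings (verify against build.nvidia.com):

```python
# Minimal sketch of the new default models; identifiers are assumptions
# taken from the API Catalog and may differ per example.
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

llm = ChatNVIDIA(model="meta/llama3-70b-instruct", temperature=0.2)
embedder = NVIDIAEmbeddings(model="snowflake/arctic-embed-l")

print(llm.invoke("Summarize RAG in one sentence.").content)
print(len(embedder.embed_query("hello world")))  # embedding dimensionality
```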
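
And a minimal sketch of the `LOG_LEVEL` pattern; the chain server's actual wiring may differ:

```python
# Minimal sketch of honoring a LOG_LEVEL environment variable,
# e.g. run with: LOG_LEVEL=DEBUG python server.py
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(
    level=getattr(logging, level_name, logging.INFO),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

logger = logging.getLogger("chain_server")
logger.debug("visible only when LOG_LEVEL=DEBUG")
logger.info("chain server logging configured at %s", level_name)
```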
### Deprecated
- Deprecated support for on-prem LLM deployment using the [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2). Developers can use [Nvidia NIM-LLM to deploy TensorRT-optimized models on-prem and plug them into the existing examples](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
- The `nvolveqa_40k` embedding model was deprecated from the [Nvidia API Catalog](https://build.nvidia.com). Updated all [notebooks](./notebooks/) and [experimental artifacts](./experimental/) to use the [Nvidia embed-qa-4 model](https://build.nvidia.com/nvidia/embed-qa-4) instead.
- Removed [notebooks numbered 00-04](https://github.com/NVIDIA/GenerativeAIExamples/tree/v0.6.0/notebooks), which relied on on-prem LLM deployment with the deprecated [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2).

### Developer RAG Examples

| llama3-70b | snowflake-arctic-embed-l | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
| llama3-70b | snowflake-arctic-embed-l | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
| llama3-70b | ai-embed-qa-4 | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
| llama3-8b<br>Deplot<br>Neva-22b | snowflake-arctic-embed-l | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
| llama3-70b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
| llama3-8b | snowflake-arctic-embed-l | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |

### Enterprise RAG Examples
The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs.
These examples show how to orchestrate RAG pipelines with [Kubernetes](https://kubernetes.io/) and deploy them with [Helm](https://helm.sh/).
Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management.
It is compatible with the [NVIDIA GPU Operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator), which automates GPU discovery and lifecycle management in a Kubernetes cluster.
Enterprise RAG examples also support local and remote inference with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
| llama-3 | nv-embed-qa-4 | LlamaIndex | Chat bot, Kubernetes deployment [[chart](https://registry.ngc.nvidia.com/orgs/ohlfw0olaadg/teams/ea-participants/helm-charts/rag-app-text-chatbot)] | No | No | Yes | No | Yes | Milvus |
### Generative AI Model Examples

These are open source connectors for NVIDIA-hosted and self-hosted API endpoints:

|[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html)|[LlamaIndex](https://www.llamaindex.ai/)|Yes|Yes|No|Triton inference server provides API access to hosted LLM models over gRPC. |
|[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html)|[LlamaIndex](https://www.llamaindex.ai/)|Yes|Yes|No|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
## Related NVIDIA RAG Projects
- [NVIDIA Tokkio LLM-RAG](https://docs.nvidia.com/ace/latest/workflows/tokkio/text/Tokkio_LLM_RAG_Bot.html): Use Tokkio to add avatar animation for RAG responses.
- [RAG on Windows using TensorRT-LLM and LlamaIndex](https://github.com/NVIDIA/ChatRTX): Create RAG chatbots on Windows using TensorRT-LLM.
- [Hybrid RAG Project on AI Workbench](https://github.com/NVIDIA/workbench-example-hybrid-rag): Run an NVIDIA AI Workbench example project for RAG.
## Support, Feedback, and Contributing
We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback.