Commit b43e8b0

Upstream changes for v0.7.0 release (#134)
Signed-off-by: Shubhadeep Das <[email protected]>
1 parent e711143 commit b43e8b0

File tree

350 files changed

+11253
-24624
lines changed


.dockerignore

Lines changed: 19 additions & 0 deletions

```diff
@@ -0,0 +1,19 @@
+# Ignore git objects
+.git/
+.gitignore
+.gitlab-ci.yml
+.gitmodules
+
+# Ignore temporary volumes
+deploy/compose/volumes
+
+# creating a docker image
+.dockerignore
+
+# Ignore any virtual environment configuration files
+.env*
+.venv/
+env/
+# Ignore python bytecode files
+*.pyc
+__pycache__/
```
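As a rough illustration, the exclusion logic these patterns express can be sketched with Python's `fnmatch`. This is a simplified, hypothetical model, not Docker's actual matcher: the real engine uses Go's `filepath.Match` semantics, anchors patterns at the context root, and supports `**` and `!` negation.

```python
from fnmatch import fnmatch

# Patterns from the .dockerignore above (trailing slashes dropped).
PATTERNS = [
    ".git", ".gitignore", ".gitlab-ci.yml", ".gitmodules",
    "deploy/compose/volumes", ".dockerignore",
    ".env*", ".venv", "env", "*.pyc", "__pycache__",
]

def is_ignored(path: str) -> bool:
    """Return True if `path` would be excluded from the build context.

    Simplified model: a path is excluded when any pattern matches the
    path itself or one of its leading directories. Real Docker anchors
    patterns at the context root, so there `*.pyc` only matches
    top-level files and `**/*.pyc` is needed for nested ones.
    """
    parts = path.split("/")
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    return any(fnmatch(prefix, pat) for prefix in prefixes for pat in PATTERNS)
```

Under this model, `.git/config` and anything under `deploy/compose/volumes` stay out of the image, while regular source files are sent to the Docker daemon.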

.gitignore

Lines changed: 4 additions & 0 deletions

```diff
@@ -24,3 +24,7 @@ docs/_*
 docs/notebooks
 docs/experimental
 docs/tools
+
+# Developing examples
+RetrievalAugmentedGeneration/examples/simple_rag_api_catalog/
+deploy/compose/simple-rag-api-catalog.yaml
```

.pre-commit-config.yaml

Lines changed: 14 additions & 0 deletions

```diff
@@ -9,3 +9,17 @@ repos:
       args:
         - --license-filepath
         - RetrievalAugmentedGeneration/LICENSE.md
+  - repo: https://github.com/psf/black
+    rev: 19.10b0
+    hooks:
+      - id: black
+        args: ["--skip-string-normalization", "--line-length=119"]
+        additional_dependencies: ['click==8.0.4']
+  - repo: https://github.com/pycqa/isort
+    rev: 5.12.0
+    hooks:
+      - id: isort
+        name: isort (python)
+        args: ["--multi-line=3", "--trailing-comma", "--force-grid-wrap=0", "--use-parentheses", "--line-width=119", "--ws"]
+
+
```

CHANGELOG.md

Lines changed: 32 additions & 0 deletions

```diff
@@ -3,6 +3,38 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+
+## [0.7.0] - 2024-06-18
+
+This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the [Nvidia API Catalog](https://build.nvidia.com) by default. It also deprecates support for deploying on-prem models using the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the [latest Nvidia NIM-LLM](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html).
+
+### Added
+- Added model [auto download and caching support for `nemo-retriever-embedding-microservice` and `nemo-retriever-reranking-microservice`](./deploy/compose/docker-compose-nim-ms.yaml). Updated steps to deploy the services can be found [here](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
+- [Multimodal RAG Example enhancements](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)
+  - Moved to the [PDF Plumber library](https://pypi.org/project/pdfplumber/) for parsing text and images.
+  - Added `pgvector` vector DB support.
+  - Added support to ingest files with the .pptx extension.
+  - Improved accuracy of image parsing by using [tesseract-ocr](https://pypi.org/project/tesseract-ocr/).
+- Added a [new notebook showcasing a RAG use case with accelerated, NIM-based on-prem deployed models](./notebooks/08_RAG_Langchain_with_Local_NIM.ipynb).
+- Added a [new experimental example](./experimental/rag-developer-chatbot/) showcasing how to create a developer-focused RAG chatbot using RAPIDS cuDF source code and API documentation.
+- Added a [new experimental example](./experimental/event-driven-rag-cve-analysis/) demonstrating how NVIDIA Morpheus, NIMs, and RAG pipelines can be integrated to create LLM-based agent pipelines.
+
+### Changed
+- All examples now use llama3 models from the [Nvidia API Catalog](https://build.nvidia.com/search?term=llama3) by default. A summary of the updated examples and the models they use is available [here](https://nvidia.github.io/GenerativeAIExamples/latest/index.html#developer-rag-examples).
+- Switched the default embedding model of all examples to the [Snowflake arctic-embed-l model](https://build.nvidia.com/snowflake/arctic-embed-l).
+- Added more verbose logs and support to configure the [log level of the chain server using the LOG_LEVEL environment variable](https://nvidia.github.io/GenerativeAIExamples/latest/configuration.html#chain-server).
+- Bumped the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the `milvus` containers.
+- Updated base containers to use the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
+- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
+
+
+### Deprecated
+- Deprecated support for on-prem LLM model deployment using the [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2). Developers can use [Nvidia NIM-LLM to deploy TensorRT optimized models on-prem and plug them into existing examples](https://nvidia.github.io/GenerativeAIExamples/latest/nim-llms.html).
+- Deprecated [kubernetes operator support](https://github.com/NVIDIA/GenerativeAIExamples/tree/v0.6.0/deploy/k8s-operator/kube-trailblazer).
+- The `nvolveqa_40k` embedding model was deprecated from the [Nvidia API Catalog](https://build.nvidia.com). Updated all [notebooks](./notebooks/) and [experimental artifacts](./experimental/) to use the [Nvidia embed-qa-4 model](https://build.nvidia.com/nvidia/embed-qa-4) instead.
+- Removed [notebooks numbered 00-04](https://github.com/NVIDIA/GenerativeAIExamples/tree/v0.6.0/notebooks), which used on-prem LLM model deployment via the deprecated [NeMo Inference Framework Container](https://github.com/NVIDIA/GenerativeAIExamples/blob/v0.6.0/deploy/compose/rag-app-text-chatbot.yaml#L2).
+
+
 ## [0.6.0] - 2024-05-07
 
 ### Added
```
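The changelog's note about configuring the chain server's log level through a LOG_LEVEL environment variable can be illustrated with a short stdlib sketch. This is hypothetical, one plausible way to honor such a variable, not the chain server's actual code:

```python
import logging
import os

def configure_logging() -> int:
    """Map the LOG_LEVEL environment variable to a stdlib logging level.

    Hypothetical sketch: unknown or missing values fall back to INFO
    instead of raising, so a typo never silences the server's logs.
    """
    name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = logging.getLevelName(name)  # returns an int for known level names
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(level=level, force=True)
    return level
```

Running the server with `LOG_LEVEL=debug` would then enable DEBUG-level output, while any unrecognized value degrades gracefully to INFO.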

README.md

Lines changed: 20 additions & 12 deletions

```diff
@@ -8,7 +8,7 @@ State-of-the-art Generative AI examples that are easy to deploy, test, and exten
 
 ## NVIDIA NGC
 
-Generative AI Examples can use models and GPUs from the [NVIDIA NGC: AI Development Catalog](https://catalog.ngc.nvidia.com).
+Generative AI Examples can use models and GPUs from the [NVIDIA API Catalog](https://catalog.ngc.nvidia.com).
 
 Sign up for a [free NGC developer account](https://ngc.nvidia.com/signin) to access:
 
@@ -27,34 +27,32 @@ The examples demonstrate how to combine NVIDIA GPU acceleration with popular LLM
 The examples are easy to deploy with [Docker Compose](https://docs.docker.com/compose/).
 
 Examples support local and remote inference endpoints.
-If you have a GPU, you can run inference locally with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM).
+If you have a GPU, you can run inference locally with an [NVIDIA NIM for LLMs](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nim/containers/nim_llm).
 If you don't have a GPU, you can run inference and embedding remotely with [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
 
 | Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
 | ----- | --------- | --------- | ----------- | --------- | ------- | ---------------- | ------ | --------------- |
-| mixtral_8x7b | ai-embed-qa-4 | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| llama-2 | UAE-Large-V1 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
-| llama-2 | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)] | No | Yes | No | No | FAISS |
-| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| mixtral_8x7b | ai-embed-qa-4 | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
-| mixtral_8x7b<br>Deplot<br>Neva-22b | ai-embed-qa-4 | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
-| llama-2 | UAE-Large-V1 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
+| llama3-70b | snowflake-arctic-embed-l | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| llama3-8b | snowflake-arctic-embed-l | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html#using-the-llamaindex-data-framework)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
+| llama3-70b | snowflake-arctic-embed-l | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| llama3-70b | ai-embed-qa-4 | LangChain | Minimalistic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
+| llama3-8b<br>Deplot<br>Neva-22b | snowflake-arctic-embed-l | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pgvector |
 | llama3-70b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
-| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
+| llama3-8b | snowflake-arctic-embed-l | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
 
 ### Enterprise RAG Examples
 
 The enterprise RAG examples run as microservices distributed across multiple VMs and GPUs.
 These examples show how to orchestrate RAG pipelines with [Kubernetes](https://kubernetes.io/) and deploy them with [Helm](https://helm.sh/).
 
 Enterprise RAG examples include a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) for LLM lifecycle management.
-It is compatible with the [NVIDIA GPU operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
+It is compatible with the [NVIDIA GPU Operator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/gpu-operator) that automates GPU discovery and lifecycle management in a Kubernetes cluster.
 
 Enterprise RAG examples also support local and remote inference with [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and [NVIDIA API Catalog endpoints](https://build.nvidia.com/explore/discover).
 
 | Model | Embedding | Framework | Description | Multi-GPU | Multi-node | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
 | ----- | --------- | --------- | ----------- | --------- | ---------- | ------- | ---------------- | ------ | --------------- |
-| llama-2 | NV-Embed-QA | LlamaIndex | Chat bot, Kubernetes deployment [[README](./docs/developer-llm-operator/)] | No | No | Yes | No | Yes | Milvus |
+| llama-3 | nv-embed-qa-4 | LlamaIndex | Chat bot, Kubernetes deployment [[chart](https://registry.ngc.nvidia.com/orgs/ohlfw0olaadg/teams/ea-participants/helm-charts/rag-app-text-chatbot)] | No | No | Yes | No | Yes | Milvus |
 
 
 ### Generative AI Model Examples
@@ -89,6 +87,16 @@ These are open source connectors for NVIDIA-hosted and self-hosted API endpoints
 |[NVIDIA Triton Inference Server](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|Triton inference server provides API access to hosted LLM models over gRPC. |
 |[NVIDIA TensorRT-LLM](https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_tensorrt.html) | [LlamaIndex](https://www.llamaindex.ai/) |Yes|Yes|No|TensorRT-LLM provides a Python API to build TensorRT engines with state-of-the-art optimizations for LLM inference on NVIDIA GPUs. |
 
+
+## Related NVIDIA RAG Projects
+
+- [NVIDIA Tokkio LLM-RAG](https://docs.nvidia.com/ace/latest/workflows/tokkio/text/Tokkio_LLM_RAG_Bot.html): Use Tokkio to add avatar animation for RAG responses.
+
+- [RAG on Windows using TensorRT-LLM and LlamaIndex](https://github.com/NVIDIA/ChatRTX): Create RAG chatbots on Windows using TensorRT-LLM.
+
+- [Hybrid RAG Project on AI Workbench](https://github.com/NVIDIA/workbench-example-hybrid-rag): Run an NVIDIA AI Workbench example project for RAG.
+
+
 ## Support, Feedback, and Contributing
 
 We're posting these examples on GitHub to support the NVIDIA LLM community and facilitate feedback.
```

RetrievalAugmentedGeneration/Dockerfile

Lines changed: 15 additions & 3 deletions

```diff
@@ -1,5 +1,5 @@
 ARG BASE_IMAGE_URL=nvcr.io/nvidia/base/ubuntu
-ARG BASE_IMAGE_TAG=20.04_x64_2022-09-23
+ARG BASE_IMAGE_TAG=22.04_20240212
 
 FROM ${BASE_IMAGE_URL}:${BASE_IMAGE_TAG}
 
@@ -11,7 +11,7 @@ RUN apt update && \
     apt install -y curl software-properties-common libgl1 libglib2.0-0 && \
     add-apt-repository ppa:deadsnakes/ppa && \
     apt update && apt install -y python3.10 python3.10-dev python3.10-distutils && \
-    apt-get clean
+    apt-get clean
 
 # Install pip for python3.10
 RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
@@ -24,20 +24,32 @@ RUN apt autoremove -y curl software-properties-common
 # Install common dependencies for all examples
 RUN --mount=type=bind,source=RetrievalAugmentedGeneration/requirements.txt,target=/opt/requirements.txt \
     pip3 install --no-cache-dir -r /opt/requirements.txt
-
+
 # Install any example specific dependency if available
 ARG EXAMPLE_NAME
 COPY RetrievalAugmentedGeneration/examples/${EXAMPLE_NAME} /opt/RetrievalAugmentedGeneration/example
 RUN if [ -f "/opt/RetrievalAugmentedGeneration/example/requirements.txt" ] ; then \
     pip3 install --no-cache-dir -r /opt/RetrievalAugmentedGeneration/example/requirements.txt ; else \
     echo "Skipping example dependency installation, since requirements.txt was not found" ; \
     fi
+RUN python3.10 -m nltk.downloader averaged_perceptron_tagger
 
+RUN if [ "${EXAMPLE_NAME}" = "multimodal_rag" ] ; then \
+    apt update && \
+    apt install -y libreoffice && \
+    apt install -y tesseract-ocr ; \
+    fi
 # Copy required common modules for all examples
 COPY RetrievalAugmentedGeneration/__init__.py /opt/RetrievalAugmentedGeneration/
 COPY RetrievalAugmentedGeneration/common /opt/RetrievalAugmentedGeneration/common
 COPY integrations /opt/integrations
 COPY tools /opt/tools
 
+RUN mkdir /tmp-data/; mkdir /tmp-data/nltk_data/
+RUN chmod 777 -R /tmp-data
+RUN chown 1000:1000 -R /tmp-data
+ENV NLTK_DATA=/tmp-data/nltk_data/
+ENV HF_HOME=/tmp-data
+
 WORKDIR /opt
 ENTRYPOINT ["uvicorn", "RetrievalAugmentedGeneration.common.server:app"]
```
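The Dockerfile's conditional dependency step can be exercised outside the image as plain shell. This mirrors the same branch logic; the default path is the container-internal location, overridable here for local experimentation:

```shell
# Install the example's extra dependencies only when a requirements.txt
# is present; otherwise report that the step was skipped (same logic as
# the Dockerfile's RUN step above).
EXAMPLE_DIR="${EXAMPLE_DIR:-/opt/RetrievalAugmentedGeneration/example}"
if [ -f "$EXAMPLE_DIR/requirements.txt" ]; then
    pip3 install --no-cache-dir -r "$EXAMPLE_DIR/requirements.txt"
else
    echo "Skipping example dependency installation, since requirements.txt was not found"
fi
```

Because the `if` guards the install, images for examples without a `requirements.txt` build cleanly instead of failing on a missing file.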
