
Commit e711143

Upstream changes for v0.6.0 release (#115)
1 parent 136da43 commit e711143

141 files changed (+8834 / -1886 lines)


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -23,3 +23,4 @@ uploaded_files/
 docs/_*
 docs/notebooks
 docs/experimental
+docs/tools

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
@@ -3,6 +3,39 @@ All notable changes to this project will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.6.0] - 2024-05-07
+
+### Added
+- Ability to switch between [API Catalog](https://build.nvidia.com/explore/discover) models to on-prem models using [NIM-LLM](https://docs.nvidia.com/ai-enterprise/nim-llm/latest/index.html).
+- New API endpoint
+  - `/health` - Provides a health check for the chain server.
+- Containerized [evaluation application](./tools/evaluation/) for RAG pipeline accuracy measurement.
+- Observability support for langchain based examples.
+- New Notebooks
+  - Added [Chat with NVIDIA financial data](./notebooks/12_Chat_wtih_nvidia_financial_reports.ipynb) notebook.
+  - Added notebook showcasing [langgraph agent handling](./notebooks/11_LangGraph_HandlingAgent_IntermediateSteps.ipynb).
+- A [simple rag example template](https://nvidia.github.io/GenerativeAIExamples/latest/simple-examples.html) showcasing how to build an example from scratch.
+
+### Changed
+- Renamed example `csv_rag` to [structured_data_rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/)
+- Model Engine name update
+  - `nv-ai-foundation` and `nv-api-catalog` llm engine are renamed to `nvidia-ai-endpoints`
+  - `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`
+- Embedding model update
+  - `developer_rag` example uses [UAE-Large-V1](https://huggingface.co/WhereIsAI/UAE-Large-V1) embedding model.
+  - Using `ai-embed-qa-4` for api catalog examples instead of `nvolveqa_40k` as embedding model
+- Ingested data now persists across multiple sessions.
+- Updated langchain-nvidia-endpoints to version 0.0.11, enabling support for models like llama3.
+- File extension based validation to throw error for unsupported files.
+- The default output token length in the UI has been increased from 250 to 1024 for more comprehensive responses.
+- Stricter chain-server API validation support to enhance API security
+- Updated version of llama-index, pymilvus.
+- Updated pgvector container to `pgvector/pgvector:pg16`
+- LLM Model Updates
+  - [Multiturn Chatbot](./RetrievalAugmentedGeneration/examples/multi_turn_rag/) now uses `ai-mixtral-8x7b-instruct` model for response generation.
+  - [Structured data rag](./RetrievalAugmentedGeneration/examples/structured_data_rag/) now uses `ai-llama3-70b` for response and code generation.
+
+
 ## [0.5.0] - 2024-03-19
 
 This release adds new dedicated RAG examples showcasing state of the art usecases, switches to the latest [API catalog endpoints from NVIDIA](https://build.nvidia.com/explore/discover) and also refactors the API interface of chain-server. This release also improves the developer experience by adding github pages based documentation and streamlining the example deployment flow using dedicated compose files.
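To make the new `/health` endpoint from the 0.6.0 changelog above more concrete, here is a minimal sketch of polling it with Python. The base URL (`http://localhost:8081`) and the idea that an HTTP 200 signals a healthy server are assumptions for illustration only; they are not specified in this commit.

```python
# Minimal health probe for the chain server's new /health endpoint (added in 0.6.0).
# Assumption: the chain server is reachable at http://localhost:8081; adjust base_url
# for your deployment. Only the status code is checked because the response body is
# not documented in this commit.
import requests


def chain_server_is_healthy(base_url: str = "http://localhost:8081") -> bool:
    """Return True if GET {base_url}/health responds with HTTP 200."""
    try:
        response = requests.get(f"{base_url}/health", timeout=5)
        return response.status_code == 200
    except requests.RequestException:
        return False


if __name__ == "__main__":
    print("chain server healthy:", chain_server_is_healthy())
```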

README.md

Lines changed: 8 additions & 8 deletions
@@ -32,15 +32,15 @@ If you don't have a GPU, you can inference and embed remotely with [NVIDIA API C
 
 | Model | Embedding | Framework | Description | Multi-GPU | TRT-LLM | NVIDIA Endpoints | Triton | Vector Database |
 | ---------------------------------- | ---------------- | ---------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------- | ------- | ---------------- | ------ | ------------------ |
-| mixtral_8x7b | nvolveqa_40k | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| llama-2 | e5-large-v2 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
+| mixtral_8x7b | ai-embed-qa-4 | LangChain | NVIDIA API Catalog endpoints chat bot [[code](./RetrievalAugmentedGeneration/examples/nvidia_api_catalog/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/api-catalog.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| llama-2 | UAE-Large-V1 | LlamaIndex | Canonical QA Chatbot [[code](./RetrievalAugmentedGeneration/examples/developer_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/local-gpu.html)] | [Yes](https://nvidia.github.io/GenerativeAIExamples/latest/multi-gpu.html) | Yes | No | Yes | Milvus or pgvector |
 | llama-2 | all-MiniLM-L6-v2 | LlamaIndex | Chat bot, GeForce, Windows [[repo](https://github.com/NVIDIA/trt-llm-rag-windows/tree/release/1.0)] | No | Yes | No | No | FAISS |
-| llama-2 | nvolveqa_40k | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
-| mixtral_8x7b | nvolveqa_40k | LangChain | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
-| mixtral_8x7b<br>Deplot<br>Neva-22b | nvolveqa_40k | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pvgector |
-| llama-2 | e5-large-v2 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
-| mixtral_8x7b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
-| llama-2 | nvolveqa_40k | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
+| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with query decomposition agent [[code](./RetrievalAugmentedGeneration/examples/query_decomposition_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/query-decomposition.html)] | No | No | Yes | Yes | Milvus or pgvector |
+| mixtral_8x7b | ai-embed-qa-4 | LangChain | Minimilastic example: RAG with NVIDIA AI Foundation Models [[code](./examples/5_mins_rag_no_gpu/), [README](./examples/README.md#rag-in-5-minutes-example)] | No | No | Yes | Yes | FAISS |
+| mixtral_8x7b<br>Deplot<br>Neva-22b | ai-embed-qa-4 | Custom | Chat bot with multimodal data [[code](./RetrievalAugmentedGeneration/examples/multimodal_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multimodal-data.html)] | No | No | Yes | No | Milvus or pvgector |
+| llama-2 | UAE-Large-V1 | LlamaIndex | Chat bot with quantized LLM model [[docs](https://nvidia.github.io/GenerativeAIExamples/latest/quantized-llm-model.html)] | Yes | Yes | No | Yes | Milvus or pgvector |
+| llama3-70b | none | PandasAI | Chat bot with structured data [[code](./RetrievalAugmentedGeneration/examples/structured_data_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/structured-data.html)] | No | No | Yes | No | none |
+| llama-2 | ai-embed-qa-4 | LangChain | Chat bot with multi-turn conversation [[code](./RetrievalAugmentedGeneration/examples/multi_turn_rag/), [docs](https://nvidia.github.io/GenerativeAIExamples/latest/multi-turn.html)] | No | No | Yes | No | Milvus or pgvector |
 
 ### Enterprise RAG Examples
 
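The updated README table switches the API Catalog examples from `nvolveqa_40k` to `ai-embed-qa-4`. As a rough sketch of what calling that embedding model through `langchain-nvidia-ai-endpoints` (pinned to 0.0.11 in this release) can look like; it assumes an `NVIDIA_API_KEY` environment variable is set, and the examples' actual client code under `RetrievalAugmentedGeneration/` may differ:

```python
# Illustrative only: embedding a query with the ai-embed-qa-4 model via the
# langchain-nvidia-ai-endpoints package. Assumes NVIDIA_API_KEY is exported and
# that the installed package version matches the 0.0.11 pin in the changelog.
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings

embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")

# embed_query returns the embedding vector for a single string.
vector = embedder.embed_query("How do I deploy the RAG chain server?")
print(f"embedding dimension: {len(vector)}")
```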

RetrievalAugmentedGeneration/Dockerfile

Lines changed: 4 additions & 1 deletion
@@ -8,7 +8,7 @@ ENV DEBIAN_FRONTEND noninteractive
 
 # Install required ubuntu packages for setting up python 3.10
 RUN apt update && \
-    apt install -y dpkg openssl libgl1 linux-libc-dev libksba8 curl software-properties-common build-essential libssl-dev libffi-dev && \
+    apt install -y curl software-properties-common libgl1 libglib2.0-0 && \
     add-apt-repository ppa:deadsnakes/ppa && \
     apt update && apt install -y python3.10 python3.10-dev python3.10-distutils && \
     apt-get clean
@@ -18,6 +18,9 @@ RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.10
 
 RUN rm -rf /var/lib/apt/lists/*
 
+# Uninstall build packages
+RUN apt autoremove -y curl software-properties-common
+
 # Install common dependencies for all examples
 RUN --mount=type=bind,source=RetrievalAugmentedGeneration/requirements.txt,target=/opt/requirements.txt \
     pip3 install --no-cache-dir -r /opt/requirements.txt

RetrievalAugmentedGeneration/common/configuration.py

Lines changed: 4 additions & 4 deletions
@@ -57,7 +57,7 @@ class LLMConfig(ConfigWizard):
 
     server_url: str = configfield(
         "server_url",
-        default="localhost:8001",
+        default="",
         help_txt="The location of the Triton server hosting the llm model.",
     )
     model_name: str = configfield(
@@ -86,7 +86,7 @@ class TextSplitterConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of Sentence Transformer model used for SentenceTransformer TextSplitter.",
     )
     chunk_size: int = configfield(
@@ -110,7 +110,7 @@ class EmbeddingConfig(ConfigWizard):
 
     model_name: str = configfield(
         "model_name",
-        default="intfloat/e5-large-v2",
+        default="WhereIsAI/UAE-Large-V1",
         help_txt="The name of huggingface embedding model.",
     )
     model_engine: str = configfield(
@@ -125,7 +125,7 @@ class EmbeddingConfig(ConfigWizard):
     )
     server_url: str = configfield(
         "server_url",
-        default="localhost:9080",
+        default="",
         help_txt="The url of the server hosting nemo embedding model",
     )
 

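The configuration.py hunks above change the default Hugging Face embedding model from `intfloat/e5-large-v2` to `WhereIsAI/UAE-Large-V1`. A quick way to sanity-check that model outside the repo's ConfigWizard plumbing is to load it directly with `sentence-transformers`; this is only an illustrative sketch, not code from the commit:

```python
# Illustrative sketch: load the new default embedding model directly with
# sentence-transformers to confirm it downloads and encodes text. The repo
# itself resolves this model name through its ConfigWizard-based configuration.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("WhereIsAI/UAE-Large-V1")
embeddings = model.encode(["Ingested data now persists across multiple sessions."])

# UAE-Large-V1 produces 1024-dimensional embeddings, so this should print (1, 1024).
print(embeddings.shape)
```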