All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- #453 Update documentation for NVIDIA API Catalog example.
- #382 Fix issue with `lowest_temperature` in self-check and hallucination rails.
- #454 Redo fix for #385.
- #442 Fix README typo by @dileepbapat.
- #402 Integrate Vertex AI Models into Guardrails by @aishwaryap.
- #403 Add support for NVIDIA AI Endpoints by @patriciapampanelli.
- #396 Docs and examples for NVIDIA AI Foundation Models.
- #438 Add research roadmap documentation.
- #389 Expose the `verbose` parameter through `RunnableRails` by @d-mariano (see the usage sketch after this list).
- #415 Enable `print(...)` and `log(...)`.
- #414 Feature/colang march release.
- #416 Refactor and improve the verbose/debug mode.
- #418 Feature/colang flow context sharing.
- #425 Feature/colang meta decorator.
- #427 Feature/colang single flow activation.
- #426 Feature/colang 2.0 tutorial.
- #428 Feature/Standard library and examples.
- #431 Feature/colang various improvements.
- #433 Feature/Colang 2.0 improvements: generate_async support, stateful API.
- #412 Fix #411 - explain rails not working for chat models.
- #413 Typo fix: comment in `llm_flows.co` by @habanoz.
- #420 Fix typo for hallucination message.
- #377 Add example for streaming from custom action.
- #380 Update installation guide for OpenAI usage.
- #401 Replace YAML import with new import statement in multi-modal example.
- #398 Colang parser fixes and improvements.
- #394 Fixes and improvements for Colang 2.0 runtime.
- #381 Fix typo by @serhatgktp.
- #379 Fix missing prompt in verbose mode for chat models.
- #400 Fix Authorization header showing up in logs for NeMo LLM.
- #292 Jailbreak heuristics by @erickgalinkin.
- #256 Support generation options.
- #307 Added support for multi-config api calls by @makeshn.
- #293 Adds configurable stop tokens by @zmackie.
- #334 Colang 2.0 - Preview by @schuellc.
- #208 Implement cache embeddings (resolves #200) by @Pouyanpi.
- #331 Huggingface pipeline streaming by @trebedea.
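For reference, a minimal sketch of the `RunnableRails` integration touched by #235 and #389 above. The LangChain imports, the config path, and the exact placement of the `verbose` keyword are illustrative assumptions, not taken from this changelog.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

from nemoguardrails import RailsConfig
from nemoguardrails.integrations.langchain.runnable_rails import RunnableRails

# Load a guardrails configuration from disk (path is illustrative).
config = RailsConfig.from_path("./config")

prompt = ChatPromptTemplate.from_template("Answer briefly: {question}")
model = ChatOpenAI()

# Wrap the model with guardrails; verbose output surfaces the internal prompts
# and LLM calls (the `verbose=True` kwarg placement is assumed, per #389).
guardrails = RunnableRails(config, verbose=True)

chain = prompt | (guardrails | model) | StrOutputParser()
print(chain.invoke({"question": "What can you do?"}))
```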
Documentation:
- #311 Update documentation to demonstrate the use of output rails when using a custom RAG by @niels-garve.
- #347 Add detailed logging docs by @erickgalinkin.
- #354 Input and output rails only guide by @trebedea.
- #359 Added user guide for jailbreak detection heuristics by @makeshn.
- #363 Add multi-config API call user guide.
- #297 Example configurations for using only the guardrails, without LLM generation.
- #309 Change the paper citation from arXiv to EMNLP 2023 by @manuelciosici.
- #319 Enable embeddings model caching.
- #267 Make embeddings computing async and add support for batching.
- #281 Follow symlinks when building knowledge base by @piotrm0.
- #280 Add more information to results of `retrieve_relevant_chunks` by @piotrm0.
- #332 Update docs for batch embedding computations.
- #244 Docs/edit getting started by @DougAtNvidia.
- #333 Follow-up to PR 244.
- #341 Updated `fastembed` version to 0.2.2 by @NirantK.
- #286 Fixed #285 - using the same evaluation set given a random seed for topical rails by @trebedea.
- #336 Fix #320. Reuse the asyncio loop between sync calls.
- #337 Fix stats gathering in a parallel async setup.
- #342 Fixes OpenAI embeddings support.
- #346 Fix issues with KB embeddings cache, bot intent detection and config ids validator logic.
- #349 Fix multi-config bug, asyncio loop issue and cache folder for embeddings.
- #350 Fix the incorrect logging of an extra dialog rail.
- #358 Fix OpenAI embeddings async support.
- #362 Fix the issue with the server being pointed to a folder with a single config.
- #352 Fix a few issues related to jailbreak detection heuristics.
- #356 Redo followlinks PR in new code by @piotrm0.
- #288 Replace SentenceTransformers with FastEmbed.
- #254 Support for Llama Guard input and output content moderation.
- #253 Support for server-side threads.
- #235 Improved LangChain integration through `RunnableRails`.
- #190 Add example for using `generate_events_async` with streaming.
- Support for Python 3.11.
- #286 Fixed not having the same evaluation set given a random seed for topical rails.
- #239 Fixed logging issue where the `verbose=true` flag did not trigger expected log output.
- #228 Fix docstrings for various functions.
- #242 Fix Azure LLM support.
- #225 Fix the `annoy` import, to allow using the package without it installed.
- #209 Fix user messages missing from prompt.
- #261 Fix small bug in `print_llm_calls_summary`.
- #252 Fixed duplicate loading for the default config.
- Fixed the dependency pinning, allowing a wider range of dependency versions.
- Fixed several security issues related to uncontrolled data used in path expression and information exposure through an exception.
- Support for `--version` flag in the CLI.
- Upgraded `langchain` to `0.0.352`.
- Upgraded `httpx` to `0.24.1`.
- Replaced deprecated `text-davinci-003` model with `gpt-3.5-turbo-instruct`.
- #191: Fix chat generation chunk issue.
- Support for explicit definition of input/output/retrieval rails.
- Support for custom tasks and their prompts.
- Support for fact-checking using AlignScore.
- Support for NeMo LLM Service as an LLM provider.
- Support for making a single LLM call for both the guardrails process and generating the response (by setting `rails.dialog.single_call.enabled` to `True`).
- Support for sensitive data detection guardrails using Presidio.
- Example using NeMo Guardrails with the LLaMa2-13B model.
- Dockerfile for building a Docker image.
- Support for prompting modes using `prompting_mode`.
- Support for TRT-LLM as an LLM provider.
- Support for streaming the LLM responses when no output rails are used.
- Integration of ActiveFence ActiveScore API as an input rail.
- Support for `--prefix` and `--auto-reload` in the guardrails server.
- Example authentication dialog flow.
- Example RAG using Pinecone.
- Support for loading a configuration from a dictionary, i.e. `RailsConfig.from_content(config=...)` (see the usage sketch at the end of this file).
- Guidance on LLM support.
- Support for `LLMRails.explain()` (see the Getting Started guide for sample usage).
- Allow context data directly in the `/v1/chat/completion` using messages with the type `"role"`.
- Allow calling a subflow whose name is in a variable, e.g. `do $some_name`.
- Allow using actions which are not `async` functions.
- Disabled pretty exceptions in CLI.
- Upgraded dependencies.
- Updated the Getting Started Guide.
- Main README now provides more details.
- Merged original examples into a single ABC Bot and removed the original ones.
- Documentation improvements.
- Fix going over the maximum prompt length using the `max_length` attribute in Prompt Templates.
- Fixed problem with `nest_asyncio` initialization.
- #144 Fixed TypeError in logging call.
- #121 Detect chat model using openai engine.
- #109 Fixed minor logging issue.
- Parallel flow support.
- Fix `HuggingFacePipeline` bug related to LangChain version upgrade.
- Support for custom configuration data.
- Example for using custom LLM and multiple KBs.
- Support for `PROMPTS_DIR`.
- #101 Support for using OpenAI embeddings models in addition to SentenceTransformers.
- First set of end-to-end QA tests for the example configurations.
- Support for configurable embedding search providers.
- Moved to using `nest_asyncio` for implementing the blocking API. Fixes #3 and #32.
- Improved event property validation in `new_event_dict`.
- Refactored imports to allow installing from source without Annoy/SentenceTransformers (would need a custom embedding search provider to work).
- Fixed when the `init` function from `config.py` is called, to allow custom LLM providers to be registered inside.
- #93: Removed redundant `hasattr` check in `nemoguardrails/llm/params.py`.
- #91: Fixed how default context variables are initialized.
- Event-based API for guardrails.
- Support for messages with type "event" in `LLMRails.generate_async`.
- Support for bot message instructions.
- Support for using variables inside bot message definitions.
- Support for `vicuna-7b-v1.3` and `mpt-7b-instruct`.
- Topical evaluation results for `vicuna-7b-v1.3` and `mpt-7b-instruct`.
- Support to use different models for different LLM tasks.
- Support for red-teaming using challenges.
- Support to disable the Chat UI when running the server using `--disable-chat-ui`.
- Support for accessing the API request headers in server mode.
- Support to enable CORS settings for the guardrails server.
- Changed the naming of the internal events to align to the upcoming UMIM spec (Unified Multimodal Interaction Management).
- If there are no user message examples, the bot messages examples lookup is disabled as well.
- #58: Fix install on Mac OS 13.
- #55: Fix bug in example causing config.py to crash on computers with no CUDA-enabled GPUs.
- Fixed the model name initialization for LLMs that use the `model` kwarg.
- Fixed the Cohere prompt templates.
- #55: Fix bug related to LangChain callbacks initialization.
- Fixed generation of "..." on value generation.
- Fixed the parameters type conversion when invoking actions from colang (previously everything was string).
- Fixed `model_kwargs` property for the `WrapperLLM`.
- Fixed bug when `stop` was used inside flows.
- Fixed Chat UI bug when an invalid guardrails configuration was used.
- Support for defining subflows.
- Improved support for customizing LLM prompts.
- Support for using filters to change how variables are included in a prompt template.
- Output parsers for prompt templates.
- The `verbose_v1` formatter and output parser, to be used for smaller models that don't understand Colang very well in a few-shot manner.
- Support for including context variables in prompt templates.
- Support for chat models, i.e. prompting with a sequence of messages.
- Experimental support for allowing the LLM to generate multi-step flows.
- Example of using Llama Index from a guardrails configuration (#40).
- Example for using HuggingFace Endpoint LLMs with a guardrails configuration.
- Example for using HuggingFace Pipeline LLMs with a guardrails configuration.
- Support to alter LLM parameters passed as `model_kwargs` in LangChain.
- CLI tool for running evaluations on the different steps (e.g., canonical form generation, next steps, bot message) and on existing rails implementation (e.g., moderation, jailbreak, fact-checking, and hallucination).
- Initial evaluation results for `text-davinci-003` and `gpt-3.5-turbo`.
- The `lowest_temperature` can be set through the guardrails config (to be used for deterministic tasks).
- The core templates now use Jinja2 as the rendering engine.
- Improved the internal prompting architecture, now using an LLM Task Manager.
- Fixed bug related to invoking a chain with multiple output keys.
- Fixed bug related to tracking the output stats.
- #51: Bug fix - avoid str concat with None when logging user_intent.
- #54: Fix UTF-8 encoding issue and add embedding model configuration.
- Support to connect any LLM that implements the BaseLanguageModel interface from LangChain.
- Support for customizing the prompts for specific LLM models.
- Support for custom initialization when loading a configuration through `config.py`.
- Support to extract user-provided values from utterances.
- Improved the logging output for Chat CLI (clear events stream, prompts, completion, timing information).
- Updated system actions to use temperature 0 where it makes sense, e.g., canonical form generation, next step generation, fact checking, etc.
- Excluded the default system flows from the "next step generation" prompt.
- Updated `langchain` to `0.0.167`.
- Fixed initialization of LangChain tools.
- Fixed the overriding of general instructions #7.
- Fixed action parameters inspection bug #2.
- Fixed bug related to multi-turn flows #13.
- Fixed Wolfram Alpha error reporting in the sample execution rail.
- First alpha release.
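For reference, a minimal usage sketch combining two of the features listed above: loading a configuration from content via `RailsConfig.from_content` and inspecting LLM calls via `LLMRails.explain()`. The YAML content and model name are illustrative assumptions; the dictionary form `from_content(config=...)` mentioned above is assumed to work analogously.

```python
from nemoguardrails import LLMRails, RailsConfig

# Illustrative configuration content; any supported engine/model would do.
YAML_CONTENT = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(yaml_content=YAML_CONTENT)
rails = LLMRails(config)

response = rails.generate(messages=[{"role": "user", "content": "Hello!"}])
print(response)

# Inspect the prompts and completions used for the last generation.
info = rails.explain()
info.print_llm_calls_summary()
```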