chores: update docs
RobinQu committed Jun 30, 2024
1 parent 0ee2ce3 commit 5616f0e
Showing 4 changed files with 109 additions and 71 deletions.
18 changes: 18 additions & 0 deletions CHANGELOG.md
# Changelog

## v0.1.5

**Full Changelog**: https://github.com/RobinQu/instinct.cpp/commits/v0.1.5

* Features
  * `instinct-transformer`: New bge-m3 embedding model. Generally speaking, bge-reranker and bge-embedding are still in preview, as they are not fast enough for production.
  * `instinct-llm`: New `JinaRerankerModel` for the reranker model API from Jina.ai.
  * `instinct-retrieval`: New `DuckDBBM25Retriever`, a BM25 keyword-based retriever using DuckDB's built-in functions.
* Improvements
  * Renamed all files to follow camel-case naming conventions.
  * Build system:
    * Fixed include paths for internal header files. All files are now referenced using the angle-bracket pattern like `#include <instinct/...>`.
    * Rewrote CMake install rules.
    * Run unit tests during `conan build` using `CTest`.
  * `doc-agent`:
    * Use the `retriever_version` argument in the CLI to control how retriever-related components are constructed.
    * Rewrote lifecycle control using the application context.
  * `instinct-retrieval`: Fixed RAG evaluation. A RAG pipeline with `MultiPathRetriever` should now score above 80%.

## v0.1.4

12 changes: 11 additions & 1 deletion docs/testing.md
# Testing

To make all components work, some environment variables need to be correctly configured.

## Unit testing
```shell
export OPENAI_CHAT_API_ENDPOINT=<API ENDPOINT FOR CHAT MODEL>
export OPENAI_EMBEDDING_API_ENDPOINT=<API ENDPOINT FOR EMBEDDING MODEL>
export OPENAI_API_KEY=<YOUR OPENAI API KEY>
export OPENAI_CHAT_MODEL=<CHAT MODEL NAME>
export OPENAI_EMBEDDING_MODEL=<EMBEDDING MODEL NAME>
export OPENAI_EMBEDDING_DIM=<EMBEDDING MODEL DIMENSION>
export SERP_APIKEY=<SERP API KEY>
export JINA_API_KEY=<JINA API KEY>
```
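To avoid half-configured test runs, it can help to check these variables up front. A minimal sketch in Python (the variable names mirror the exports above; the helper itself is hypothetical and not part of the project):

```python
import os

# Environment variables the unit tests expect (see the exports above).
REQUIRED_VARS = [
    "OPENAI_CHAT_API_ENDPOINT",
    "OPENAI_EMBEDDING_API_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_CHAT_MODEL",
    "OPENAI_EMBEDDING_MODEL",
    "OPENAI_EMBEDDING_DIM",
    "SERP_APIKEY",
    "JINA_API_KEY",
]


def missing_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running `missing_vars()` before invoking the test runner makes configuration gaps obvious instead of letting individual tests fail with opaque connection errors.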

Unit tests are registered with CTest, so you can trigger the test runner by running `ctest` in the `build` folder.

96 changes: 48 additions & 48 deletions modules/instinct-apps/doc-agent/README.md
A tiny yet powerful agent as a CLI application. The help text from `--help-all` is self-explanatory:
```shell
% doc-agent --help-all
🤖 DocAgent: Chat with your documents locally with privacy.
Usage: /Users/robinqu/Workspace/github/robinqu/instinct.cpp/build/Debug/modules/instinct-apps/doc-agent/doc-agent [OPTIONS] SUBCOMMAND

Options:
-h,--help Print this help message and exit
--help-all Expand all help
--db_path TEXT REQUIRED DB file path for both vector store and doc store.
-v,--verbose A flag to enable verbose log
[Option Group: 🧠 Provider for chat model]
Model provider for chat model
Options:
--chat_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify chat model to use for chat completion.
--chat_model_api_key TEXT API key for commercial services like OpenAI. Leave blank for services without ACL.
--chat_model_endpoint TEXT Endpoint for chat model API.
--chat_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🧠 Provider for embedding model]
Ollama, OpenAI API, or any OpenAI API compatible servers are supported.
Options:
--embedding_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify embedding model to use.
--embedding_model_api_key TEXT
API key for commercial services like OpenAI. Leave blank for services without ACL.
--embedding_model_endpoint TEXT
Endpoint for text embedding model, e.g. 'https://api.openai.com/v1/embeddings' for OpenAI.
--embedding_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🧠 Provider for reranker model]
Currently only Jina.ai and local model are supported.
Options:
--reranker_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify reranker model provider to use.
--reranker_model_api_key TEXT
API key for commercial services like Jina.ai. Leave blank for services without ACL.
--reranker_model_endpoint TEXT
Endpoint for reranker model API.
--reranker_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🔍 Retriever]
Options for building retriever
Options:
--retriever_version INT [2]
Version 1: Use ChunkedMultiVectorRetriever only.
Version 2: Use ChunkedMultiVectorRetriever, and BM25 keyword-based Retriever together with a local reranker.

[Option Group: Options for ChunkedMultiVectorRetriever]
Options:
--child_chunk_size INT:INT in [200 - 10000] [200]
chunk size for child document
--parent_chunk_size INT:INT in [0 - 1000000] [0]
chunk size for parent document. Zero means disabling parent document splitter.
[Option Group: 🔢 VectorStore]
Options:
--vector_table_dimension UINT:INT bounded to [1 - 8192] REQUIRED
Expand All @@ -76,7 +69,7 @@ Options:

Subcommands:
build
💼 Analyze a single document and build database of learned context data. Proper values should be offered for Embedding model, Chat model, DocStore, VecStore and Retriever mentioned above.
Options:
--force A flag to force rebuild of database, which means existing db files will be deleted. Use this option with caution!
[Option Group: Data source]
Expand All @@ -85,27 +78,34 @@ build
-t,--type TEXT:{PDF,DOCX,MD,TXT,PARQUET} [TXT]
File format of assigned document. Supported types are PDF,TXT,MD,DOCX,PARQUET
--parquet_mapping TEXT Mapping format for parquet columns. e.g. 1:t,2:m:parent_doc_id:int64,3:m:source:varchar.
--source_limit UINT [0] Limit max entries from data source. Only some ingestors, including PARQUET, support this option. Zero means no limit.
serve
💃 Start an OpenAI-API-compatible server with a database of learned context. Proper values should be offered for Chat model, DocStore, VecStore and Retriever mentioned above.
Options:
-p,--port INT [9090] Port number on which the API server will listen
```
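The `ChunkedMultiVectorRetriever` referenced in the retriever options above splits each document into larger parent chunks and smaller child chunks, so that the small chunks are matched during vector search while the enclosing parent is returned as answer context. A simplified sketch of the splitting idea (illustrative only, not the library's implementation; the chunk sizes are arbitrary examples):

```python
def chunk_document(text, parent_chunk_size=1000, child_chunk_size=200):
    """Split text into parent chunks, then split each parent into child chunks.

    Returns a list of (parent_index, parent_text, child_chunks) tuples.
    A parent_chunk_size of zero disables the parent splitter, mirroring the
    --parent_chunk_size=0 behaviour described in the help text above.
    """
    if parent_chunk_size == 0:
        parents = [text]
    else:
        parents = [text[i:i + parent_chunk_size]
                   for i in range(0, len(text), parent_chunk_size)]
    result = []
    for idx, parent in enumerate(parents):
        # Each parent is further split into fixed-size child chunks.
        children = [parent[i:i + child_chunk_size]
                    for i in range(0, len(parent), child_chunk_size)]
        result.append((idx, parent, children))
    return result
```

The general idea is that embeddings are computed per child chunk at indexing time, and a matching child is mapped back to its parent before the parent text is handed to the chat model.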
The following is an example command to embed a local PDF and start the server with a local Ollama service as the model provider.
```shell
doc-agent --verbose \
--retriever_version=1 \
--child_chunk_size=200 \
--chat_model_provider=ollama \
--chat_model_model_name=mistral:latest \
--chat_model_endpoint=http://192.168.0.132/api/chat \
--embedding_model_provider=ollama \
--embedding_model_model_name=all-minilm:latest \
--embedding_model_endpoint=http://192.168.0.132/api/embeddings \
--db_path=/tmp/rag_eval_v1.db \
--vector_table_dimension=384 \
build \
--force \
--file=attention_is_all_you_need.pdf \
--type=PDF \
serve \
--port=9090
```
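For reference, the `--parquet_mapping` value shown in the help text above (`1:t,2:m:parent_doc_id:int64,3:m:source:varchar`) appears to follow a compact `column:role[:name:type]` syntax. A rough sketch of how such a string could be parsed (a hypothetical helper for illustration; the role meanings are inferred from the examples in the help text, not from the source):

```python
def parse_parquet_mapping(mapping):
    """Parse a mapping like '1:t,2:m:parent_doc_id:int64,3:m:source:varchar'
    into {column_index: descriptor}. Role meanings are inferred: 't'/'txt' is
    the document-text column; 'm'/'metadata' carries a field name and type."""
    result = {}
    for entry in mapping.split(","):
        parts = entry.split(":")
        column = int(parts[0])
        role = parts[1]
        if role in ("t", "txt"):
            result[column] = {"role": "text"}
        elif role in ("m", "metadata"):
            result[column] = {"role": "metadata",
                              "name": parts[2], "type": parts[3]}
        else:
            raise ValueError(f"unknown role {role!r} in {entry!r}")
    return result
```

A parser like this makes it easy to validate a mapping string before launching a long-running `build` step.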
54 changes: 32 additions & 22 deletions modules/instinct-apps/mini-assistant/README.md
## CLI Usage

```text
🐬 mini-assistant - Local Assistant API at your service
Usage: /Users/robinqu/Workspace/github/robinqu/instinct.cpp/build/Debug/modules/instinct-apps/mini-assistant/mini-assistant [OPTIONS]
Options:
-h,--help Print this help message and exit
Expand All @@ -67,48 +66,59 @@ Options:
Path for DuckDB database file.
--file_store_path TEXT REQUIRED
Path for root directory of local object store. Will be created if it doesn't exist yet.
--agent_executor_type TEXT:{llm_compiler,openai_tool} [llm_compiler]
Specify agent executor type. `llm_compiler` enables parallel function calling with open-source models like the Mistral and Llama series, while `openai_tool` relies on the official OpenAI function calling capability to direct the agent workflow.
-v,--verbose A flag to enable verbose log
[Option Group: chat_model]
Configuration for chat model
Options:
--chat_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0} [0]
Specify chat model to use for chat completion.
--chat_model_name TEXT Specify chat model to use for chat completion. Defaults to gpt-3.5-turbo for OpenAI, llama3:8b for Ollama. Note that some model providers will ignore the passed model name and use the currently loaded model instead.
--chat_model_api_key TEXT API key for commercial services like OpenAI. Leave blank for services without ACL. API key is also retrieved from env variable named OPENAI_API_KEY.
--chat_model_endpoint TEXT Endpoint for chat model API, e.g. 'https://api.openai.com/v1/chat/completions' for OpenAI.
[Option Group: embedding_model]
Configuration for embedding model
Options:
--embedding_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0} [0]
Specify model to use for embedding.
--embedding_model_name TEXT Specify model to use for embedding. Defaults to text-embedding-3-small for OpenAI, all-minilm:latest for Ollama. Note that some model providers will ignore the passed model name and use the currently loaded model instead.
--embedding_model_dim INT:POSITIVE
Dimension of given embedding model.
--embedding_model_api_key TEXT
API key for commercial services like OpenAI. Leave blank for services without ACL. API key is also retrieved from env variable named OPENAI_API_KEY.
--embedding_model_endpoint TEXT
Endpoint for text embedding model API.
[Option Group: ranking_model]
Currently only Jina.ai and local model are supported.
Options:
--reranker_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify reranker model provider to use.
--reranker_model_api_key TEXT
API key for commercial services like Jina.ai. Leave blank for services without ACL.
--reranker_model_endpoint TEXT
Endpoint for reranker model API.
--reranker_model_model_name TEXT
Specify name of the model to be used.
[Option Group: Options for LLMCompilerAgentExecutor]
Options for LLMCompiler-based agent executor
Options:
--max_replan INT [6] Max count for replan
```


## Implementation details
To run the Assistant API service with the chat model served by a `llama.cpp` server and the text embedding model served by Ollama:

```shell
mini-assistant \
--db_file_path /tmp/assistant_api.db \
--file_store_path /tmp/mini-assistant-files \
--agent_executor_type=llm_compiler \
--chat_model_provider=llama_cpp \
--chat_model_endpoint=http://192.168.0.134:8000/v1/chat/completions \
--embedding_model_provider=ollama \
--embedding_model_endpoint=http://192.168.0.134:31434/v1/embeddings \
--embedding_model_name=all-minilm:latest \
--verbose
```

* A thread-pool-based task scheduler is used to handle jobs for `run` objects.
* DuckDB is used for conventional structured data as well as vector data. Many improvements can be done. [More details](https://github.com/users/RobinQu/projects/1/views/1?pane=issue&itemId=62004973).
* More technical details about Assistant API can be found in [docs/assistant_api.md](../../../docs/assistant_api.md).
* Known issues:
  * Function calling requires OpenAI's `gpt-3.5` or `gpt-4` series. Function calling with open-source LLMs is possible, and it's at the top of my TODO list.
  * All timestamps are currently printed in microsecond precision, while official APIs print them in epoch seconds.
  * Only the function tool is supported. `file-search` is next to come, and `code-interpreter` is scheduled for later.
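The timestamp caveat above can bite client code that compares values against the official API. One possible client-side normalization (a sketch; the magnitude heuristic is an assumption, not project code):

```python
def to_epoch_seconds(timestamp):
    """Normalize a timestamp to epoch seconds.

    mini-assistant currently emits microsecond-precision timestamps, while the
    official API uses epoch seconds. Values far too large to be plausible
    epoch-seconds values are assumed to be microseconds and scaled down.
    """
    # Epoch seconds today are ~1.7e9; microsecond values are ~1.7e15.
    if timestamp > 10_000_000_000:
        return timestamp // 1_000_000
    return timestamp
```

Applying such a shim keeps downstream code agnostic to which precision a given server build emits.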
