chores: update docs
RobinQu committed Jun 30, 2024
1 parent 0ee2ce3 commit 5616f0e
Showing 4 changed files with 109 additions and 71 deletions.
18 changes: 18 additions & 0 deletions CHANGELOG.md
# Changelog

## v0.1.5

**Full Changelog**: https://github.com/RobinQu/instinct.cpp/commits/v0.1.5

* Features
  * `instinct-transformer`: New bge-m3 embedding model. Generally speaking, bge-reranker and bge-embedding are still in preview, as they are not fast enough for production.
  * `instinct-llm`: New `JinaRerankerModel` for the reranker model API from Jina.ai.
  * `instinct-retrieval`: New `DuckDBBM25Retriever`, a BM25 keyword-based retriever using DuckDB's built-in functions.
* Improvements
  * Renamed all files to follow camel-case naming conventions.
  * Build system:
    * Fixed include paths for internal header files. All files are now referenced using the angle-bracket pattern like `#include <instinct/...>`.
    * Rewrote CMake install rules.
    * Run unit tests during `conan build` using `CTest`.
  * `doc-agent`:
    * Use the `retriever_version` argument in the CLI to control how retriever-related components are constructed.
    * Rewrote lifecycle control using the application context.
  * `instinct-retrieval`: Fixed RAG evaluation. A RAG pipeline with `MultiPathRetriever` should now score above 80%.

## v0.1.4

12 changes: 11 additions & 1 deletion docs/testing.md
# Testing

To make all components work, some environment variables need to be correctly configured.

## Unit testing
```shell
export OPENAI_CHAT_API_ENDPOINT=<API ENDPOINT FOR CHAT MODEL>
export OPENAI_EMBEDDING_API_ENDPOINT=<API ENDPOINT FOR EMBEDDING MODEL>
export OPENAI_API_KEY=<YOUR OPENAI API KEY>
export OPENAI_CHAT_MODEL=<CHAT MODEL NAME>
export OPENAI_EMBEDDING_MODEL=<EMBEDDING MODEL NAME>
export OPENAI_EMBEDDING_DIM=<EMBEDDING MODEL DIMENSION>
export SERP_APIKEY=<SERP API KEY>
export JINA_API_KEY=<JINA API KEY>
```
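To avoid half-configured test runs, it can help to check these variables up front. A minimal sketch in Python (the variable names mirror the exports above; the helper itself is hypothetical and not part of the project):

```python
import os

# Environment variables the unit tests expect (see the exports above).
REQUIRED_VARS = [
    "OPENAI_CHAT_API_ENDPOINT",
    "OPENAI_EMBEDDING_API_ENDPOINT",
    "OPENAI_API_KEY",
    "OPENAI_CHAT_MODEL",
    "OPENAI_EMBEDDING_MODEL",
    "OPENAI_EMBEDDING_DIM",
    "SERP_APIKEY",
    "JINA_API_KEY",
]


def missing_vars(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Running `missing_vars()` before invoking the test runner makes configuration gaps obvious instead of letting individual tests fail with opaque connection errors.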

Unit tests are registered with CTest, so you can trigger the test runner by running `ctest` in the `build` folder.

96 changes: 48 additions & 48 deletions modules/instinct-apps/doc-agent/README.md
A tiny yet powerful agent as a CLI application. The help text from `--help-all` is self-explanatory:
```shell
% doc-agent --help-all
🤖 DocAgent: Chat with your documents locally with privacy.
Usage: /Users/robinqu/Workspace/github/robinqu/instinct.cpp/build/Debug/modules/instinct-apps/doc-agent/doc-agent [OPTIONS] SUBCOMMAND

Options:
-h,--help Print this help message and exit
--help-all Expand all help
--db_path TEXT REQUIRED DB file path for both vector store and doc store.
-v,--verbose A flag to enable verbose log
[Option Group: 🧠 Provider for chat model]
Model provider for chat model
Options:
--chat_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify chat model to use for chat completion.
--chat_model_api_key TEXT API key for commercial services like OpenAI. Leave blank for services without ACL.
--chat_model_endpoint TEXT Endpoint for chat model API.
--chat_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🧠 Provider for embedding model]
Ollama, OpenAI API, or any OpenAI API compatible servers are supported.
Options:
--embedding_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify embedding model to use.
--embedding_model_api_key TEXT
API key for commercial services like OpenAI. Leave blank for services without ACL.
--embedding_model_endpoint TEXT
Endpoint for text embedding model, e.g. 'https://api.openai.com/v1/embeddings' for OpenAI.
--embedding_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🧠 Provider for reranker model]
Currently only Jina.ai and local model are supported.
Options:
--reranker_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify reranker model provider to use.
--reranker_model_api_key TEXT
API key for commercial services like Jina.ai. Leave blank for services without ACL.
--reranker_model_endpoint TEXT
Endpoint for reranker model API.
--reranker_model_model_name TEXT
Specify name of the model to be used.
[Option Group: 🔍 Retriever]
Options for building retriever
Options:
--retriever_version INT [2]
Version 1: Use ChunkedMultiVectorRetriever only.
Version 2: Use ChunkedMultiVectorRetriever, and BM25 keyword-based Retriever together with a local reranker.

[Option Group: Options for ChunkedMultiVectorRetriever]
Options:
--child_chunk_size INT:INT in [200 - 10000] [200]
chunk size for child document
--parent_chunk_size INT:INT in [0 - 1000000] [0]
chunk size for parent document. Zero means disabling parent document splitter.
[Option Group: 🔢 VectorStore]
Options:
--vector_table_dimension UINT:INT bounded to [1 - 8192] REQUIRED
Expand All @@ -76,7 +69,7 @@ Options:

Subcommands:
build
💼 Analyze a single document and build database of learned context data. Proper values should be offered for Embedding model, Chat model, DocStore, VecStore and Retriever mentioned above.
Options:
--force A flag to force rebuild of database, which means existing db files will be deleted. Use this option with caution!
[Option Group: Data source]
Expand All @@ -85,27 +78,34 @@ build
-t,--type TEXT:{PDF,DOCX,MD,TXT,PARQUET} [TXT]
File format of assigned document. Supported types are PDF,TXT,MD,DOCX,PARQUET
--parquet_mapping TEXT Mapping format for parquet columns. e.g. 1:t,2:m:parent_doc_id:int64,3:m:source:varchar.
--source_limit UINT [0] Limit max entries from data source. Only some ingestors, including PARQUET, support this option. Zero means no limit.
serve
💃 Start an OpenAI-API-compatible server with a database of learned context. Proper values should be offered for Chat model, DocStore, VecStore and Retriever mentioned above.
Options:
-p,--port INT [9090] Port number on which the API server will listen
```
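The `ChunkedMultiVectorRetriever` referenced in the retriever options above splits each document into larger parent chunks and smaller child chunks, so that the small chunks are matched during vector search while the enclosing parent is returned as answer context. A simplified sketch of the splitting idea (illustrative only, not the library's implementation; the chunk sizes are arbitrary examples):

```python
def chunk_document(text, parent_chunk_size=1000, child_chunk_size=200):
    """Split text into parent chunks, then split each parent into child chunks.

    Returns a list of (parent_index, parent_text, child_chunks) tuples.
    A parent_chunk_size of zero disables the parent splitter, mirroring the
    --parent_chunk_size=0 behaviour described in the help text above.
    """
    if parent_chunk_size == 0:
        parents = [text]
    else:
        parents = [text[i:i + parent_chunk_size]
                   for i in range(0, len(text), parent_chunk_size)]
    result = []
    for idx, parent in enumerate(parents):
        # Each parent is further split into fixed-size child chunks.
        children = [parent[i:i + child_chunk_size]
                    for i in range(0, len(parent), child_chunk_size)]
        result.append((idx, parent, children))
    return result
```

The general idea is that embeddings are computed per child chunk at indexing time, and a matching child is mapped back to its parent before the parent text is handed to the chat model.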
The following is an example command to embed a local PDF and start the server with a local Ollama service as the model provider.
```shell
doc-agent --verbose \
--retriever_version=1 \
--child_chunk_size=200 \
--chat_model_provider=ollama \
--chat_model_model_name=mistral:latest \
--chat_model_endpoint=http://192.168.0.132/api/chat \
--embedding_model_provider=ollama \
--embedding_model_model_name=all-minilm:latest \
--embedding_model_endpoint=http://192.168.0.132/api/embeddings \
--db_path=/tmp/rag_eval_v1.db \
--vector_table_dimension=384 \
build \
--force \
--file=attention_is_all_you_need.pdf \
--type=PDF \
serve \
--port=9090
```
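For reference, the `--parquet_mapping` value shown in the help text above (`1:t,2:m:parent_doc_id:int64,3:m:source:varchar`) appears to follow a compact `column:role[:name:type]` syntax. A rough sketch of how such a string could be parsed (a hypothetical helper for illustration; the role meanings are inferred from the examples in the help text, not from the source):

```python
def parse_parquet_mapping(mapping):
    """Parse a mapping like '1:t,2:m:parent_doc_id:int64,3:m:source:varchar'
    into {column_index: descriptor}. Role meanings are inferred: 't'/'txt' is
    the document-text column; 'm'/'metadata' carries a field name and type."""
    result = {}
    for entry in mapping.split(","):
        parts = entry.split(":")
        column = int(parts[0])
        role = parts[1]
        if role in ("t", "txt"):
            result[column] = {"role": "text"}
        elif role in ("m", "metadata"):
            result[column] = {"role": "metadata",
                              "name": parts[2], "type": parts[3]}
        else:
            raise ValueError(f"unknown role {role!r} in {entry!r}")
    return result
```

A parser like this makes it easy to validate a mapping string before launching a long-running `build` step.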
54 changes: 32 additions & 22 deletions modules/instinct-apps/mini-assistant/README.md
## CLI Usage

```text
🐬 mini-assistant - Local Assistant API at your service
Usage: /Users/robinqu/Workspace/github/robinqu/instinct.cpp/build/Debug/modules/instinct-apps/mini-assistant/mini-assistant [OPTIONS]
Options:
-h,--help Print this help message and exit
Expand All @@ -67,48 +66,59 @@ Options:
Path for DuckDB database file.
--file_store_path TEXT REQUIRED
Path for root directory of local object store. Will be created if it doesn't exist yet.
--agent_executor_type TEXT:{llm_compiler,openai_tool} [llm_compiler]
Specify agent executor type. `llm_compiler` enables parallel function calling with open-source models like the Mistral and Llama series, while `openai_tool` relies on the official OpenAI function calling capability to direct the agent workflow.
-v,--verbose A flag to enable verbose log
[Option Group: chat_model]
Configuration for chat model
Options:
--chat_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0} [0]
Specify chat model to use for chat completion.
--chat_model_name TEXT Specify chat model to use for chat completion. Defaults to gpt-3.5-turbo for OpenAI, llama3:8b for Ollama. Note that some model providers will ignore the passed model name and use the currently loaded model instead.
--chat_model_api_key TEXT API key for commercial services like OpenAI. Leave blank for services without ACL. API key is also retrieved from env variable named OPENAI_API_KEY.
--chat_model_endpoint TEXT Endpoint for chat model API, e.g. 'https://api.openai.com/v1/chat/completions' for OpenAI.
[Option Group: embedding_model]
Configuration for embedding model
Options:
--embedding_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0} [0]
Specify model to use for embedding.
--embedding_model_name TEXT Specify model to use for embedding. Defaults to text-embedding-3-small for OpenAI, all-minilm:latest for Ollama. Note that some model providers will ignore the passed model name and use the currently loaded model instead.
--embedding_model_dim INT:POSITIVE
Dimension of given embedding model.
--embedding_model_api_key TEXT
API key for commercial services like OpenAI. Leave blank for services without ACL. API key is also retrieved from env variable named OPENAI_API_KEY.
--embedding_model_endpoint TEXT
Endpoint for text embedding model API.
[Option Group: ranking_model]
Currently only Jina.ai and local model are supported.
Options:
--reranker_model_provider ENUM:value in {jina_ai->5,llama_cpp->4,llm_studio->3,local->2,ollama->1,openai->0} OR {5,4,3,2,1,0}
Specify reranker model provider to use.
--reranker_model_api_key TEXT
API key for commercial services like Jina.ai. Leave blank for services without ACL.
--reranker_model_endpoint TEXT
Endpoint for reranker model API.
--reranker_model_model_name TEXT
Specify name of the model to be used.
[Option Group: Options for LLMCompilerAgentExecutor]
Options for LLMCompiler-based agent executor
Options:
--max_replan INT [6] Max count for replan
```


## Implementation details
To run the Assistant API service with the chat model served by a `llama.cpp` server and the text embedding model served by Ollama:

```shell
mini-assistant \
--db_file_path /tmp/assistant_api.db \
--file_store_path /tmp/mini-assistant-files \
--agent_executor_type=llm_compiler \
--chat_model_provider=llama_cpp \
--chat_model_endpoint=http://192.168.0.134:8000/v1/chat/completions \
--embedding_model_provider=ollama \
--embedding_model_endpoint=http://192.168.0.134:31434/v1/embeddings \
--embedding_model_name=all-minilm:latest \
--verbose
```

* A thread-pool-based task scheduler is used to handle jobs for `run` objects.
* DuckDB is used for conventional structured data as well as vector data. Many improvements can be done. [More details](https://github.com/users/RobinQu/projects/1/views/1?pane=issue&itemId=62004973).
* More technical details about Assistant API can be found in [docs/assistant_api.md](../../../docs/assistant_api.md).
* Known issues:
  * Function calling requires OpenAI's `gpt-3.5` or `gpt-4` series. Function calling with open-source LLMs is possible, and it's at the top of my TODO list.
  * All timestamps are currently printed in microsecond precision, while official APIs print them in epoch seconds.
  * Only the function tool is supported. `file-search` is next to come, and `code-interpreter` is scheduled for later.
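The timestamp caveat above can bite client code that compares values against the official API. One possible client-side normalization (a sketch; the magnitude heuristic is an assumption, not project code):

```python
def to_epoch_seconds(timestamp):
    """Normalize a timestamp to epoch seconds.

    mini-assistant currently emits microsecond-precision timestamps, while the
    official API uses epoch seconds. Values far too large to be plausible
    epoch-seconds values are assumed to be microseconds and scaled down.
    """
    # Epoch seconds today are ~1.7e9; microsecond values are ~1.7e15.
    if timestamp > 10_000_000_000:
        return timestamp // 1_000_000
    return timestamp
```

Applying such a shim keeps downstream code agnostic to which precision a given server build emits.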
