Merge branch 'sprint/v0.1.5'
RobinQu committed Jul 1, 2024
2 parents 5111dc1 + 6dc31cb commit cc5ec74
Showing 378 changed files with 5,670 additions and 4,638 deletions.
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,24 @@
# Changelog

## v0.1.5

**Full Changelog**: https://github.com/RobinQu/instinct.cpp/commits/v0.1.5

* Features
  * `instinct-transformer`: New bge-m3 embedding model. Generally speaking, bge-reranker and bge-embedding are still in preview as they are not fast enough for production.
  * `instinct-llm`: New `JinaRerankerModel` for the reranker model API from Jina.ai.
  * `instinct-retrieval`: New `DuckDBBM25Retriever`, a BM25 keyword-based retriever that uses DuckDB's built-in function.
* Improvements
  * Move example code to a standalone repository: [instinct-cpp-examples](https://github.com/RobinQu/instinct-cpp-examples).
  * Rename all files to follow camel-case naming conventions.
  * Build system:
    * Fix include paths for internal header files. All files are now referenced using the angle-bracket pattern, like `#include <instinct/...>`.
    * Rewrite CMake install rules.
    * Run unit tests during `conan build` using `CTest`.
  * `doc-agent`:
    * Use the `retriver-version` argument in the CLI to control how retriever-related components are constructed.
    * Rewrite lifecycle control using an application context.
  * `instinct-retrieval`: Fix RAG evaluation. A RAG pipeline with `MultiPathRetriever` should score above 80%.

## v0.1.4

124 changes: 85 additions & 39 deletions CMakeLists.txt
@@ -1,19 +1,15 @@
cmake_minimum_required(VERSION 3.26)
project(instinct VERSION 0.1.0)

option(BUILD_SHARED_LIBS "Build using shared libraries" OFF)


set(CMAKE_CXX_STANDARD 20)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)


# force cache value to update when building with submodules
# https://cmake.org/cmake/help/latest/policy/CMP0077.html
set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)

# show progress
Set(FETCHCONTENT_QUIET FALSE)
set(FETCHCONTENT_QUIET FALSE)

# specify default install location
IF(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -24,25 +20,10 @@ ENDIF(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
# see https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html
include(GNUInstallDirs)

find_package(Threads REQUIRED)

# add CTest
include(CTest)

#add_compile_options(-fsanitize=address)
#add_link_options(-fsanitize=address)


# control where libraries and executables are placed during the build.
# with the following settings executables are placed in <the top level of the
# build tree>/bin and libraries/archives in <top level of the build tree>/lib.
#set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR}")
#set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR}")
#set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_BINDIR}")

# build position independent code.
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# disable C and C++ compiler extensions.
set(CMAKE_C_EXTENSIONS OFF)
set(CMAKE_CXX_EXTENSIONS OFF)
@@ -51,17 +32,7 @@ set(CMAKE_CXX_EXTENSIONS OFF)
list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)


option(BUILD_TESTING "Create tests using CMake" ON)
option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)


# enable RPATH support for installed binaries and libraries
#include(AddInstallRPATHSupport)
#add_install_rpath_support(
# BIN_DIRS "${CMAKE_INSTALL_FULL_BINDIR}"
# LIB_DIRS "${CMAKE_INSTALL_FULL_LIBDIR}"
# INSTALL_NAME_DIR "${CMAKE_INSTALL_FULL_LIBDIR}"
# USE_LINK_PATH)

# encourage user to specify a build type (e.g. Release, Debug, etc.), otherwise set it to Release.
if(NOT CMAKE_CONFIGURATION_TYPES)
@@ -71,14 +42,33 @@ if(NOT CMAKE_CONFIGURATION_TYPES)
endif()
endif()

# add CTest
include(CTest)

#add functions
include(cmake/functions.cmake)

## gtest
if(BUILD_TESTING)
find_package(GTest REQUIRED)
endif ()
# add dependencies
option(WITH_DUCKDB "Enable duckdb related classes" ON)
option(WITH_EXPRTK "Enable exprtk for LLM math" ON)
option(WITH_PDFIUM "Enable PDF parsing with PDFium" ON)
option(WITH_DUCKX "Enable DOCX parsing with duckx" ON)
include(cmake/conan_dependencies.cmake)

# compilation options
option(BUILD_TESTING "Create tests using CMake" ON)
option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)

include(cmake/CMakeRC.cmake)
# print options before entering submodules
message(STATUS "--------------------------------instinct-cpp--------------------------------------------------------")
message(STATUS "CMAKE_BUILD_TYPE: " ${CMAKE_BUILD_TYPE})
message(STATUS "BUILD_TESTING: " ${BUILD_TESTING})
message(STATUS "BUILD_SHARED_LIBS: " ${BUILD_SHARED_LIBS})
message(STATUS "WITH_DUCKDB: " ${WITH_DUCKDB})
message(STATUS "WITH_EXPRTK: " ${WITH_EXPRTK})
message(STATUS "WITH_PDFIUM: " ${WITH_PDFIUM})
message(STATUS "WITH_DUCKX: " ${WITH_DUCKX})
message(STATUS "----------------------------------------------------------------------------------------------------")

# project modules
add_subdirectory(modules/instinct-proto)
@@ -90,7 +80,63 @@ add_subdirectory(modules/instinct-server)
add_subdirectory(modules/instinct-data)
add_subdirectory(modules/instinct-assistant)

# examples
add_subdirectory(modules/instinct-examples/doc-agent)
add_subdirectory(modules/instinct-examples/quick-start)
add_subdirectory(modules/instinct-examples/mini-assistant)
# apps
add_subdirectory(modules/instinct-apps/doc-agent)
add_subdirectory(modules/instinct-apps/mini-assistant)


# write config version file
include(CMakePackageConfigHelpers)
write_basic_package_version_file(${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake
VERSION ${PROJECT_VERSION}
COMPATIBILITY SameMajorVersion)

# declare targets to be installed

list(APPEND EXPORTED_TARGETS proto core llm transformer data retrieval)
if (TARGET instinct::assistant AND TARGET mini-assistant)
list(APPEND EXPORTED_TARGETS mini-assistant)
endif ()
if (TARGET doc-agent)
list(APPEND EXPORTED_TARGETS doc-agent)
endif ()

install(TARGETS ${EXPORTED_TARGETS}
EXPORT ${PROJECT_NAME}_Targets
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)

# install header files
install(DIRECTORY ${PROJECT_BINARY_DIR}/modules/instinct-proto/
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
FILES_MATCHING PATTERN "*.h"
)
install(DIRECTORY
${PROJECT_SOURCE_DIR}/modules/instinct-core/include/instinct
${PROJECT_SOURCE_DIR}/modules/instinct-llm/include/instinct
${PROJECT_SOURCE_DIR}/modules/instinct-transformer/include/instinct
${PROJECT_SOURCE_DIR}/modules/instinct-data/include/instinct
${PROJECT_SOURCE_DIR}/modules/instinct-retrieval/include/instinct
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
)

# write target file to the share/instinct/cmake folder (CMAKE_INSTALL_DATAROOTDIR defaults to "share")
install(EXPORT ${PROJECT_NAME}_Targets
FILE ${PROJECT_NAME}Targets.cmake
NAMESPACE ${PROJECT_NAME}::
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)

configure_package_config_file(
"${PROJECT_SOURCE_DIR}/cmake/${PROJECT_NAME}Config.cmake.in"
"${PROJECT_BINARY_DIR}/${PROJECT_NAME}Config.cmake"
INSTALL_DESTINATION
${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)

# copy config files to the share/instinct/cmake folder
install(FILES
"${PROJECT_BINARY_DIR}/${PROJECT_NAME}Config.cmake"
"${PROJECT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake"
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)
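
The install rules above export the library targets under the `instinct::` namespace and place the package config files in `share/instinct/cmake`. A downstream project could consume the installed package roughly as sketched below. This is a minimal sketch, not taken from the repository: `my_app` and `main.cpp` are hypothetical names, and it assumes that the `instinctConfig.cmake.in` template (not shown in this diff) includes the generated `instinctTargets.cmake` and that the third-party dependencies resolved through Conan are also discoverable by the consumer.

```cmake
# Hypothetical consumer project; "my_app" and main.cpp are placeholders.
cmake_minimum_required(VERSION 3.26)
project(my_app LANGUAGES CXX)

# The library is built with C++20 and compiler extensions disabled.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Config mode looks for instinctConfig.cmake, which the install rules
# place under <prefix>/share/instinct/cmake. If the install prefix is
# not a default search location, pass -DCMAKE_PREFIX_PATH=<prefix>.
find_package(instinct CONFIG REQUIRED)

add_executable(my_app main.cpp)

# Exported targets get the namespace set by install(EXPORT ... NAMESPACE),
# so `core` and `llm` are assumed to be reachable as instinct::core and
# instinct::llm.
target_link_libraries(my_app PRIVATE instinct::core instinct::llm)
```

Because the version file is written with `COMPATIBILITY SameMajorVersion`, a request such as `find_package(instinct 0.1 CONFIG REQUIRED)` is accepted by any installed 0.x release that is at least 0.1.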
43 changes: 38 additions & 5 deletions README.md
@@ -2,7 +2,7 @@

`instinct.cpp` is a toolkit for developing LLM-powered applications.

[![Discord](https://img.shields.io/badge/Discord%20Chat-purple?style=flat-square&logo=discord&logoColor=white&link=https%3A%2F%2Fdiscord.gg%2jnyqY9sbC)](https://discord.gg/2jnyqY9sbC) [![C++ 20](https://img.shields.io/badge/C%2B%2B-20-blue?style=flat-square&link=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FC%252B%252B20)](https://en.wikipedia.org/wiki/C%2B%2B20) [![License](https://img.shields.io/badge/Apache%20License-2.0-green?style=flat-square&logo=Apache&link=.%2FLICENSE)](./LICENSE)
[![Discord](https://img.shields.io/badge/Discord%20Chat-purple?style=flat-square&logo=discord&logoColor=white&link=https%3A%2F%2Fdiscord.gg%2jnyqY9sbC)](https://discord.gg/2jnyqY9sbC) [![C++ 20](https://img.shields.io/badge/C%2B%2B-20-blue?style=flat-square&link=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FC%252B%252B20)](https://en.wikipedia.org/wiki/C%2B%2B20) [![License](https://img.shields.io/badge/Apache%20License-2.0-green?style=flat-square&logo=Apache&link=.%2FLICENSE)](./LICENSE) [![CI Build](https://github.com/RobinQu/instinct.cpp/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/RobinQu/instinct.cpp/actions/workflows/cmake-multi-platform.yml)

**🚨 This project is under active development and has not yet reached the GA stage of its first major release. See the [Roadmap section](#roadmap) for details.**

@@ -37,7 +37,7 @@ For library itself:

## Roadmap

Complete project plan is tracked at [Project kanban](https://github.com/users/RobinQu/projects/1/views/1?layout=board).
Complete project plan is tracked at [Project kanban](https://github.com/users/RobinQu/projects/1/views/1).

| Milestone | Features | DDL |
|--------------------------------------------------------------|--------------------------------------------------------------|---------------|
@@ -50,8 +50,41 @@ Complete project plan is tracked at [Project kanban](https://github.com/users/Ro
| [v0.1.6](https://github.com/RobinQu/instinct.cpp/milestone/6) | `code-interpreter` in `mini-assistant` | 7.15 |


Contributions are welcomed! You can join [discord server](https://discord.gg/2jnyqY9sbC), or contact me via [email](mailto:[email protected]).




Contributions are welcomed! You can join [discord server](https://discord.gg/2jnyqY9sbC), or contact me via [email](mailto:[email protected]).
# Acknowledgements

This project would not be possible without the following awesome projects.

* [bshoshany-thread-pool](https://github.com/bshoshany/thread-pool)
* [base64](https://github.com/aklomp/base64)
* [chatllm.cpp](https://github.com/foldl/chatllm.cpp)
* [concurrentqueue](https://github.com/cameron314/concurrentqueue)
* [cpptrace](https://github.com/jeremy-rifkin/cpptrace)
* [crossguid](https://github.com/graeme-hill/crossguid)
* [cpp-httplib](https://github.com/yhirose/cpp-httplib)
* [duckx](https://github.com/amiremohamadi/DuckX)
* [DuckDB](https://duckdb.org/)
* [exprtk](https://github.com/ArashPartow/exprtk)
* [fmt](https://github.com/fmtlib/fmt)
* [fmtlog](https://github.com/MengRao/fmtlog)
* [hash_library](https://github.com/stbrumme/hash-library)
* [icu](https://github.com/unicode-org/icu/)
* [inja](https://github.com/pantor/inja)
* [libcurl](https://curl.se/libcurl/c/)
* [llama.cpp](https://github.com/ggerganov/llama.cpp/)
* [nlohmann_json](https://github.com/nlohmann/json)
* [protobuf](https://github.com/protocolbuffers/protobuf)
* [pdfium](https://pdfium.googlesource.com/pdfium)
* [reactiveplusplus](https://github.com/victimsnino/ReactivePlusPlus)
* [tsl-ordered-map](https://github.com/Tessil/ordered-map)
* [uriparser](https://uriparser.github.io/)


And many thanks to the shared training checkpoints from:

* https://huggingface.co/BAAI/bge-m3
* https://huggingface.co/BAAI/bge-reranker-v2-m3

**Lists are sorted alphabetically.**