Commit cc5ec74

Merge branch 'sprint/v0.1.5'
2 parents 5111dc1 + 6dc31cb commit cc5ec74

378 files changed: 5670 additions, 4638 deletions

.gitmodules

Lines changed: 0 additions & 3 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -1,5 +1,24 @@
 # Changelog
 
+## v0.1.5
+
+**Full Changelog**: https://github.com/RobinQu/instinct.cpp/commits/v0.1.5
+
+* Features
+  * `instinct-transformer`: New bge-m3 embedding model. The bge-reranker and bge-embedding models are still in preview, as they are not yet fast enough for production use.
+  * `instinct-llm`: New `JinaRerankerModel` for the reranker model API from Jina.ai.
+  * `instinct-retrieval`: New `DuckDBBM25Retriever`, a BM25 keyword-based retriever using DuckDB's built-in function.
+* Improvements
+  * Move example code to a standalone repository: [instinct-cpp-examples](https://github.com/RobinQu/instinct-cpp-examples).
+  * Rename all files to follow camel-case naming conventions.
+  * Build system:
+    * Fix include paths for internal header files. All files are now referenced using the angle-bracket pattern, e.g. `#include <instinct/...>`.
+    * Rewrite CMake install rules.
+    * Run unit tests during `conan build` using `CTest`.
+  * `doc-agent`:
+    * Use the `retriver-version` CLI argument to control how retriever-related components are constructed.
+    * Rewrite lifecycle control using application context.
+  * `instinct-retrieval`: Fix RAG evaluation. A RAG pipeline with `MultiPathRetriever` should score above 80%.
 
 ## v0.1.4
 
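The build-system note about `#include <instinct/...>` implies that each module exposes, as a public include path, a directory that contains an `instinct/` folder. A minimal sketch of how a CMake target could do that follows. The target name `example_module` and the `include/instinct/` layout are hypothetical illustrations, not taken from this commit.

```cmake
cmake_minimum_required(VERSION 3.26)
project(include_layout_sketch LANGUAGES CXX)
include(GNUInstallDirs)

# Hypothetical header-only module whose headers live in include/instinct/...
add_library(example_module INTERFACE)

# Expose the parent of the instinct/ folder, so that consumers can write
# #include <instinct/...> both in the build tree and after installation.
target_include_directories(example_module INTERFACE
    "$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/include>"
    "$<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>")
```

Any target that links `example_module` then inherits the include path and does not need to know where the headers sit on disk.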

CMakeLists.txt

Lines changed: 85 additions & 39 deletions
@@ -1,19 +1,15 @@
 cmake_minimum_required(VERSION 3.26)
 project(instinct VERSION 0.1.0)
 
-option(BUILD_SHARED_LIBS "Build using shared libraries" OFF)
-
-
 set(CMAKE_CXX_STANDARD 20)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 
-
 # force cache value to update when building with submodules
 # https://cmake.org/cmake/help/latest/policy/CMP0077.html
 set(CMAKE_POLICY_DEFAULT_CMP0077 NEW)
 
 # show progress
-Set(FETCHCONTENT_QUIET FALSE)
+set(FETCHCONTENT_QUIET FALSE)
 
 # specify default install location
 IF(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@@ -24,25 +20,10 @@ ENDIF(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
 # see https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html
 include(GNUInstallDirs)
 
-find_package(Threads REQUIRED)
-
-# add CTest
-include(CTest)
 
 #add_compile_options(-fsanitize=address)
 #add_link_options(-fsanitize=address)
 
-
-# control where libraries and executables are placed during the build.
-# with the following settings executables are placed in <the top level of the
-# build tree>/bin and libraries/archives in <top level of the build tree>/lib.
-#set(CMAKE_LIBRARY_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR}")
-#set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_LIBDIR}")
-#set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/${CMAKE_INSTALL_BINDIR}")
-
-# build position independent code.
-set(CMAKE_POSITION_INDEPENDENT_CODE ON)
-
 # disable C and C++ compiler extensions.
 set(CMAKE_C_EXTENSIONS OFF)
 set(CMAKE_CXX_EXTENSIONS OFF)
@@ -51,17 +32,7 @@ set(CMAKE_CXX_EXTENSIONS OFF)
 list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
 
 
-option(BUILD_TESTING "Create tests using CMake" ON)
-option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)
-
 
-# enable RPATH support for installed binaries and libraries
-#include(AddInstallRPATHSupport)
-#add_install_rpath_support(
-# BIN_DIRS "${CMAKE_INSTALL_FULL_BINDIR}"
-# LIB_DIRS "${CMAKE_INSTALL_FULL_LIBDIR}"
-# INSTALL_NAME_DIR "${CMAKE_INSTALL_FULL_LIBDIR}"
-# USE_LINK_PATH)
 
 # encourage user to specify a build type (e.g. Release, Debug, etc.), otherwise set it to Release.
 if(NOT CMAKE_CONFIGURATION_TYPES)
@@ -71,14 +42,33 @@ if(NOT CMAKE_CONFIGURATION_TYPES)
     endif()
 endif()
 
+# add CTest
+include(CTest)
 
+#add functions
+include(cmake/functions.cmake)
 
-## gtest
-if(BUILD_TESTING)
-    find_package(GTest REQUIRED)
-endif ()
+# add dependencies
+option(WITH_DUCKDB "Enable duckdb related classes" ON)
+option(WITH_EXPRTK "Enable exprtk for LLM math" ON)
+option(WITH_PDFIUM "Enable PDF parsing with PDFium" ON)
+option(WITH_DUCKX "Enable DOCX parsing with duckx" ON)
+include(cmake/conan_dependencies.cmake)
+
+# compilation options
+option(BUILD_TESTING "Create tests using CMake" ON)
+option(BUILD_SHARED_LIBS "Build libraries as shared as opposed to static" ON)
 
-include(cmake/CMakeRC.cmake)
+# print options before entering submodules
+message(STATUS "--------------------------------instinct-cpp--------------------------------------------------------")
+message(STATUS "CMAKE_BUILD_TYPE: " ${CMAKE_BUILD_TYPE})
+message(STATUS "BUILD_TESTING: " ${BUILD_TESTING})
+message(STATUS "BUILD_SHARED_LIBS: " ${BUILD_SHARED_LIBS})
+message(STATUS "WITH_DUCKDB: " ${WITH_DUCKDB})
+message(STATUS "WITH_EXPRTK: " ${WITH_EXPRTK})
+message(STATUS "WITH_PDFIUM: " ${WITH_PDFIUM})
+message(STATUS "WITH_DUCKX: " ${WITH_DUCKX})
+message(STATUS "----------------------------------------------------------------------------------------------------")
 
 # project modules
 add_subdirectory(modules/instinct-proto)
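The `WITH_*` and `BUILD_*` switches introduced in this hunk are ordinary cache options, so they can be preset before the first configure run. One way to do that is an initial-cache script passed with `cmake -C`; a sketch is below. The file name `instinct-options.cmake` is hypothetical, and whether a given combination actually builds depends on `cmake/conan_dependencies.cmake` and the module sources, which are not shown in this diff.

```cmake
# instinct-options.cmake (hypothetical file name)
# Usage: cmake -C instinct-options.cmake -S . -B build
#
# Values stored in the cache here take precedence over the defaults that
# the option() calls in CMakeLists.txt would otherwise assign.
set(WITH_PDFIUM OFF CACHE BOOL "Enable PDF parsing with PDFium")
set(WITH_DUCKX OFF CACHE BOOL "Enable DOCX parsing with duckx")
set(BUILD_TESTING OFF CACHE BOOL "Create tests using CMake")
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build libraries as shared as opposed to static")
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type")
```

The `message(STATUS ...)` block printed at configure time then reports the values that were actually picked up, which makes it easy to confirm that the presets took effect.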
@@ -90,7 +80,63 @@ add_subdirectory(modules/instinct-server)
 add_subdirectory(modules/instinct-data)
 add_subdirectory(modules/instinct-assistant)
 
-# examples
-add_subdirectory(modules/instinct-examples/doc-agent)
-add_subdirectory(modules/instinct-examples/quick-start)
-add_subdirectory(modules/instinct-examples/mini-assistant)
+# apps
+add_subdirectory(modules/instinct-apps/doc-agent)
+add_subdirectory(modules/instinct-apps/mini-assistant)
+
+
+# write config version file
+include(CMakePackageConfigHelpers)
+write_basic_package_version_file(${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake
+    VERSION ${PROJECT_VERSION}
+    COMPATIBILITY SameMajorVersion)
+
+# declare targets to be installed
+
+list(APPEND EXPORTED_TARGETS proto core llm transformer data retrieval)
+if (TARGET instinct::assistant AND TARGET mini-assistant)
+    list(APPEND EXPORTED_TARGETS mini-assistant)
+endif ()
+if (TARGET doc-agent)
+    list(APPEND EXPORTED_TARGETS doc-agent)
+endif ()
+
+install(TARGETS ${EXPORTED_TARGETS}
+    EXPORT ${PROJECT_NAME}_Targets
+    ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
+    RUNTIME DESTINATION ${CMAKE_INSTALL_BINDIR}
+    INCLUDES DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+)
+
+# install header files
+install(DIRECTORY ${PROJECT_BINARY_DIR}/modules/instinct-proto/
+    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+    FILES_MATCHING PATTERN "*.h"
+)
+install(DIRECTORY
+    ${PROJECT_SOURCE_DIR}/modules/instinct-core/include/instinct
+    ${PROJECT_SOURCE_DIR}/modules/instinct-llm/include/instinct
+    ${PROJECT_SOURCE_DIR}/modules/instinct-transformer/include/instinct
+    ${PROJECT_SOURCE_DIR}/modules/instinct-data/include/instinct
+    ${PROJECT_SOURCE_DIR}/modules/instinct-retrieval/include/instinct
+    DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
+)
+
+# write target file to lib/instinct/cmake folder
+install(EXPORT ${PROJECT_NAME}_Targets
+    FILE ${PROJECT_NAME}Targets.cmake
+    NAMESPACE ${PROJECT_NAME}::
+    DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)
+
+configure_package_config_file(
+    "${PROJECT_SOURCE_DIR}/cmake/${PROJECT_NAME}Config.cmake.in"
+    "${PROJECT_BINARY_DIR}/${PROJECT_NAME}Config.cmake"
+    INSTALL_DESTINATION
+    ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)
+
+# copy config files to lib/instinct/cmake folder
+install(FILES
+    "${PROJECT_BINARY_DIR}/${PROJECT_NAME}Config.cmake"
+    "${PROJECT_BINARY_DIR}/${PROJECT_NAME}ConfigVersion.cmake"
+    DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake)
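With these install rules, a downstream project should be able to locate the installed package in config mode. Note that with the GNUInstallDirs defaults, `${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/cmake` resolves to `share/instinct/cmake`, which is one of the locations `find_package` searches under each prefix (the comments in the diff mention `lib/instinct/cmake`). The sketch below is a hypothetical consumer: `my_app` and `main.cpp` are placeholders, and the imported target names `instinct::core` and `instinct::retrieval` are inferred from the `NAMESPACE` and `EXPORTED_TARGETS` values above, assuming the modules do not set a different `EXPORT_NAME`. Whether `instinctConfig.cmake` also resolves third-party dependencies depends on the `Config.cmake.in` template, which is not part of this diff.

```cmake
cmake_minimum_required(VERSION 3.26)
project(my_app LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Config-mode lookup: expects instinctConfig.cmake under
# <prefix>/share/instinct/cmake for some prefix in CMAKE_PREFIX_PATH.
find_package(instinct CONFIG REQUIRED)

# my_app and main.cpp are placeholders for the consumer's own target.
add_executable(my_app main.cpp)

# Assumed imported target names (NAMESPACE "instinct::" + exported target).
target_link_libraries(my_app PRIVATE instinct::core instinct::retrieval)
```

If the library was installed to a non-default location, the consumer would pass that location at configure time, for example `-DCMAKE_PREFIX_PATH=<install prefix>`.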

README.md

Lines changed: 38 additions & 5 deletions
@@ -2,7 +2,7 @@
 
 `instinct.cpp` is a toolkit for developing LLM-powered applications.
 
-[![Discord](https://img.shields.io/badge/Discord%20Chat-purple?style=flat-square&logo=discord&logoColor=white&link=https%3A%2F%2Fdiscord.gg%2jnyqY9sbC)](https://discord.gg/2jnyqY9sbC) [![C++ 20](https://img.shields.io/badge/C%2B%2B-20-blue?style=flat-square&link=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FC%252B%252B20)](https://en.wikipedia.org/wiki/C%2B%2B20) [![License](https://img.shields.io/badge/Apache%20License-2.0-green?style=flat-square&logo=Apache&link=.%2FLICENSE)](./LICENSE)
+[![Discord](https://img.shields.io/badge/Discord%20Chat-purple?style=flat-square&logo=discord&logoColor=white&link=https%3A%2F%2Fdiscord.gg%2jnyqY9sbC)](https://discord.gg/2jnyqY9sbC) [![C++ 20](https://img.shields.io/badge/C%2B%2B-20-blue?style=flat-square&link=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FC%252B%252B20)](https://en.wikipedia.org/wiki/C%2B%2B20) [![License](https://img.shields.io/badge/Apache%20License-2.0-green?style=flat-square&logo=Apache&link=.%2FLICENSE)](./LICENSE) [![CI Build](https://github.com/RobinQu/instinct.cpp/actions/workflows/cmake-multi-platform.yml/badge.svg)](https://github.com/RobinQu/instinct.cpp/actions/workflows/cmake-multi-platform.yml)
 
 **🚨 This project is under active development and has not yet reached the GA stage of its first major release. See the [Roadmap section](#roadmap) for more.**
 
@@ -37,7 +37,7 @@ For library itself:
 
 ## Roadmap
 
-Complete project plan is tracked at [Project kanban](https://github.com/users/RobinQu/projects/1/views/1?layout=board).
+Complete project plan is tracked at [Project kanban](https://github.com/users/RobinQu/projects/1/views/1).
 
 | Milestone | Features | DDL |
 |--------------------------------------------------------------|--------------------------------------------------------------|---------------|
@@ -50,8 +50,41 @@ Complete project plan is tracked at [Project kanban](https://github.com/users/Ro
 | [v0.1.6](https://github.com/RobinQu/instinct.cpp/milestone/6) | `code-interpreter` in `mini-assistant` | 7.15 |
 
 
+Contributions are welcome! You can join the [discord server](https://discord.gg/2jnyqY9sbC), or contact me via [email](mailto:robinqu@gmail.com).
 
 
-
-
-Contributions are welcome! You can join the [discord server](https://discord.gg/2jnyqY9sbC), or contact me via [email](mailto:robinqu@gmail.com).
+# Acknowledgements
+
+This project would not be possible without the following awesome projects.
+
+* [bshoshany-thread-pool](https://github.com/bshoshany/thread-pool)
+* [base64](https://github.com/aklomp/base64)
+* [chatllm.cpp](https://github.com/foldl/chatllm.cpp)
+* [concurrentqueue](https://github.com/cameron314/concurrentqueue)
+* [cpptrace](https://github.com/jeremy-rifkin/cpptrace)
+* [crossguid](https://github.com/graeme-hill/crossguid)
+* [cpp-httplib](https://github.com/yhirose/cpp-httplib)
+* [duckx](https://github.com/amiremohamadi/DuckX)
+* [DuckDB](https://duckdb.org/)
+* [exprtk](https://github.com/ArashPartow/exprtk)
+* [fmt](https://github.com/fmtlib/fmt)
+* [fmtlog](https://github.com/MengRao/fmtlog)
+* [hash_library](https://github.com/stbrumme/hash-library)
+* [icu](https://github.com/unicode-org/icu/)
+* [inja](https://github.com/pantor/inja)
+* [libcurl](https://curl.se/libcurl/c/)
+* [llama.cpp](https://github.com/ggerganov/llama.cpp/)
+* [nlohmann_json](https://github.com/nlohmann/json)
+* [protobuf](https://github.com/protocolbuffers/protobuf)
+* [pdfium](https://pdfium.googlesource.com/pdfium)
+* [reactiveplusplus](https://github.com/victimsnino/ReactivePlusPlus)
+* [tsl-ordered-map](https://github.com/Tessil/ordered-map)
+* [uriparser](https://uriparser.github.io/)
+
+
+And many thanks to the shared training checkpoints from:
+
+* https://huggingface.co/BAAI/bge-m3
+* https://huggingface.co/BAAI/bge-reranker-v2-m3
+
+**Lists are sorted alphabetically.**
