Improve the query API of llm cache and use vector<uint8_t> as payload object. #1797
f013b4f to 1edb37c
`Client &client` could be a member of `KVStateCacheBuilder` (as well as `KVStateCacheBlockBuilder`). Don't give every API a `Client &client` argument. Otherwise, you can imagine a user passing different client objects to the same `KVStateCacheBuilder`.
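A minimal sketch of this suggestion, assuming the builder can bind to a single client at construction time. The `Client` struct below is a hypothetical stand-in rather than vineyard's real client class, and `Update` and its body are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the real client type; only its identity matters here.
struct Client {
  std::string endpoint;
};

// Sketch of a builder that stores the client once instead of taking it per call.
class KVStateCacheBuilder {
 public:
  explicit KVStateCacheBuilder(Client& client) : client_(client) {}

  // No Client& parameter: every call uses the client bound in the constructor.
  void Update(const std::vector<int>& tokens) {
    assert(!tokens.empty());  // illustrative precondition
    ++updates_;
  }

  const Client& client() const { return client_; }
  int updates() const { return updates_; }

 private:
  Client& client_;  // non-owning reference; the client must outlive the builder
  int updates_ = 0;
};
```

With this shape, a caller constructs `KVStateCacheBuilder builder(client);` once and then calls `builder.Update(tokens);`, so there is no way to mix two clients on the same builder.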
Rename `K_STATE` and `V_STATE` to `LLMKV`, or `llm_kv_t`.
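One way the rename could look, as a sketch only. It assumes both states are opaque byte buffers (in line with the PR title's `vector<uint8_t>` payload) and reuses the per-layer map shape from the commit message. `KVStateWithLayer` and `TotalBytes` are illustrative names, not taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Assumption: key and value states share one representation, so a single
// alias can replace the separate K_STATE and V_STATE names.
using LLMKV = std::vector<uint8_t>;

// layer index -> (key state, value state)
using KVStateWithLayer = std::map<int, std::pair<LLMKV, LLMKV>>;

// Illustrative helper: total payload bytes held across all layers.
inline std::size_t TotalBytes(const KVStateWithLayer& kv) {
  std::size_t total = 0;
  for (const auto& entry : kv) {
    total += entry.second.first.size() + entry.second.second.size();
  }
  return total;
}
```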
@@ -27,7 +27,7 @@ limitations under the License.

 using namespace vineyard;  // NOLINT(build/namespaces)

-#define DIMENSION 100
+#define TENSORBYTES 800
 #define CAPACITY 1000
 #define LAYER 64
 #define BLOCK_SIZE 100
Use `constexpr int` or `constexpr size_t` for constants in C++.
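For illustration, the macros in the diff above could be written roughly as follows. The values are copied from the `#define` lines; the `k`-prefixed names and the derived `kBytesPerToken` constant are assumptions made for this sketch.

```cpp
#include <cstddef>

// Typed, scoped constants in place of the preprocessor macros.
constexpr std::size_t kTensorBytes = 800;
constexpr std::size_t kCapacity = 1000;
constexpr int kLayer = 64;
constexpr int kBlockSize = 100;

// Assumption for illustration: one key tensor and one value tensor per layer.
constexpr std::size_t kBytesPerToken =
    2 * static_cast<std::size_t>(kLayer) * kTensorBytes;

// Unlike macros, these can be checked by the compiler.
static_assert(kLayer > 0 && kBlockSize > 0, "sizes must be positive");
```

Because the constants carry a type and obey scoping rules, mistakes such as mixing a signed count with a byte size tend to surface as compiler diagnostics instead of silent macro substitution.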
114fe52 to a359b88
663c7a5 to ec2eab6
…ad object.
* Replace the alias of KV_STATE_WITH_LAYER with std::map<int, std::pair<K_STATE, V_STATE>>.
* Use references to std::vector<T> to avoid copying.
* Print the rax tree to a string for debugging.
* Rename Dimension to TensorBytes.

Signed-off-by: Ye Cao <[email protected]>
ec2eab6 to 45acb82
What do these changes do?
* Improve the `query` API: users pass only a token list and get back the kv_cache with the longest matching prefix.
* Replace `Dimension` with `TensorBytes`.
* Use references to `std::vector<T>` to avoid copying.

Related issue number
* `std::vector<T>` and `std::shared_ptr<T>` #1786
* `vector<uint8_t>` as payload object and remove all assumption about double #1795
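
The longest-prefix behaviour described for `query` can be sketched as below. This is not the PR's implementation: the commit message mentions a rax (radix) tree, while this example uses a plain token trie built on `std::map` to keep it short. `PrefixCache`, `Insert`, `Query`, and `Payload` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

using Payload = std::vector<uint8_t>;  // opaque bytes, no assumption about double

// Token trie standing in for the radix tree; one optional payload per prefix.
class PrefixCache {
 public:
  void Insert(const std::vector<int>& tokens, const Payload& payload) {
    Node* node = &root_;
    for (int token : tokens) {
      std::unique_ptr<Node>& child = node->children[token];
      if (!child) child.reset(new Node());
      node = child.get();
    }
    node->payload = payload;
    node->has_payload = true;
  }

  // Returns the payload stored for the longest cached prefix of `tokens`,
  // or nullptr if no prefix is cached. The pointer is non-owning, so nothing
  // is copied. If `matched_len` is given, it receives the prefix length.
  const Payload* Query(const std::vector<int>& tokens,
                       std::size_t* matched_len = nullptr) const {
    const Node* node = &root_;
    const Node* best = nullptr;
    std::size_t best_len = 0;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
      auto it = node->children.find(tokens[i]);
      if (it == node->children.end()) break;
      node = it->second.get();
      if (node->has_payload) {
        best = node;
        best_len = i + 1;
      }
    }
    if (matched_len != nullptr) *matched_len = best_len;
    return best != nullptr ? &best->payload : nullptr;
  }

 private:
  struct Node {
    std::map<int, std::unique_ptr<Node>> children;
    Payload payload;
    bool has_payload = false;
  };
  Node root_;
};
```

After inserting payloads for the token lists `{1, 2}` and `{1, 2, 3, 4}`, a query for `{1, 2, 3, 9}` returns the payload stored under `{1, 2}` with a matched length of 2, since that is the longest cached prefix.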