liblloyal 1.0.0
Composable primitives for llama.cpp inference
KV Cache Management.
#include "common.hpp"
#include "decoder.hpp"
#include <cstdint>
#include <llama/llama.h>
#include <vector>
Classes | |
| struct | lloyal::kv::FileData |
| Data structure returned by read_file. | |
Namespaces | |
| namespace | lloyal |
| JSON Schema to Grammar Converter (Header-Only) | |
| namespace | lloyal::kv |
Functions | |
| bool | lloyal::kv::remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1) |
| Remove token range from KV cache sequence. | |
| llama_pos | lloyal::kv::pos_max (llama_context *ctx, llama_seq_id seq) |
| Get maximum position in KV cache sequence. | |
| void | lloyal::kv::seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1) |
| Copy KV cache from one sequence to another. | |
| void | lloyal::kv::seq_keep (llama_context *ctx, llama_seq_id seq) |
| Keep only one sequence, removing all others. | |
| size_t | lloyal::kv::state_size (llama_context *ctx, llama_seq_id seq) |
| Get size needed to serialize sequence state. | |
| size_t | lloyal::kv::state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size) |
| Save sequence state to buffer. | |
| size_t | lloyal::kv::state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size) |
| Restore sequence state from buffer. | |
| size_t | lloyal::kv::global_state_size (llama_context *ctx) |
| Get size needed to serialize global state. | |
| size_t | lloyal::kv::global_state_save (llama_context *ctx, uint8_t *dst, size_t size) |
| Save global state to buffer. | |
| size_t | lloyal::kv::global_state_load (llama_context *ctx, const uint8_t *src, size_t size) |
| Restore global state from buffer. | |
| void | lloyal::kv::log_build_info (llama_context *ctx) |
| Log KV cache build info and current state. | |
| void | lloyal::kv::clear_all (llama_context *ctx) |
| Clear all KV cache (complete reset) | |
| void | lloyal::kv::clear_metadata (llama_context *ctx) |
| Clear KV cache metadata only (fast reset) | |
| void | lloyal::kv::clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch) |
| size_t | lloyal::kv::write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens) |
| Write KV state to file with self-describing format. | |
| FileData | lloyal::kv::read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath) |
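The `state_size` / `state_save` / `state_load` signatures above suggest a "query the size, allocate, then save" pattern. The following is a minimal sketch of that pattern, not the library's own code. The helper name `snapshot` is hypothetical, and treating a return value of 0 as "nothing saved" is an assumption, since the listing does not document failure values. The helper takes callables so it can be exercised without a `llama_context`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of liblloyal): generic "size, then save".
//   size_fn()        -> number of bytes required
//   save_fn(dst, n)  -> number of bytes actually written
// ASSUMPTION: a result of 0 from either callable means nothing was saved.
template <typename SizeFn, typename SaveFn>
std::vector<std::uint8_t> snapshot(SizeFn size_fn, SaveFn save_fn) {
    std::vector<std::uint8_t> buf(size_fn());
    if (buf.empty()) {
        return buf;  // nothing to serialize
    }
    const std::size_t written = save_fn(buf.data(), buf.size());
    // Shrink to the bytes written; discard the buffer if the count is implausible.
    buf.resize(written <= buf.size() ? written : 0);
    return buf;
}

// Possible use with the functions listed above (requires kv.hpp and a live context):
//
//   auto blob = snapshot(
//       [&] { return lloyal::kv::state_size(ctx, seq); },
//       [&](std::uint8_t *dst, std::size_t n) {
//           return lloyal::kv::state_save(ctx, seq, dst, n);
//       });
//   if (!blob.empty()) {
//       lloyal::kv::state_load(ctx, seq, blob.data(), blob.size());
//   }
```

The same helper would fit `global_state_size` and `global_state_save`, since they follow the same shape minus the sequence id.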
KV Cache Management.
Core primitives for KV cache operations in LLM applications:
These primitives compose into diverse inference patterns including:
Memory management for llama.cpp primitives:
Definition in file kv.hpp.
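The listing gives no description for `clear_and_reseed`, but its `original_sinks` and `tail` parameters suggest a reset that keeps a few leading tokens plus the most recent ones. That reading is an assumption. Under it, a caller might prepare the two vectors from a full token history as sketched below. `split_for_reseed`, `ReseedInputs`, and `token_t` are hypothetical names for illustration only; `token_t` stands in for `llama_token`, which llama.h defines as a 32-bit integer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using token_t = std::int32_t;  // stand-in for llama_token

// Hypothetical container for the two vectors clear_and_reseed appears to expect.
struct ReseedInputs {
    std::vector<token_t> sinks;  // first tokens of the history
    std::vector<token_t> tail;   // most recent tokens of the history
};

// Split a history into a leading "sink" prefix and a trailing window.
// The two ranges never overlap: the tail is capped by what remains after the sinks.
ReseedInputs split_for_reseed(const std::vector<token_t> &history,
                              std::size_t n_sinks, std::size_t n_tail) {
    const std::size_t sinks = std::min(n_sinks, history.size());
    const std::size_t tail = std::min(n_tail, history.size() - sinks);

    ReseedInputs out;
    out.sinks.assign(history.begin(),
                     history.begin() + static_cast<std::ptrdiff_t>(sinks));
    out.tail.assign(history.end() - static_cast<std::ptrdiff_t>(tail),
                    history.end());
    return out;
}

// Possible use (requires kv.hpp and a live context; n_batch chosen by the caller):
//
//   ReseedInputs in = split_for_reseed(history, 4, 252);
//   lloyal::kv::clear_and_reseed(ctx, in.sinks, in.tail, n_batch);
```

Capping the tail by the tokens left after the sinks keeps any token from being passed twice when the history is shorter than `n_sinks + n_tail`.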