liblloyal 1.0.0
Branched Inference for llama.cpp
KV Cache Physics.
#include "common.hpp"
#include "decode.hpp"
#include <cassert>
#include <cstdint>
#include <llama/llama.h>
#include <type_traits>
#include <vector>
Classes
| struct | lloyal::kv::tenancy::State |
| Tenancy state: tracks seq_id vacancy and leases. | |
| struct | lloyal::kv::FileData |
| Data structure returned by read_file. | |
Namespaces
| namespace | lloyal |
| Boundary Tracker Stub for OSS liblloyal. | |
| namespace | lloyal::kv |
| namespace | lloyal::kv::cache_type |
| namespace | lloyal::kv::tenancy |
Functions
| ggml_type | lloyal::kv::cache_type::from_str (const std::string &s) |
| Map string name to ggml_type enum (matches llama.cpp CLI -ctk/-ctv flags). | |
| bool | lloyal::kv::remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1) |
| Remove token range from KV cache sequence. | |
| llama_pos | lloyal::kv::pos_max (llama_context *ctx, llama_seq_id seq) |
| Get maximum position in KV cache sequence. | |
| void | lloyal::kv::seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1) |
| Copy KV cache from one sequence to another. | |
| void | lloyal::kv::seq_keep (llama_context *ctx, llama_seq_id seq) |
| Keep only one sequence, removing all others. | |
| State | lloyal::kv::tenancy::init (llama_context *ctx, llama_seq_id n_seq_max) |
| Initialize tenancy with all seq_ids vacant. | |
| llama_seq_id | lloyal::kv::tenancy::acquire (State &s) |
| Acquire a seq_id from the vacant pool. | |
| void | lloyal::kv::tenancy::release (State &s, llama_seq_id seq) |
| Release a seq_id back to vacant — bookkeeping only, no KV calls. | |
| void | lloyal::kv::tenancy::evict (State &s, llama_seq_id seq) |
| Evict a seq_id — strip all KV tags then release. | |
| void | lloyal::kv::tenancy::retain (State &s, llama_seq_id keep) |
| Nuclear retain — keep one seq, rebuild vacancy from scratch. | |
| void | lloyal::kv::tenancy::evict_all (State &s) |
| Evict every leased seq_id. | |
| size_t | lloyal::kv::tenancy::available (const State &s) |
| Number of vacant seq_ids available for acquisition. | |
| size_t | lloyal::kv::state_size (llama_context *ctx, llama_seq_id seq) |
| Get size needed to serialize sequence state. | |
| size_t | lloyal::kv::state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size) |
| Save sequence state to buffer. | |
| size_t | lloyal::kv::state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size) |
| Restore sequence state from buffer. | |
| size_t | lloyal::kv::global_state_size (llama_context *ctx) |
| Get size needed to serialize global state. | |
| size_t | lloyal::kv::global_state_save (llama_context *ctx, uint8_t *dst, size_t size) |
| Save global state to buffer. | |
| size_t | lloyal::kv::global_state_load (llama_context *ctx, const uint8_t *src, size_t size) |
| Restore global state from buffer. | |
| void | lloyal::kv::log_build_info (llama_context *ctx) |
| Log KV cache build info and current state. | |
| void | lloyal::kv::clear_all (llama_context *ctx) |
| Clear the entire KV cache (complete reset). | |
| void | lloyal::kv::clear_metadata (llama_context *ctx) |
| Clear KV cache metadata only (fast reset). | |
| void | lloyal::kv::clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch) |
| size_t | lloyal::kv::write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens) |
| Write KV state to file with self-describing format. | |
| FileData | lloyal::kv::read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath) |
Variables
| constexpr llama_seq_id | lloyal::kv::NO_LEASE = static_cast<llama_seq_id>(-1) |
| Sentinel value indicating a branch has no KV residency. | |
KV Cache Physics.
Two layers of KV cache management:
Tenancy — the seq_id vacancy manager (the real logic in this file). Tracks which sequences are leased (owned by a branch) and which are vacant, enforcing the invariant that vacant seq_ids are always clean (no KV tags). See KV Tenancy for the full API.
Primitives: thin wrappers over llama.cpp's llama_memory_* and llama_state_* APIs. These add null-safety, error handling, debug logging, and fragmentation fallbacks, but contain no domain logic.
Definition in file kv.hpp.
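The tenancy layer described above is essentially a lease/vacancy bookkeeper for seq_ids. The following is a minimal, standalone sketch of that idea only; it is not liblloyal's implementation. The names VacancyPool, pool_init, pool_acquire, pool_release, pool_available, seq_id_t and kNoLease are hypothetical stand-ins, and returning a sentinel when the pool is exhausted is an assumption of this sketch. It makes no llama.cpp calls, so it models only the bookkeeping side (comparable to release), not the KV-stripping side (evict).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; NOT liblloyal's actual types.
using seq_id_t = std::int32_t;      // plays the role of llama_seq_id
constexpr seq_id_t kNoLease = -1;   // plays the role of lloyal::kv::NO_LEASE

struct VacancyPool {
    std::vector<seq_id_t> vacant;   // stack of ids currently free to lease
    std::vector<bool>     leased;   // leased[i] is true while id i is owned
};

// Start with every id in [0, n_seq_max) vacant.
inline VacancyPool pool_init(seq_id_t n_seq_max) {
    VacancyPool p;
    p.leased.assign(static_cast<std::size_t>(n_seq_max), false);
    // Push in reverse so that id 0 is handed out first.
    for (seq_id_t i = n_seq_max - 1; i >= 0; --i) {
        p.vacant.push_back(i);
    }
    return p;
}

// Lease one vacant id, or return kNoLease if none is left
// (sentinel-on-exhaustion is an assumption of this sketch).
inline seq_id_t pool_acquire(VacancyPool& p) {
    if (p.vacant.empty()) {
        return kNoLease;
    }
    const seq_id_t id = p.vacant.back();
    p.vacant.pop_back();
    p.leased[static_cast<std::size_t>(id)] = true;
    return id;
}

// Return an id to the vacant pool. Bookkeeping only: a real evict would
// first strip the sequence's KV entries so that vacant ids stay clean.
inline void pool_release(VacancyPool& p, seq_id_t id) {
    if (id < 0 || static_cast<std::size_t>(id) >= p.leased.size()) {
        return;  // out of range: ignore
    }
    if (!p.leased[static_cast<std::size_t>(id)]) {
        return;  // not currently leased: ignore a double release
    }
    p.leased[static_cast<std::size_t>(id)] = false;
    p.vacant.push_back(id);
}

// Number of ids that can still be acquired.
inline std::size_t pool_available(const VacancyPool& p) {
    return p.vacant.size();
}
```

In this sketch a released id can be reused immediately, which is only safe if its KV entries were already cleared. That is why the documented API separates release (bookkeeping only) from evict (strip KV tags, then release).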