liblloyal 1.0.0
Composable primitives for llama.cpp inference
kv.hpp File Reference

KV Cache Management.

#include "common.hpp"
#include "decoder.hpp"
#include <cstdint>
#include <llama/llama.h>
#include <vector>

Classes

struct  lloyal::kv::FileData
 Data structure returned by read_file.
 

Namespaces

namespace  lloyal
 JSON Schema to Grammar Converter (Header-Only)
 
namespace  lloyal::kv
 

Functions

bool lloyal::kv::remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1)
 Remove token range from KV cache sequence.
 
llama_pos lloyal::kv::pos_max (llama_context *ctx, llama_seq_id seq)
 Get maximum position in KV cache sequence.
 
void lloyal::kv::seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1)
 Copy KV cache from one sequence to another.
 
void lloyal::kv::seq_keep (llama_context *ctx, llama_seq_id seq)
 Keep only one sequence, removing all others.
 
size_t lloyal::kv::state_size (llama_context *ctx, llama_seq_id seq)
 Get size needed to serialize sequence state.
 
size_t lloyal::kv::state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size)
 Save sequence state to buffer.
 
size_t lloyal::kv::state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size)
 Restore sequence state from buffer.
 
size_t lloyal::kv::global_state_size (llama_context *ctx)
 Get size needed to serialize global state.
 
size_t lloyal::kv::global_state_save (llama_context *ctx, uint8_t *dst, size_t size)
 Save global state to buffer.
 
size_t lloyal::kv::global_state_load (llama_context *ctx, const uint8_t *src, size_t size)
 Restore global state from buffer.
 
void lloyal::kv::log_build_info (llama_context *ctx)
 Log KV cache build info and current state.
 
void lloyal::kv::clear_all (llama_context *ctx)
 Clear the entire KV cache (complete reset)
 
void lloyal::kv::clear_metadata (llama_context *ctx)
 Clear KV cache metadata only (fast reset)
 
void lloyal::kv::clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch)
 Clear the KV cache and rebuild it from the original sink tokens plus a tail of recent tokens.
 
size_t lloyal::kv::write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens)
 Write KV state to file with self-describing format.
 
FileData lloyal::kv::read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath)
 Read KV state from a file written by write_file.
 

Detailed Description

KV Cache Management.

Core primitives for KV cache operations in LLM applications:

  • Multi-sequence management: independent recurrent states per seq_id
  • Cache lifecycle: clear, remove, copy, keep operations
  • State persistence: save/load with fragmentation fallback
  • Cache reconstruction: clear_and_reseed for context compression strategies
  • File I/O: session save/resume for app lifecycle management
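The size/save/load triples listed above suggest a "query size, allocate, save" calling pattern. The following is a minimal sketch of that pattern, written against callables so it compiles without llama.cpp. The helper name snapshot is hypothetical, and treating a return value of 0 as failure is an assumption rather than documented behavior of this header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic "query size, allocate, save" helper (hypothetical, not part of liblloyal).
//   size_fn()                   -> bytes required
//   save_fn(uint8_t*, size_t)   -> bytes actually written
// Assumption: a result of 0 from either callable means failure.
// Returns an empty vector on failure.
template <class SizeFn, class SaveFn>
std::vector<std::uint8_t> snapshot(SizeFn size_fn, SaveFn save_fn) {
    const std::size_t need = size_fn();
    if (need == 0) return {};                      // nothing to save, or size query failed

    std::vector<std::uint8_t> buf(need);
    const std::size_t written = save_fn(buf.data(), buf.size());
    if (written == 0 || written > buf.size()) return {};

    buf.resize(written);                           // keep only the bytes actually written
    return buf;
}

// Possible use with the functions in this file (not compiled here):
//   auto blob = snapshot(
//       [&] { return lloyal::kv::state_size(ctx, seq); },
//       [&](std::uint8_t* dst, std::size_t n) {
//           return lloyal::kv::state_save(ctx, seq, dst, n);
//       });
//   if (!blob.empty()) lloyal::kv::state_load(ctx, seq, blob.data(), blob.size());
```

The same helper would apply to the global_state_* functions by swapping the two lambdas.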

These primitives compose into diverse inference patterns including:

  • Context window management (streaming, compression, eviction)
  • Session persistence (save/resume across app restarts)
  • Multi-sequence orchestration (parallel logical states)
  • Specialized search and sampling strategies
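For the context window management pattern, a caller of clear_and_reseed has to supply the sink tokens and the tail. One way to derive them from a token history is sketched below. The names ReseedPlan and plan_reseed are hypothetical, token_t stands in for llama_token (an int32_t typedef in llama.h), and how many sinks and tail tokens to keep is a policy choice this page does not specify.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using token_t = std::int32_t;  // stand-in for llama_token

// Hypothetical container for the two arguments clear_and_reseed expects.
struct ReseedPlan {
    std::vector<token_t> sinks;  // first n_sinks tokens of the history
    std::vector<token_t> tail;   // last n_tail tokens of the history
};

// Split a token history into leading sink tokens and a trailing window.
// The ranges never overlap: the tail is clamped to what remains after the sinks.
ReseedPlan plan_reseed(const std::vector<token_t>& history,
                       std::size_t n_sinks, std::size_t n_tail) {
    const std::size_t n = history.size();
    const std::size_t s = std::min(n_sinks, n);
    const std::size_t t = std::min(n_tail, n - s);

    ReseedPlan plan;
    plan.sinks.assign(history.begin(),
                      history.begin() + static_cast<std::ptrdiff_t>(s));
    plan.tail.assign(history.end() - static_cast<std::ptrdiff_t>(t),
                     history.end());
    return plan;
}

// Possible use (not compiled here):
//   ReseedPlan p = plan_reseed(all_tokens, 4, 252);
//   lloyal::kv::clear_and_reseed(ctx, p.sinks, p.tail, n_batch);
```

The parameter name original_sinks suggests the sinks should come from the start of the full original sequence, so the sketch assumes history is the complete token history rather than an already trimmed window.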

Built on llama.cpp memory-management primitives:

  • llama_memory_* for cache lifecycle and multi-sequence ops
  • llama_state_* for serialization with fragmentation fallback
  • Null-safety, error handling, and defensive checks added on top
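To illustrate what a null-safe layer over a C-style call can look like, here is a minimal sketch. It is not the library's implementation: fake_context, raw_save, and safe_save are hypothetical stand-ins, and reporting failure as 0 bytes is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an opaque C context; not a llama.cpp type.
struct fake_context {
    std::vector<std::uint8_t> state;
};

// Raw call in the style of a C API: assumes every argument is valid.
inline std::size_t raw_save(const fake_context* ctx, std::uint8_t* dst, std::size_t size) {
    (void)size;  // the raw call trusts the caller to have sized dst correctly
    std::memcpy(dst, ctx->state.data(), ctx->state.size());
    return ctx->state.size();
}

// Guarded wrapper: validates arguments first and reports failure as 0 bytes.
inline std::size_t safe_save(const fake_context* ctx, std::uint8_t* dst, std::size_t size) {
    if (ctx == nullptr || dst == nullptr) return 0;  // null-safety
    if (ctx->state.empty()) return 0;                // nothing to write
    if (size < ctx->state.size()) return 0;          // destination too small
    return raw_save(ctx, dst, size);
}
```

Callers then only need to check the returned byte count instead of validating each pointer themselves.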

Definition in file kv.hpp.