liblloyal 1.0.0
Composable primitives for llama.cpp inference
lloyal::kv Namespace Reference

Classes

struct  FileData
 Data structure returned by read_file. More...
 

Functions

bool remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1)
 Remove token range from KV cache sequence.
 
llama_pos pos_max (llama_context *ctx, llama_seq_id seq)
 Get maximum position in KV cache sequence.
 
void seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1)
 Copy KV cache from one sequence to another.
 
void seq_keep (llama_context *ctx, llama_seq_id seq)
 Keep only one sequence, removing all others.
 
size_t state_size (llama_context *ctx, llama_seq_id seq)
 Get size needed to serialize sequence state.
 
size_t state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size)
 Save sequence state to buffer.
 
size_t state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size)
 Restore sequence state from buffer.
 
size_t global_state_size (llama_context *ctx)
 Get size needed to serialize global state.
 
size_t global_state_save (llama_context *ctx, uint8_t *dst, size_t size)
 Save global state to buffer.
 
size_t global_state_load (llama_context *ctx, const uint8_t *src, size_t size)
 Restore global state from buffer.
 
void log_build_info (llama_context *ctx)
 Log KV cache build info and current state.
 
void clear_all (llama_context *ctx)
 Clear all KV cache (complete reset)
 
void clear_metadata (llama_context *ctx)
 Clear KV cache metadata only (fast reset)
 
void clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch)
 
size_t write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens)
 Write KV state to file with self-describing format.
 
FileData read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath)
 

Function Documentation

◆ clear_all()

void lloyal::kv::clear_all ( llama_context *  ctx)
inline

Clear all KV cache (complete reset)

Clears both metadata and data buffers for a complete cache reset. Use when starting a new conversation or session.

Parameters
ctx  Llama context (must not be null)
Exceptions
std::runtime_error  if ctx is null
Note
Uses llama_memory_clear(mem, true), which:
  • Clears metadata (cell positions, sequence heads)
  • Zeroes K/V tensor data buffers
  • Performs a complete reset (slower than clear_metadata())
See also
clear_metadata() for faster metadata-only clearing
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 460 of file kv.hpp.

◆ clear_and_reseed()

void lloyal::kv::clear_and_reseed ( llama_context *  ctx,
const std::vector< llama_token > &  original_sinks,
const std::vector< llama_token > &  tail,
int32_t  n_batch 
)
inline

◆ clear_metadata()

void lloyal::kv::clear_metadata ( llama_context *  ctx)
inline

Clear KV cache metadata only (fast reset)

Clears logical structure but keeps buffer allocations. Faster than clear_all() for compression patterns.

Parameters
ctx  Llama context (must not be null)
Exceptions
std::runtime_error  if ctx is null
Note
Performance: Faster than clear_all() because the data buffers are not zeroed. Use when immediately re-decoding; reusing the buffers reduces overhead.
See also
clear_all() for complete reset including data buffers
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 485 of file kv.hpp.
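The difference between the two reset functions can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyKv, its fields, and the zero_data flag are hypothetical names that mimic the documented behavior (metadata reset with or without buffer zeroing); they are not liblloyal or llama.cpp types, and the real cache layout is not described on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy cache: one metadata entry and one data slot per cell.
struct ToyKv {
    std::vector<std::int32_t> cell_pos;  // metadata: -1 marks an unused cell
    std::vector<float> data;             // stand-in for K/V tensor data

    // zero_data == true  models a complete reset (metadata and data).
    // zero_data == false models a metadata-only reset (data left in place).
    void clear(bool zero_data) {
        assert(cell_pos.size() == data.size());  // toy invariant only
        std::fill(cell_pos.begin(), cell_pos.end(), -1);
        if (zero_data) {
            std::fill(data.begin(), data.end(), 0.0f);
        }
    }
};
```

In this model the metadata-only path skips one pass over the data buffer, which is where the documented speed difference would come from. Stale data is harmless as long as every cell is rewritten before it is read again, which matches the "immediately re-decoding" guidance above.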

◆ global_state_load()

size_t lloyal::kv::global_state_load ( llama_context *  ctx,
const uint8_t *  src,
size_t  size 
)
inline

Restore global state from buffer.

Deserializes and restores the entire context's state from buffer.

Parameters
ctx  Llama context (must not be null)
src  Source buffer (must not be null)
size  Buffer size in bytes
Returns
Bytes read, or 0 on failure
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 380 of file kv.hpp.

◆ global_state_save()

size_t lloyal::kv::global_state_save ( llama_context *  ctx,
uint8_t *  dst,
size_t  size 
)
inline

Save global state to buffer.

Serializes the entire context's state into the provided buffer.

Parameters
ctx  Llama context (must not be null)
dst  Destination buffer (must not be null)
size  Buffer size in bytes
Returns
Bytes written, or 0 on failure
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 358 of file kv.hpp.

◆ global_state_size()

size_t lloyal::kv::global_state_size ( llama_context *  ctx)
inline

Get size needed to serialize global state.

Returns buffer size required to save the entire context's state. Use when per-sequence serialization is not needed.

Parameters
ctx  Llama context (must not be null)
Returns
Required buffer size in bytes, or 0 if context is null
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 336 of file kv.hpp.

◆ log_build_info()

void lloyal::kv::log_build_info ( llama_context *  ctx)
inline

Log KV cache build info and current state.

Outputs debug information about the KV cache configuration and current state. Useful for debugging and understanding cache behavior.

Parameters
ctx  Llama context (can be null; limits output if null)
Note
Only produces output when DEBUG logging is enabled
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 405 of file kv.hpp.

◆ pos_max()

llama_pos lloyal::kv::pos_max ( llama_context *  ctx,
llama_seq_id  seq 
)
inline

Get maximum position in KV cache sequence.

Returns the highest token position in the specified sequence's KV cache. For a sequence with N tokens, this returns N-1 (zero-indexed).

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID
Returns
Maximum position (number of tokens - 1), or -1 if empty or context is null
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 87 of file kv.hpp.
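Because the return value is zero-indexed, callers usually add one to obtain a token count or the next free position. The helpers below are a small sketch of that arithmetic; Pos, token_count, and remaining_slots are hypothetical names, and the sketch assumes the sequence occupies contiguous positions starting at 0.

```cpp
#include <cassert>
#include <cstdint>

using Pos = std::int32_t;  // stand-in for llama_pos (assumed 32-bit signed)

// Token count implied by a pos_max() result: N-1 maps to N, and -1 (empty) maps to 0.
inline Pos token_count(Pos pos_max_result) {
    assert(pos_max_result >= -1);  // -1 is the documented "empty or null" value
    return pos_max_result + 1;
}

// Free positions left in a window of n_ctx positions; never negative.
inline Pos remaining_slots(Pos pos_max_result, Pos n_ctx) {
    const Pos used = token_count(pos_max_result);
    return used >= n_ctx ? 0 : n_ctx - used;
}
```

Note that token_count() also gives the position at which the next decoded token would be placed under the same contiguity assumption.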

◆ read_file()

FileData lloyal::kv::read_file ( llama_context *  ctx,
llama_seq_id  seq,
const std::string &  filepath 
)
inline

◆ remove_range()

bool lloyal::kv::remove_range ( llama_context *  ctx,
llama_seq_id  seq,
llama_pos  p0,
llama_pos  p1 
)
inline

Remove token range from KV cache sequence.

Removes tokens in the range [p0, p1) from the specified sequence's KV cache. Used for selective eviction in context window management.

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID (use 0 for single-sequence mode)
p0  Start position (inclusive)
p1  End position (exclusive), use -1 for "to end"
Returns
true if successful, false if context is null or operation failed
Warning
CRITICAL: Call this BEFORE the next llama_decode(), not after. Calling it after decode may cause undefined behavior.
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 54 of file kv.hpp.
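The half-open range convention [p0, p1), with -1 meaning "to end", can be sketched on a plain vector of positions. This is an illustration only: Pos and surviving_positions are hypothetical names that are not part of liblloyal, no llama_context is involved, and the model assumes that surviving positions keep their original values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Pos = std::int32_t;  // stand-in for llama_pos (assumed 32-bit signed)

// Returns the positions that remain after removing [p0, p1) from `cached`.
// A negative p1 is treated as "to end", mirroring the documented -1 convention.
std::vector<Pos> surviving_positions(const std::vector<Pos>& cached, Pos p0, Pos p1) {
    assert(p1 < 0 || p0 <= p1);  // the range must be well-formed
    std::vector<Pos> kept;
    for (Pos p : cached) {
        const bool in_range = p >= p0 && (p1 < 0 || p < p1);
        if (!in_range) {
            kept.push_back(p);
        }
    }
    return kept;
}
```

With positions 0 through 5 cached, removing [2, 4) drops positions 2 and 3 and keeps position 4, because the end bound is exclusive.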

◆ seq_cp()

void lloyal::kv::seq_cp ( llama_context *  ctx,
llama_seq_id  src,
llama_seq_id  dst,
llama_pos  p0 = 0,
llama_pos  p1 = -1 
)
inline

Copy KV cache from one sequence to another.

Copies KV cache state from source to destination sequence, enabling efficient branching without duplicating model weights.

Parameters
ctx  Llama context (must not be null)
src  Source sequence ID
dst  Destination sequence ID
p0  Start position (inclusive), default 0
p1  End position (exclusive), default -1 for "to end"
Note
Use case: Multi-sequence search (fork from trunk without copying model weights)
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 114 of file kv.hpp.

◆ seq_keep()

void lloyal::kv::seq_keep ( llama_context *  ctx,
llama_seq_id  seq 
)
inline

Keep only one sequence, removing all others.

Removes all sequences except the specified one from the KV cache. Efficient way to prune unused branches.

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID to keep
Note
Use case: After selection, prune all alternatives except chosen path
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 138 of file kv.hpp.
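The fork-then-prune workflow that seq_cp() and seq_keep() support can be sketched with a toy cache that tracks which positions belong to which sequence. ToyCache, SeqId, and Pos are hypothetical names for illustration; the member functions only imitate the documented signatures and say nothing about how llama.cpp stores or shares cells internally. In particular, this toy overwrites an existing destination sequence, which is an assumption of the model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using SeqId = std::int32_t;  // stand-in for llama_seq_id
using Pos = std::int32_t;    // stand-in for llama_pos

// Hypothetical toy model: each sequence ID maps to the positions it holds.
struct ToyCache {
    std::map<SeqId, std::vector<Pos>> seqs;

    // Copy positions in [p0, p1) from src to dst; negative p1 means "to end".
    void seq_cp(SeqId src, SeqId dst, Pos p0 = 0, Pos p1 = -1) {
        assert(p1 < 0 || p0 <= p1);
        if (src == dst) return;  // nothing to copy
        const auto it = seqs.find(src);
        if (it == seqs.end()) return;
        std::vector<Pos> copied;
        for (Pos p : it->second) {
            if (p >= p0 && (p1 < 0 || p < p1)) copied.push_back(p);
        }
        seqs[dst] = copied;  // toy behavior: replaces any previous dst content
    }

    // Drop every sequence except `seq`.
    void seq_keep(SeqId seq) {
        for (auto it = seqs.begin(); it != seqs.end();) {
            if (it->first == seq) {
                ++it;
            } else {
                it = seqs.erase(it);
            }
        }
    }
};
```

A typical use would be to fork a trunk sequence into several candidates with seq_cp(), extend each candidate independently, and then call seq_keep() on the winner to discard the rest.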

◆ state_load()

size_t lloyal::kv::state_load ( llama_context *  ctx,
llama_seq_id  seq,
const uint8_t *  src,
size_t  size 
)
inline

Restore sequence state from buffer.

Deserializes KV cache state from buffer and restores it to the sequence. Automatically falls back to global state restore if per-sequence restore fails (may occur with fragmented caches).

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID
src  Source buffer (must not be null)
size  Buffer size in bytes
Returns
Bytes read, or 0 on failure
Warning
May crash on recurrent models if KV cache is empty during load
Note
Fallback strategy: per-sequence → global state (handles fragmentation)
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 281 of file kv.hpp.

◆ state_save()

size_t lloyal::kv::state_save ( llama_context *  ctx,
llama_seq_id  seq,
uint8_t *  dst,
size_t  size 
)
inline

Save sequence state to buffer.

Serializes the sequence's KV cache state into the provided buffer. Automatically falls back to global state save if per-sequence save fails (may occur with fragmented caches).

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID
dst  Destination buffer (must not be null)
size  Buffer size in bytes
Returns
Bytes written, or 0 on failure
Note
Fallback strategy: per-sequence → global state (handles fragmentation)
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 220 of file kv.hpp.
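The "per-sequence, then global" fallback described in the note can be sketched generically. The code below is not the library's implementation: SaveFn and save_with_fallback are hypothetical names, and the two callables stand in for whatever per-sequence and global save routines are actually used. The only behavior taken from this page is that a return value of 0 signals failure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical signature for a save routine: returns bytes written, 0 on failure.
using SaveFn = std::function<std::size_t(std::uint8_t* dst, std::size_t size)>;

// Try the per-sequence path first; if it reports 0 bytes, try the global path.
std::size_t save_with_fallback(const SaveFn& per_sequence, const SaveFn& global,
                               std::uint8_t* dst, std::size_t size) {
    assert(per_sequence && global);  // both callables must be set
    if (dst == nullptr || size == 0) {
        return 0;  // nothing can be written
    }
    const std::size_t written = per_sequence(dst, size);
    if (written != 0) {
        return written;
    }
    return global(dst, size);  // may also return 0 if both paths fail
}
```

One consequence of such a design is that the caller cannot tell from the byte count alone which path produced the data, so the same strategy has to be applied on the load side.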

◆ state_size()

size_t lloyal::kv::state_size ( llama_context *  ctx,
llama_seq_id  seq 
)
inline

Get size needed to serialize sequence state.

Returns buffer size required to save the sequence's KV cache state. Automatically falls back to global state size if per-sequence query fails (may occur with fragmented caches).

Parameters
ctx  Llama context (must not be null)
seq  Sequence ID
Returns
Required buffer size in bytes, or 0 if empty/failed
Note
Fallback strategy: per-sequence → global state (handles fragmentation)
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 165 of file kv.hpp.
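The three per-sequence functions are normally used together: query the size, allocate a buffer, save, and later load. The sketch below shows that calling pattern against stand-ins so that it compiles without llama.cpp. FakeContext and the fake_kv namespace are invented for this example and their bodies are placeholders; in real code the calls would go to lloyal::kv::state_size(), state_save(), and state_load() with a llama_context. snapshot and restore are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for llama_context; holds some bytes that play the role of KV state.
struct FakeContext {
    std::vector<std::uint8_t> state;
};

// Stand-ins that mirror the documented signatures and 0-on-failure convention.
namespace fake_kv {
inline std::size_t state_size(FakeContext* ctx, std::int32_t /*seq*/) {
    return ctx ? ctx->state.size() : 0;
}
inline std::size_t state_save(FakeContext* ctx, std::int32_t /*seq*/,
                              std::uint8_t* dst, std::size_t size) {
    if (!ctx || !dst || ctx->state.empty() || size < ctx->state.size()) return 0;
    std::memcpy(dst, ctx->state.data(), ctx->state.size());
    return ctx->state.size();
}
inline std::size_t state_load(FakeContext* ctx, std::int32_t /*seq*/,
                              const std::uint8_t* src, std::size_t size) {
    if (!ctx || !src || size == 0) return 0;
    ctx->state.assign(src, src + size);
    return size;
}
}  // namespace fake_kv

// Query the size, allocate, save, and trim to the bytes actually written.
std::vector<std::uint8_t> snapshot(FakeContext* ctx, std::int32_t seq) {
    const std::size_t needed = fake_kv::state_size(ctx, seq);
    if (needed == 0) return {};  // empty or failed
    std::vector<std::uint8_t> buf(needed);
    const std::size_t written = fake_kv::state_save(ctx, seq, buf.data(), buf.size());
    assert(written <= buf.size());
    buf.resize(written);  // becomes empty if the save failed
    return buf;
}

// Load a snapshot; returns false if the buffer is empty or the load reports 0.
bool restore(FakeContext* ctx, std::int32_t seq, const std::vector<std::uint8_t>& buf) {
    if (buf.empty()) return false;
    return fake_kv::state_load(ctx, seq, buf.data(), buf.size()) != 0;
}
```

Checking every return value against 0 matters here because all three functions report failure the same way, and an unchecked failed save would otherwise leave a buffer of uninitialized-looking zeros.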

◆ write_file()

size_t lloyal::kv::write_file ( llama_context *  ctx,
llama_seq_id  seq,
const std::string &  filepath,
const std::vector< llama_token > &  tokens 
)
inline

Write KV state to file with self-describing format.

Serializes KV cache state to file using llama.cpp's standard format:

  • Magic + Version (validation)
  • Token count + Token array
  • KV state data (cache + logits + embeddings)
Parameters
ctx  Llama context (must not be null)
seq  Sequence ID (use 0 for single-sequence mode)
filepath  Destination file path (must not be empty)
tokens  Token IDs to include in file
Returns
Bytes written, or 0 on failure
Note
Use cases:
  • Exit/resume: Save app state across restarts
  • Persistent sessions: Multiple save points per conversation
  • Context sharing: Serialize → upload → share
Warning
Skips write if KV cache is empty (returns 0)
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 625 of file kv.hpp.
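The idea of a self-describing layout (magic and version first, then a token count and token array, then the opaque state bytes) can be sketched with an in-memory encoder and decoder. Everything concrete below is an assumption made for illustration: kToyMagic, kToyVersion, the 32-bit field widths, the native byte order, and the names ToyFile, encode, and decode are hypothetical and do not describe the actual llama.cpp file format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

constexpr std::uint32_t kToyMagic = 0x544F594Bu;  // hypothetical, NOT llama.cpp's magic
constexpr std::uint32_t kToyVersion = 1;          // hypothetical version number

struct ToyFile {
    std::vector<std::int32_t> tokens;  // token IDs stored alongside the state
    std::vector<std::uint8_t> state;   // opaque state bytes
};

// Append a 32-bit value in native byte order.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    std::uint8_t b[4];
    std::memcpy(b, &v, 4);
    out.insert(out.end(), b, b + 4);
}

// Read a 32-bit value at `off`, advancing it; false if fewer than 4 bytes remain.
bool get_u32(const std::vector<std::uint8_t>& in, std::size_t& off, std::uint32_t& v) {
    if (in.size() - off < 4) return false;
    std::memcpy(&v, in.data() + off, 4);
    off += 4;
    return true;
}

std::vector<std::uint8_t> encode(const ToyFile& f) {
    assert(f.tokens.size() <= 0xFFFFFFFFu);  // count must fit the 32-bit field
    std::vector<std::uint8_t> out;
    put_u32(out, kToyMagic);
    put_u32(out, kToyVersion);
    put_u32(out, static_cast<std::uint32_t>(f.tokens.size()));
    for (std::int32_t t : f.tokens) put_u32(out, static_cast<std::uint32_t>(t));
    out.insert(out.end(), f.state.begin(), f.state.end());
    return out;
}

// Validates the header before trusting the rest; returns nullopt on any mismatch.
std::optional<ToyFile> decode(const std::vector<std::uint8_t>& in) {
    std::size_t off = 0;
    std::uint32_t magic = 0, version = 0, n = 0;
    if (!get_u32(in, off, magic) || magic != kToyMagic) return std::nullopt;
    if (!get_u32(in, off, version) || version != kToyVersion) return std::nullopt;
    if (!get_u32(in, off, n)) return std::nullopt;
    if ((in.size() - off) / 4 < n) return std::nullopt;  // truncated token array
    ToyFile f;
    for (std::uint32_t i = 0; i < n; ++i) {
        std::uint32_t raw = 0;
        get_u32(in, off, raw);
        f.tokens.push_back(static_cast<std::int32_t>(raw));
    }
    f.state.assign(in.begin() + static_cast<std::ptrdiff_t>(off), in.end());
    return f;
}
```

Putting the magic and version first lets a reader reject a foreign or outdated file before interpreting any of the payload, which is what makes the exit/resume and sharing use cases above safe to attempt.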