Namespaces
namespace	cache_type

namespace	tenancy

Classes
struct	FileData
	Data structure returned by read_file.

Functions
bool	remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1)
	Remove token range from KV cache sequence.

llama_pos	pos_max (llama_context *ctx, llama_seq_id seq)
	Get maximum position in KV cache sequence.

void	seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1)
	Copy KV cache from one sequence to another.

void	seq_keep (llama_context *ctx, llama_seq_id seq)
	Keep only one sequence, removing all others.

size_t	state_size (llama_context *ctx, llama_seq_id seq)
	Get size needed to serialize sequence state.

size_t	state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size)
	Save sequence state to buffer.

size_t	state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size)
	Restore sequence state from buffer.

size_t	global_state_size (llama_context *ctx)
	Get size needed to serialize global state.

size_t	global_state_save (llama_context *ctx, uint8_t *dst, size_t size)
	Save global state to buffer.

size_t	global_state_load (llama_context *ctx, const uint8_t *src, size_t size)
	Restore global state from buffer.

void	log_build_info (llama_context *ctx)
	Log KV cache build info and current state.

void	clear_all (llama_context *ctx)
	Clear all KV cache (complete reset)

void	clear_metadata (llama_context *ctx)
	Clear KV cache metadata only (fast reset)

void	clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch)

size_t	write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens)
	Write KV state to file with self-describing format.

FileData	read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath)

Variables
constexpr llama_seq_id	NO_LEASE = static_cast<llama_seq_id>(-1)
	Sentinel value indicating a branch has no KV residency.
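
As a minimal sketch of how a sentinel like this is typically checked, the block below mirrors the constant and wraps it in a small record. The `BranchSlot` type and `require_seq` helper are hypothetical and are not part of lloyal::kv; the `llama_seq_id` alias is a stand-in for the typedef in llama.h so the sketch compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the llama.h typedef, so the sketch compiles without llama.cpp.
using llama_seq_id = std::int32_t;

// Mirrors the NO_LEASE constant documented above.
constexpr llama_seq_id NO_LEASE = static_cast<llama_seq_id>(-1);

// Hypothetical branch record; not part of lloyal::kv.
struct BranchSlot {
    llama_seq_id seq = NO_LEASE;  // starts with no KV residency

    bool has_lease() const { return seq != NO_LEASE; }
};

// Hypothetical guard: only hand out the sequence ID when the branch holds one.
inline llama_seq_id require_seq(const BranchSlot &slot) {
    assert(slot.has_lease());
    return slot.seq;
}
```

Comparing against the named constant rather than a literal -1 keeps the intent visible at the call site.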

Function Documentation

◆ clear_all()

void lloyal::kv::clear_all ( llama_context * ctx )

inline

Clear all KV cache (complete reset)

Clears both metadata and data buffers for a complete cache reset. Use when starting a new conversation or session.

Parameters

ctx	Llama context (must not be null)

Exceptions

std::runtime_error if ctx is null

Note

Uses llama_memory_clear(mem, true), which:

Clears metadata (cell positions, sequence heads)
Zeroes K/V tensor data buffers
Performs a complete reset (slower than clear_metadata())

See also: clear_metadata() for faster metadata-only clearing

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 682 of file kv.hpp.

◆ clear_and_reseed()

void lloyal::kv::clear_and_reseed	(	llama_context *	ctx,
		const std::vector< llama_token > &	original_sinks,
		const std::vector< llama_token > &	tail,
		int32_t	n_batch
	)

inline

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 752 of file kv.hpp.

◆ clear_metadata()

void lloyal::kv::clear_metadata ( llama_context * ctx )

inline

Clear KV cache metadata only (fast reset)

Clears logical structure but keeps buffer allocations. Faster than clear_all() for compression patterns.

Parameters

ctx	Llama context (must not be null)

Exceptions

std::runtime_error if ctx is null

Note: Faster than clear_all() because no buffers are zeroed. Use it when re-decoding immediately; reusing the existing buffers reduces overhead.

See also: clear_all() for complete reset including data buffers

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 707 of file kv.hpp.

◆ global_state_load()

size_t lloyal::kv::global_state_load	(	llama_context *	ctx,
		const uint8_t *	src,
		size_t	size
	)

inline

Restore global state from buffer.

Deserializes and restores the entire context's state from buffer.

Parameters

ctx	Llama context (must not be null)
src	Source buffer (must not be null)
size	Buffer size in bytes

Returns: Bytes read, or 0 on failure

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 602 of file kv.hpp.

◆ global_state_save()

size_t lloyal::kv::global_state_save	(	llama_context *	ctx,
		uint8_t *	dst,
		size_t	size
	)

inline

Save global state to buffer.

Serializes the entire context's state into the provided buffer.

Parameters

ctx	Llama context (must not be null)
dst	Destination buffer (must not be null)
size	Buffer size in bytes

Returns: Bytes written, or 0 on failure

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 580 of file kv.hpp.

◆ global_state_size()

size_t lloyal::kv::global_state_size ( llama_context * ctx )

inline

Get size needed to serialize global state.

Returns buffer size required to save the entire context's state. Use when per-sequence serialization is not needed.

Parameters

ctx	Llama context (must not be null)

Returns: Required buffer size in bytes, or 0 if context is null

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 558 of file kv.hpp.

◆ log_build_info()

void lloyal::kv::log_build_info ( llama_context * ctx )

inline

Log KV cache build info and current state.

Outputs debug information about the KV cache configuration and current state. Useful for debugging and understanding cache behavior.

Parameters

ctx	Llama context (can be null; limits output if null)

Note: Only produces output when DEBUG logging is enabled

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 627 of file kv.hpp.

◆ pos_max()

llama_pos lloyal::kv::pos_max	(	llama_context *	ctx,
		llama_seq_id	seq
	)

inline

Get maximum position in KV cache sequence.

Returns the highest token position in the specified sequence's KV cache. For a sequence with N tokens, this returns N-1 (zero-indexed).

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID

Returns: Maximum position (number of tokens - 1), or -1 if empty or context is null

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/branch.hpp, and /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 110 of file kv.hpp.
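
The zero-indexed return value is easy to get off by one. As a sketch, assuming only the return convention described above, the helpers below turn a pos_max() result into a token count and a next decode position. Both helper names are hypothetical, and the `llama_pos` alias is a stand-in for the typedef in llama.h.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the llama.h typedef, so the sketch compiles without llama.cpp.
using llama_pos = std::int32_t;

// Converts a pos_max() result into the number of cached tokens.
// pos_max() is documented to return N-1 for N tokens and -1 when the
// sequence is empty or the context is null.
inline std::int32_t cached_token_count(llama_pos max_pos) {
    assert(max_pos >= -1);
    return max_pos + 1;
}

// Position at which the next token would be placed in the sequence.
inline llama_pos next_decode_pos(llama_pos max_pos) {
    return cached_token_count(max_pos);
}
```

Note that a result of -1 does not distinguish an empty sequence from a null context, so callers that care should check the context pointer separately.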

◆ read_file()

FileData lloyal::kv::read_file	(	llama_context *	ctx,
		llama_seq_id	seq,
		const std::string &	filepath
	)

inline

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 910 of file kv.hpp.

◆ remove_range()

bool lloyal::kv::remove_range	(	llama_context *	ctx,
		llama_seq_id	seq,
		llama_pos	p0,
		llama_pos	p1
	)

inline

Remove token range from KV cache sequence.

Removes tokens in the range [p0, p1) from the specified sequence's KV cache. Used for selective eviction in context window management.

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID (use 0 for single-sequence mode)
p0	Start position (inclusive)
p1	End position (exclusive), use -1 for "to end"

Returns: true if successful, false if context is null or operation failed

Warning: CRITICAL: Call this BEFORE next llama_decode(), not after. Calling after decode may cause undefined behavior.

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp, and /home/runner/work/liblloyal/liblloyal/include/lloyal/logits.hpp.

Definition at line 77 of file kv.hpp.
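
To illustrate the half-open range convention, the sketch below computes how many positions a [p0, p1) request covers, treating p1 = -1 as "to end". The `range_length` helper is hypothetical, and clamping p1 to the end of the sequence is an assumption of this sketch, not documented behavior of remove_range().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Stand-in for the llama.h typedef, so the sketch compiles without llama.cpp.
using llama_pos = std::int32_t;

// Number of positions covered by [p0, p1) in a sequence whose highest
// position is max_pos (as returned by pos_max(); -1 means empty).
// p1 < 0 is treated as "to end". Clamping to the sequence end is an
// assumption made for this sketch.
inline std::int32_t range_length(llama_pos p0, llama_pos p1, llama_pos max_pos) {
    assert(p0 >= 0);
    const llama_pos seq_end = max_pos + 1;  // one past the last position
    const llama_pos end = (p1 < 0) ? seq_end : std::min<llama_pos>(p1, seq_end);
    return std::max<std::int32_t>(0, end - p0);
}
```

For a 20-token sequence (max_pos = 19), the range [4, 8) covers 4 positions and [4, -1) covers the remaining 16.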

◆ seq_cp()

void lloyal::kv::seq_cp	(	llama_context *	ctx,
		llama_seq_id	src,
		llama_seq_id	dst,
		llama_pos	p0 = `0`,
		llama_pos	p1 = `-1`
	)

inline

Copy KV cache from one sequence to another.

Copies KV cache state from source to destination sequence, enabling efficient branching without duplicating model weights.

Parameters

ctx	Llama context (must not be null)
src	Source sequence ID
dst	Destination sequence ID
p0	Start position (inclusive), default 0
p1	End position (exclusive), default -1 for "to end"

Note: Use case: Multi-sequence search (fork from trunk without copying model weights)

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/branch.hpp, and /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 137 of file kv.hpp.

◆ seq_keep()

void lloyal::kv::seq_keep	(	llama_context *	ctx,
		llama_seq_id	seq
	)

inline

Keep only one sequence, removing all others.

Removes all sequences except the specified one from the KV cache. Efficient way to prune unused branches.

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID to keep

Note: Use case: After selection, prune all alternatives except chosen path

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 161 of file kv.hpp.
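
The fork-then-prune pattern described for seq_cp() and seq_keep() can be sketched as a call sequence. Everything in the block below is a stand-in so that it compiles without llama.cpp: the `llama_context` struct only records calls, the `sketch_kv` functions copy the documented signatures but do no real work, and the branch ID scheme (`trunk + i`) is a hypothetical choice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using llama_seq_id = std::int32_t;  // stand-in for the llama.h typedef
using llama_pos = std::int32_t;     // stand-in for the llama.h typedef

// Stand-in context that only records calls; the real llama_context is opaque.
struct llama_context {
    std::vector<std::string> calls;
};

namespace sketch_kv {
// Same signatures as the documented functions, but recording only.
inline void seq_cp(llama_context *ctx, llama_seq_id src, llama_seq_id dst,
                   llama_pos p0 = 0, llama_pos p1 = -1) {
    (void)p0;
    (void)p1;
    ctx->calls.push_back("cp " + std::to_string(src) + "->" + std::to_string(dst));
}

inline void seq_keep(llama_context *ctx, llama_seq_id seq) {
    ctx->calls.push_back("keep " + std::to_string(seq));
}
}  // namespace sketch_kv

// Fork n_branches sequences from trunk, then keep only the winner.
inline void fork_then_prune(llama_context *ctx, llama_seq_id trunk,
                            int n_branches, llama_seq_id winner) {
    assert(ctx != nullptr);
    for (int i = 1; i <= n_branches; ++i) {
        // Default p0 = 0, p1 = -1 copies the whole trunk sequence.
        sketch_kv::seq_cp(ctx, trunk, trunk + i);
    }
    // Decoding and scoring of each branch would happen here.
    sketch_kv::seq_keep(ctx, winner);
}
```

With the real library, the two `sketch_kv` calls would be replaced by the lloyal::kv functions of the same name.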

◆ state_load()

size_t lloyal::kv::state_load	(	llama_context *	ctx,
		llama_seq_id	seq,
		const uint8_t *	src,
		size_t	size
	)

inline

Restore sequence state from buffer.

Deserializes KV cache state from buffer and restores it to the sequence. Automatically falls back to global state restore if per-sequence restore fails (may occur with fragmented caches).

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID
src	Source buffer (must not be null)
size	Buffer size in bytes

Returns: Bytes read, or 0 on failure

Warning: May crash on recurrent models if KV cache is empty during load

Note: Fallback strategy: per-sequence → global state (handles fragmentation)

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 503 of file kv.hpp.

◆ state_save()

size_t lloyal::kv::state_save	(	llama_context *	ctx,
		llama_seq_id	seq,
		uint8_t *	dst,
		size_t	size
	)

inline

Save sequence state to buffer.

Serializes the sequence's KV cache state into the provided buffer. Automatically falls back to global state save if per-sequence save fails (may occur with fragmented caches).

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID
dst	Destination buffer (must not be null)
size	Buffer size in bytes

Returns: Bytes written, or 0 on failure

Note: Fallback strategy: per-sequence → global state (handles fragmentation)

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 442 of file kv.hpp.

◆ state_size()

size_t lloyal::kv::state_size	(	llama_context *	ctx,
		llama_seq_id	seq
	)

inline

Get size needed to serialize sequence state.

Returns buffer size required to save the sequence's KV cache state. Automatically falls back to global state size if per-sequence query fails (may occur with fragmented caches).

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID

Returns: Required buffer size in bytes, or 0 if empty/failed

Note: Fallback strategy: per-sequence → global state (handles fragmentation)

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 387 of file kv.hpp.
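
The size, save, and load functions above are meant to be used together: query the size, allocate, then save. A minimal sketch of that pattern follows. The `snapshot` helper is hypothetical, and it takes callables in place of the real functions so that it compiles without llama.cpp. Treating a return value of 0 as failure follows the documented return conventions; rejecting a write larger than the buffer is an extra defensive check added here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// size_fn stands in for a call such as state_size(ctx, seq);
// save_fn stands in for state_save(ctx, seq, dst, size).
// Returns the serialized bytes, or an empty vector on failure.
inline std::vector<std::uint8_t> snapshot(
    const std::function<std::size_t()> &size_fn,
    const std::function<std::size_t(std::uint8_t *, std::size_t)> &save_fn) {
    assert(size_fn && save_fn);

    const std::size_t needed = size_fn();
    if (needed == 0) {
        return {};  // empty sequence or size query failed
    }

    std::vector<std::uint8_t> buf(needed);
    const std::size_t written = save_fn(buf.data(), buf.size());
    if (written == 0 || written > buf.size()) {
        return {};  // save failed (or reported an impossible size)
    }

    buf.resize(written);  // keep only the bytes actually written
    return buf;
}
```

With the real library, the two callables would typically be lambdas that capture `ctx` and `seq` and forward to state_size() and state_save(). The same shape applies to the global_state_* functions.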

◆ write_file()

size_t lloyal::kv::write_file	(	llama_context *	ctx,
		llama_seq_id	seq,
		const std::string &	filepath,
		const std::vector< llama_token > &	tokens
	)

inline

Write KV state to file with self-describing format.

Serializes KV cache state to file using llama.cpp's standard format:

Magic + Version (validation)
Token count + Token array
KV state data (cache + logits + embeddings)

Parameters

ctx	Llama context (must not be null)
seq	Sequence ID (use 0 for single-sequence mode)
filepath	Destination file path (must not be empty)
tokens	Token IDs to include in file

Returns: Bytes written, or 0 on failure

Note

Use cases:

Exit/resume: Save app state across restarts
Persistent sessions: Multiple save points per conversation
Context sharing: Serialize → upload → share

Warning: Skips write if KV cache is empty (returns 0)

Examples: /home/runner/work/liblloyal/liblloyal/include/lloyal/kv.hpp.

Definition at line 851 of file kv.hpp.
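
The three-part layout listed above (magic and version, token count and token array, state data) can be illustrated with an in-memory sketch. The magic and version values below are placeholders, not llama.cpp's real constants. The 32-bit field widths and native byte order are assumptions of this sketch, and the `pack` and `check_header` helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using llama_token = std::int32_t;  // stand-in for the llama.h typedef

// Placeholder values for illustration only; the real constants live in llama.cpp.
constexpr std::uint32_t kDemoMagic = 0x44454D4Fu;
constexpr std::uint32_t kDemoVersion = 1;

// Appends a 32-bit value in native byte order.
inline void put_u32(std::vector<std::uint8_t> &out, std::uint32_t v) {
    std::uint8_t raw[sizeof v];
    std::memcpy(raw, &v, sizeof v);
    out.insert(out.end(), raw, raw + sizeof v);
}

// Lays out: magic, version, token count, tokens, then opaque state bytes.
inline std::vector<std::uint8_t> pack(const std::vector<llama_token> &tokens,
                                      const std::vector<std::uint8_t> &state) {
    assert(tokens.size() <= UINT32_MAX);
    std::vector<std::uint8_t> out;
    put_u32(out, kDemoMagic);
    put_u32(out, kDemoVersion);
    put_u32(out, static_cast<std::uint32_t>(tokens.size()));
    for (llama_token t : tokens) {
        put_u32(out, static_cast<std::uint32_t>(t));
    }
    out.insert(out.end(), state.begin(), state.end());
    return out;
}

// Returns the token count if the header validates, or -1 otherwise.
inline std::int64_t check_header(const std::vector<std::uint8_t> &file) {
    if (file.size() < 12) return -1;
    std::uint32_t h[3];
    std::memcpy(h, file.data(), sizeof h);
    if (h[0] != kDemoMagic || h[1] != kDemoVersion) return -1;
    const std::uint64_t token_bytes =
        static_cast<std::uint64_t>(h[2]) * sizeof(llama_token);
    if (file.size() - 12 < token_bytes) return -1;  // truncated token array
    return h[2];
}
```

Putting the magic and version first lets a reader reject a foreign or outdated file before touching the token array or the state data.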
