liblloyal 1.0.0
Composable primitives for llama.cpp inference
|
Classes | |
| struct | FileData |
| Data structure returned by read_file. More... | |
Functions | |
| bool | remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1) |
| Remove token range from KV cache sequence. | |
| llama_pos | pos_max (llama_context *ctx, llama_seq_id seq) |
| Get maximum position in KV cache sequence. | |
| void | seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1) |
| Copy KV cache from one sequence to another. | |
| void | seq_keep (llama_context *ctx, llama_seq_id seq) |
| Keep only one sequence, removing all others. | |
| size_t | state_size (llama_context *ctx, llama_seq_id seq) |
| Get size needed to serialize sequence state. | |
| size_t | state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size) |
| Save sequence state to buffer. | |
| size_t | state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size) |
| Restore sequence state from buffer. | |
| size_t | global_state_size (llama_context *ctx) |
| Get size needed to serialize global state. | |
| size_t | global_state_save (llama_context *ctx, uint8_t *dst, size_t size) |
| Save global state to buffer. | |
| size_t | global_state_load (llama_context *ctx, const uint8_t *src, size_t size) |
| Restore global state from buffer. | |
| void | log_build_info (llama_context *ctx) |
| Log KV cache build info and current state. | |
| void | clear_all (llama_context *ctx) |
| Clear all KV cache (complete reset) | |
| void | clear_metadata (llama_context *ctx) |
| Clear KV cache metadata only (fast reset) | |
| void | clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch) |
| size_t | write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens) |
| Write KV state to file with self-describing format. | |
| FileData | read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath) |
|
void clear_all (llama_context *ctx) | inline |
Clear all KV cache (complete reset)
Clears both metadata and data buffers for a complete cache reset. Use when starting a new conversation or session.
| ctx | Llama context (must not be null) |
| Throws | std::runtime_error if ctx is null |
|
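void clear_and_reseed (llama_context *ctx, const std::vector< llama_token > &original_sinks, const std::vector< llama_token > &tail, int32_t n_batch) | inline |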
|
void clear_metadata (llama_context *ctx) | inline |
Clear KV cache metadata only (fast reset)
Clears the cache's logical structure but keeps its buffer allocations. Faster than clear_all() for context-compression patterns, where the cache is refilled right after the reset.
| ctx | Llama context (must not be null) |
| Throws | std::runtime_error if ctx is null |
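The difference between the two resets can be pictured with a toy model. The sketch below is not liblloyal's implementation: `toy::cache`, `toy::used_cells`, and the two reset functions are hypothetical stand-ins. They assume that "metadata" means per-cell bookkeeping and that "data" means the backing buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace toy {

// Hypothetical stand-in for a KV cache: bookkeeping plus a backing buffer.
struct cache {
    std::vector<std::int32_t> cell_pos;  // metadata: position per cell, -1 = free
    std::vector<float> data;             // backing buffer for keys/values
};

// Fast reset: mark every cell as free, leave the buffer contents untouched.
inline void clear_metadata(cache& kv) {
    std::fill(kv.cell_pos.begin(), kv.cell_pos.end(), -1);
}

// Complete reset: free every cell and also zero the buffer (allocation is kept).
inline void clear_all(cache& kv) {
    clear_metadata(kv);
    std::fill(kv.data.begin(), kv.data.end(), 0.0f);
}

// Number of cells that currently hold a token.
inline std::size_t used_cells(const cache& kv) {
    return static_cast<std::size_t>(
        std::count_if(kv.cell_pos.begin(), kv.cell_pos.end(),
                      [](std::int32_t p) { return p >= 0; }));
}

}  // namespace toy
```

In this model the fast reset touches only the small bookkeeping array, which is why it is the cheaper choice when the cache is about to be overwritten anyway.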
|
size_t global_state_load (llama_context *ctx, const uint8_t *src, size_t size) | inline |
Restore global state from buffer.
Deserializes and restores the entire context's state from buffer.
| ctx | Llama context (must not be null) |
| src | Source buffer (must not be null) |
| size | Buffer size in bytes |
|
size_t global_state_save (llama_context *ctx, uint8_t *dst, size_t size) | inline |
Save global state to buffer.
Serializes the entire context's state into the provided buffer.
| ctx | Llama context (must not be null) |
| dst | Destination buffer (must not be null) |
| size | Buffer size in bytes |
|
size_t global_state_size (llama_context *ctx) | inline |
Get size needed to serialize global state.
Returns buffer size required to save the entire context's state. Use when per-sequence serialization is not needed.
| ctx | Llama context (must not be null) |
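The three global-state functions are typically used together: query the size, allocate a buffer, save, and later load. The sketch below shows that round trip against a stand-in. `toy::context` and the `toy::` functions are hypothetical and only mirror the documented parameter lists. The return-value convention (bytes written or read, 0 on failure) is an assumption, since this page does not state it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace toy {

// Hypothetical stand-in context: its whole "state" is a byte vector.
struct context { std::vector<std::uint8_t> state; };

inline std::size_t global_state_size(context* ctx) { return ctx->state.size(); }

// Assumed convention: returns bytes written, or 0 if the buffer is too small.
inline std::size_t global_state_save(context* ctx, std::uint8_t* dst, std::size_t size) {
    if (size < ctx->state.size()) return 0;
    if (!ctx->state.empty()) std::memcpy(dst, ctx->state.data(), ctx->state.size());
    return ctx->state.size();
}

// Assumed convention: returns bytes consumed.
inline std::size_t global_state_load(context* ctx, const std::uint8_t* src, std::size_t size) {
    ctx->state.assign(src, src + size);
    return size;
}

// Size, allocate, save: the buffer is trimmed to what was actually written.
inline std::vector<std::uint8_t> snapshot(context* ctx) {
    std::vector<std::uint8_t> buf(global_state_size(ctx));
    const std::size_t written = global_state_save(ctx, buf.data(), buf.size());
    buf.resize(written);
    return buf;
}

// Load a snapshot back; treats an empty buffer or a short read as failure.
inline bool restore(context* ctx, const std::vector<std::uint8_t>& buf) {
    return !buf.empty() && global_state_load(ctx, buf.data(), buf.size()) == buf.size();
}

}  // namespace toy
```

With the real library, the `snapshot` and `restore` helpers would presumably keep the same shape, with a `llama_context *` in place of the stand-in.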
|
void log_build_info (llama_context *ctx) | inline |
Log KV cache build info and current state.
Outputs debug information about the KV cache configuration and current state. Useful for debugging and understanding cache behavior.
| ctx | Llama context (may be null, in which case less information is logged) |
|
llama_pos pos_max (llama_context *ctx, llama_seq_id seq) | inline |
Get maximum position in KV cache sequence.
Returns the highest token position in the specified sequence's KV cache. For a sequence with N tokens, this returns N-1 (zero-indexed).
| ctx | Llama context (must not be null) |
| seq | Sequence ID |
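Because positions are zero-indexed, the next token to decode goes at `pos_max + 1`. The sketch below models that arithmetic on a plain list of positions. `toy_pos_max` and `toy_next_pos` are hypothetical helpers, not library calls, and the value -1 for an empty sequence is an assumption that this page does not confirm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using toy_pos = std::int32_t;

// Highest occupied position, or -1 (assumed) when the sequence holds no tokens.
inline toy_pos toy_pos_max(const std::vector<toy_pos>& cells) {
    if (cells.empty()) return -1;
    return *std::max_element(cells.begin(), cells.end());
}

// Position at which the next decoded token would be placed.
inline toy_pos toy_next_pos(const std::vector<toy_pos>& cells) {
    return toy_pos_max(cells) + 1;
}
```

Under that assumption an empty sequence needs no special case: the next position is simply 0.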
|
FileData read_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath) | inline |
|
bool remove_range (llama_context *ctx, llama_seq_id seq, llama_pos p0, llama_pos p1) | inline |
Remove token range from KV cache sequence.
Removes tokens in the range [p0, p1) from the specified sequence's KV cache. Used for selective eviction in context window management.
| ctx | Llama context (must not be null) |
| seq | Sequence ID (use 0 for single-sequence mode) |
| p0 | Start position (inclusive) |
| p1 | End position (exclusive), use -1 for "to end" |
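The half-open range convention is easy to get off by one, so a small model may help. The sketch below applies the same `[p0, p1)` rule, with a negative `p1` meaning "to end", to a plain list of positions. `toy_remove_range` and `toy_filled` are hypothetical helpers, not part of liblloyal.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using toy_pos = std::int32_t;

// Erase every position p with p0 <= p < p1; a negative p1 means "to end".
// Returns true to mirror the bool return type of the documented signature.
inline bool toy_remove_range(std::vector<toy_pos>& cells, toy_pos p0, toy_pos p1) {
    cells.erase(std::remove_if(cells.begin(), cells.end(),
                               [p0, p1](toy_pos p) {
                                   return p >= p0 && (p1 < 0 || p < p1);
                               }),
                cells.end());
    return true;
}

// Build a sequence occupying positions 0..n-1.
inline std::vector<toy_pos> toy_filled(toy_pos n) {
    std::vector<toy_pos> cells;
    for (toy_pos p = 0; p < n; ++p) cells.push_back(p);
    return cells;
}
```

For a sequence holding positions 0 to 9, removing `[2, 5)` drops positions 2, 3, and 4 and keeps position 5. A follow-up call with `p0 = 8, p1 = -1` drops everything from position 8 onward.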
|
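void seq_cp (llama_context *ctx, llama_seq_id src, llama_seq_id dst, llama_pos p0=0, llama_pos p1=-1) | inline |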
Copy KV cache from one sequence to another.
Copies KV cache state from source to destination sequence, enabling efficient branching without duplicating model weights.
| ctx | Llama context (must not be null) |
| src | Source sequence ID |
| dst | Destination sequence ID |
| p0 | Start position (inclusive), default 0 |
| p1 | End position (exclusive), default -1 for "to end" |
|
void seq_keep (llama_context *ctx, llama_seq_id seq) | inline |
Keep only one sequence, removing all others.
Removes all sequences except the specified one from the KV cache. Efficient way to prune unused branches.
| ctx | Llama context (must not be null) |
| seq | Sequence ID to keep |
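seq_cp and seq_keep pair naturally: fork a shared prefix into several branches, then prune down to the winner. The sketch below models that flow with a map from sequence ID to occupied positions. Everything in the `toy` namespace is a hypothetical stand-in that mirrors the documented parameter order; it says nothing about how the real cache shares or stores data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

namespace toy {

using seq_id = std::int32_t;
using pos = std::int32_t;
using cache = std::map<seq_id, std::set<pos>>;

// Copy positions in [p0, p1) from src to dst; a negative p1 means "to end".
inline void seq_cp(cache& kv, seq_id src, seq_id dst, pos p0 = 0, pos p1 = -1) {
    auto it = kv.find(src);
    if (it == kv.end() || src == dst) return;
    std::set<pos> picked;
    for (pos p : it->second) {
        if (p >= p0 && (p1 < 0 || p < p1)) picked.insert(p);
    }
    kv[dst].insert(picked.begin(), picked.end());
}

// Drop every sequence except `keep`.
inline void seq_keep(cache& kv, seq_id keep) {
    for (auto it = kv.begin(); it != kv.end();) {
        if (it->first == keep) {
            ++it;
        } else {
            it = kv.erase(it);
        }
    }
}

}  // namespace toy
```

A typical use would be to copy sequence 0 into sequences 1 and 2, extend each branch independently, and call seq_keep on whichever branch is selected.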
|
size_t state_load (llama_context *ctx, llama_seq_id seq, const uint8_t *src, size_t size) | inline |
Restore sequence state from buffer.
Deserializes KV cache state from buffer and restores it to the sequence. Automatically falls back to global state restore if per-sequence restore fails (may occur with fragmented caches).
| ctx | Llama context (must not be null) |
| seq | Sequence ID |
| src | Source buffer (must not be null) |
| size | Buffer size in bytes |
|
size_t state_save (llama_context *ctx, llama_seq_id seq, uint8_t *dst, size_t size) | inline |
Save sequence state to buffer.
Serializes the sequence's KV cache state into the provided buffer. Automatically falls back to global state save if per-sequence save fails (may occur with fragmented caches).
| ctx | Llama context (must not be null) |
| seq | Sequence ID |
| dst | Destination buffer (must not be null) |
| size | Buffer size in bytes |
|
size_t state_size (llama_context *ctx, llama_seq_id seq) | inline |
Get size needed to serialize sequence state.
Returns buffer size required to save the sequence's KV cache state. Automatically falls back to global state size if per-sequence query fails (may occur with fragmented caches).
| ctx | Llama context (must not be null) |
| seq | Sequence ID |
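The automatic fallback shared by state_size, state_save, and state_load can be pictured as "try the per-sequence operation, and use the global one if it fails". The sketch below expresses that pattern generically. `with_fallback` is a hypothetical helper, and the use of a 0 return value as the failure signal is an assumption, since this page does not say how failure is detected.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Run the per-sequence operation first. If it reports failure (assumed here
// to be a return value of 0), run the global operation instead.
inline std::size_t with_fallback(const std::function<std::size_t()>& per_sequence,
                                 const std::function<std::size_t()>& global) {
    const std::size_t n = per_sequence();
    return n != 0 ? n : global();
}
```

One consequence of a fallback of this shape is that a size query and the later save should go through the same path, so that the buffer size matches the data actually written.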
|
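size_t write_file (llama_context *ctx, llama_seq_id seq, const std::string &filepath, const std::vector< llama_token > &tokens) | inline |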
Write KV state to file with self-describing format.
Serializes KV cache state to file using llama.cpp's standard format:
| ctx | Llama context (must not be null) |
| seq | Sequence ID (use 0 for single-sequence mode) |
| filepath | Destination file path (must not be empty) |
| tokens | Token IDs to include in file |