liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::branch::BranchStore Class Reference

Handle table and batched decode orchestrator for branch management.

#include <lloyal/branch.hpp>

Classes

struct  Allocation
 Result of allocate(): a slot handle + its leased seq_id.
 

Public Member Functions

 BranchStore (size_t initial_capacity=16)
 Construct a branch store with initial slot capacity.
 
 ~BranchStore ()
 Destructor — frees CPU resources.
 
Allocation allocate ()
 Allocate a branch slot + KV lease atomically.
 
void release (BranchHandle handle)
 Release a branch slot + evict its KV lease.
 
void init_tenancy (llama_context *ctx)
 Initialize KV tenancy after context creation.
 
void drain ()
 Explicit teardown — evict all leases while context is alive.
 
void retainOnly (BranchHandle winner)
 Keep only the winner — nuclear KV + CPU cleanup.
 
size_t available () const
 Number of vacant seq_ids available for acquisition.
 
KvPressure kv_pressure () const
 KV cache pressure snapshot — O(1), no tree walking.
 
void add_cells_used (uint32_t n)
 Increment cells_used counter (for standalone prefill/step outside BranchStore methods)
 
BranchHandle parent (BranchHandle h) const
 Get a branch's parent handle.
 
llama_pos fork_head (BranchHandle h) const
 Get a branch's fork head (parent position at fork time)
 
const std::vector< BranchHandle > & children (BranchHandle h) const
 Get a branch's child handles.
 
bool isLeaf (BranchHandle h) const
 Test whether a branch is a leaf (no children)
 
bool isActive (BranchHandle h) const
 Test whether a branch holds a KV lease.
 
BranchState * get (BranchHandle handle)
 Look up branch state by handle.
 
const BranchState * get (BranchHandle handle) const
 Look up branch state by handle.
 
template<SamplingParamsLike P>
SamplerChainHandle create_sampler (const P &params)
 Create a sampler chain and register it.
 
SamplerChainHandle clone_sampler (SamplerChainHandle h)
 Clone a sampler chain (for fork)
 
void free_sampler (SamplerChainHandle h)
 Free a sampler chain.
 
llama_sampler * get_sampler_chain (SamplerChainHandle h) const
 Dereference a sampler chain handle (non-owning)
 
bool sampler_has_dist (SamplerChainHandle h) const
 Check if a sampler chain ends with dist (stochastic) or greedy.
 
GrammarHandle create_grammar (const llama_model *model, const char *grammar_str, const char *root="root")
 Create a grammar sampler and register it.
 
GrammarHandle create_grammar_lazy (const llama_model *model, const char *grammar_str, const std::vector< std::string > &trigger_patterns, const std::vector< llama_token > &trigger_tokens, const char *root="root")
 Create a lazy grammar (unconstrained until trigger fires)
 
GrammarHandle clone_grammar (GrammarHandle h)
 Clone a grammar (for fork)
 
void free_grammar (GrammarHandle h)
 Free a grammar.
 
llama_sampler * get_grammar_sampler (GrammarHandle h) const
 Dereference a grammar handle (non-owning)
 
MetricsHandle create_metrics ()
 Create a metrics tracker and register it.
 
MetricsHandle clone_metrics (MetricsHandle h)
 Clone a metrics tracker (for fork)
 
void free_metrics (MetricsHandle h)
 Free a metrics tracker.
 
void add_model_surprisal (MetricsHandle h, float surprisal)
 Add model-level surprisal to a metrics tracker.
 
void add_sampling_surprisal (MetricsHandle h, float surprisal)
 Add sampling-level surprisal to a metrics tracker.
 
float get_model_ppl (MetricsHandle h) const
 Get model-level perplexity from a metrics tracker.
 
float get_sampling_ppl (MetricsHandle h) const
 Get sampling-level perplexity from a metrics tracker.
 
void decode_each (std::span< const DecodeEachItem > items)
 Decode one token per branch in a single GPU dispatch.
 
void decode_scatter (std::span< const DecodeScatterItem > items)
 Decode variable token counts per branch with auto-chunking.
 

Detailed Description

Handle table and batched decode orchestrator for branch management.

Provides two concerns:

Slot management — A pool of BranchState slots addressed by opaque handles with generation counters for ABA prevention. Slot 0 is permanently reserved (handle 0 = INVALID_HANDLE). Auto-grows by doubling up to 65535 slots. Methods: allocate(), release(), get().
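The slot scheme described above can be sketched as a standalone generational handle table. This is an illustration under assumptions, not liblloyal's code: the handle packing (16-bit slot index in the low bits, 16-bit generation in the high bits), the names SlotTable, Handle, kInvalidHandle and kMaxSlots, and the placeholder State payload are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical handle layout: low 16 bits = slot index, high 16 bits = generation.
using Handle = std::uint32_t;
constexpr Handle kInvalidHandle = 0;      // slot 0 is never handed out
constexpr std::size_t kMaxSlots = 65535;  // growth cap, including reserved slot 0

struct State { int position = 0; };       // placeholder per-branch payload

class SlotTable {
 public:
  explicit SlotTable(std::size_t initial_capacity = 16) {
    grow(std::min(std::max<std::size_t>(initial_capacity, 2), kMaxSlots));
  }

  Handle allocate() {
    // Double the table when the freelist is empty; fail once the cap is reached.
    if (freelist_.empty() && !grow(std::min(slots_.size() * 2, kMaxSlots))) {
      return kInvalidHandle;
    }
    const std::uint16_t idx = freelist_.back();
    freelist_.pop_back();
    Slot& s = slots_[idx];
    s.in_use = true;
    s.state = State{};
    return (static_cast<Handle>(s.generation) << 16) | idx;
  }

  void release(Handle h) {
    if (get(h) == nullptr) return;  // invalid or stale handle: no-op
    const auto idx = static_cast<std::uint16_t>(h & 0xFFFF);
    slots_[idx].in_use = false;
    ++slots_[idx].generation;       // old copies of h stop validating (ABA guard)
    freelist_.push_back(idx);
  }

  // Validates index, generation and in-use flag, as the documented get() does.
  State* get(Handle h) {
    const std::size_t idx = h & 0xFFFF;
    const auto gen = static_cast<std::uint16_t>(h >> 16);
    if (idx == 0 || idx >= slots_.size()) return nullptr;  // slot 0 is reserved
    Slot& s = slots_[idx];
    return (s.in_use && s.generation == gen) ? &s.state : nullptr;
  }

 private:
  struct Slot {
    std::uint16_t generation = 0;
    bool in_use = false;
    State state;
  };

  // Resize to new_size and push the new indices (never index 0) onto the freelist.
  bool grow(std::size_t new_size) {
    if (new_size <= slots_.size()) return false;
    const std::size_t first_new = std::max<std::size_t>(slots_.size(), 1);
    slots_.resize(new_size);
    for (std::size_t i = new_size; i-- > first_new;) {
      freelist_.push_back(static_cast<std::uint16_t>(i));
    }
    return true;
  }

  std::vector<Slot> slots_;
  std::vector<std::uint16_t> freelist_;
};
```

Because release() bumps the slot's generation, a handle kept across a release/reallocate cycle no longer matches, so get() returns nullptr instead of silently aliasing the slot's new occupant.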

Batched decode — Orchestrates multi-branch GPU dispatches that amortize llama_decode() overhead across N branches. Each method validates handles, builds the appropriate decode primitive's input, dispatches, captures logits into per-branch snapshots, and advances positions atomically. Methods: decode_each(), decode_scatter().

Batched decode methods vs free-function decode:

Method            Tokens/branch  Chunking     Logit capture
decode_each()     1              No (1 call)  Per-branch
decode_scatter()  Variable       Auto         Per-branch
branch::step()    1              No           Single branch
Warning
n_seq_max constraint: Each live branch consumes one KV cache sequence ID (llama_seq_id) managed by kv::tenancy. Call init_tenancy(ctx) after context creation to set the ceiling. The hard limit is llama_n_seq_max(ctx) (typically 256). allocate() acquires both a slot and a lease atomically; release()/drain() return both resources symmetrically.
Thread safety: External synchronization required (caller's mutex). Typically SessionContext holds _decodeMutex for decode operations.
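Since every live branch holds one seq_id, the tenancy ceiling behaves like a fixed-size lease pool. The sketch below models only that bookkeeping, assuming seq_ids are the integers [0, n_seq_max) and that NO_LEASE is -1; LeasePool, SeqId and kNoLease are hypothetical names, and no llama.cpp calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using SeqId = int;              // stand-in for llama_seq_id
constexpr SeqId kNoLease = -1;  // assumed value of the "no lease" sentinel

// Every seq_id in [0, n_seq_max) is either vacant or leased to exactly one branch.
class LeasePool {
 public:
  explicit LeasePool(int n_seq_max) : leased_(static_cast<std::size_t>(n_seq_max), false) {
    for (SeqId s = n_seq_max - 1; s >= 0; --s) vacant_.push_back(s);  // lowest id on top
  }

  SeqId acquire() {
    if (vacant_.empty()) return kNoLease;  // ceiling reached
    const SeqId s = vacant_.back();
    vacant_.pop_back();
    leased_[static_cast<std::size_t>(s)] = true;
    return s;
  }

  // The documented release()/drain() also clear the sequence's KV cells;
  // this sketch only returns the id to the vacant pool.
  void evict(SeqId s) {
    if (s < 0 || static_cast<std::size_t>(s) >= leased_.size()) return;
    if (!leased_[static_cast<std::size_t>(s)]) return;  // not leased: ignore
    leased_[static_cast<std::size_t>(s)] = false;
    vacant_.push_back(s);
  }

  std::size_t available() const { return vacant_.size(); }

 private:
  std::vector<bool> leased_;
  std::vector<SeqId> vacant_;
};
```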
See also
branch::create() to initialize a branch in this store
branch::fork() to clone a branch into a new sequence
branch::prune() / branch::destroy() for teardown

Definition at line 392 of file branch.hpp.

Constructor & Destructor Documentation

◆ BranchStore()

lloyal::branch::BranchStore::BranchStore ( size_t  initial_capacity = 16)
inline explicit

Construct a branch store with initial slot capacity.

Parameters
initial_capacity  Number of slots to pre-allocate (minimum 2)

Definition at line 398 of file branch.hpp.

◆ ~BranchStore()

lloyal::branch::BranchStore::~BranchStore ( )
inline

Destructor — frees CPU resources.

Call drain() before the store is destroyed, while the llama_context is still alive.


Definition at line 417 of file branch.hpp.

Member Function Documentation

◆ add_cells_used()

void lloyal::branch::BranchStore::add_cells_used ( uint32_t  n)
inline

Increment the cells_used counter by n, for tokens decoded by a standalone prefill or step outside the BranchStore methods.


Definition at line 578 of file branch.hpp.

◆ add_model_surprisal()

void lloyal::branch::BranchStore::add_model_surprisal ( MetricsHandle  h,
float  surprisal 
)
inline

Add model-level surprisal to a metrics tracker.

Parameters
h  Metrics handle
surprisal  Surprisal in nats

Definition at line 846 of file branch.hpp.
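add_model_surprisal() and add_sampling_surprisal() take surprisal in nats, but this page does not say how it is computed. One common definition, assumed here, is the negative natural log of the probability assigned to the chosen token. A standalone sketch over a raw logit vector (surprisal_nats is a hypothetical helper, not a liblloyal function):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Surprisal of `token` under a logit vector, in nats: -log softmax(logits)[token].
// Uses the log-sum-exp trick so large logits do not overflow std::exp.
float surprisal_nats(const std::vector<float>& logits, std::size_t token) {
  assert(token < logits.size());
  const float max_logit = *std::max_element(logits.begin(), logits.end());
  double sum = 0.0;
  for (float l : logits) sum += std::exp(static_cast<double>(l - max_logit));
  const double log_z = max_logit + std::log(sum);  // log of the softmax normalizer
  return static_cast<float>(log_z - logits[token]);
}
```

A uniform distribution over four tokens gives ln 4 nats for any of them; a token the model is certain about gives roughly 0.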

◆ add_sampling_surprisal()

void lloyal::branch::BranchStore::add_sampling_surprisal ( MetricsHandle  h,
float  surprisal 
)
inline

Add sampling-level surprisal to a metrics tracker.

Parameters
h  Metrics handle
surprisal  Surprisal in nats

Definition at line 860 of file branch.hpp.

◆ allocate()

Allocation lloyal::branch::BranchStore::allocate ( )
inline

Allocate a branch slot + KV lease atomically.

Acquires a seq_id from tenancy, then a slot from the freelist. If either fails, both are rolled back cleanly.

Returns
{handle, seq_id}, or {INVALID_HANDLE, -1} if exhausted

Definition at line 439 of file branch.hpp.
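The acquire-then-rollback order described for allocate() can be shown with two toy pools. In this hypothetical sketch the slot handle is just the slot index (no generation counter), Pool, Allocation and allocate are illustrative names, and -1 stands in for the "no seq_id" value in the documented {INVALID_HANDLE, -1} result.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for both the tenancy pool and the slot freelist.
struct Pool {
  std::vector<int> vacant;
  std::optional<int> take() {
    if (vacant.empty()) return std::nullopt;
    const int v = vacant.back();
    vacant.pop_back();
    return v;
  }
  void give_back(int v) { vacant.push_back(v); }
};

struct Allocation { unsigned handle; int seq_id; };
constexpr unsigned kInvalidHandle = 0;

// Acquire the seq_id first, then the slot; undo the first step if the second
// fails, so a failed call leaves both pools exactly as it found them.
Allocation allocate(Pool& seq_ids, Pool& slots) {
  const std::optional<int> seq = seq_ids.take();
  if (!seq) return {kInvalidHandle, -1};
  const std::optional<int> slot = slots.take();
  if (!slot) {
    seq_ids.give_back(*seq);  // roll back the lease
    return {kInvalidHandle, -1};
  }
  return {static_cast<unsigned>(*slot), *seq};
}
```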

◆ available()

size_t lloyal::branch::BranchStore::available ( ) const
inline

Number of vacant seq_ids available for acquisition.

Returns
Count of seq_ids in the tenancy vacant pool

Definition at line 561 of file branch.hpp.

◆ children()

const std::vector< BranchHandle > & lloyal::branch::BranchStore::children ( BranchHandle  h) const
inline

Get a branch's child handles.

Parameters
h  Branch handle
Returns
Reference to child handle vector (empty if leaf or invalid)

Definition at line 605 of file branch.hpp.

◆ clone_grammar()

GrammarHandle lloyal::branch::BranchStore::clone_grammar ( GrammarHandle  h)
inline

Clone a grammar (for fork)

Parameters
h  Source grammar handle
Returns
New handle with cloned grammar, or 0 if source is invalid

Definition at line 777 of file branch.hpp.

◆ clone_metrics()

MetricsHandle lloyal::branch::BranchStore::clone_metrics ( MetricsHandle  h)
inline

Clone a metrics tracker (for fork)

Parameters
h  Source metrics handle
Returns
New handle with cloned state, or 0 if source is invalid

Definition at line 824 of file branch.hpp.
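The sampler, grammar, and metrics registries share one contract: create never returns 0, clone returns 0 for an invalid source, and freeing 0 is a safe no-op. A minimal registry with that contract might look like the following; the two-field Metrics payload, MetricsRegistry, and destroy() (used instead of a name like free_metrics) are hypothetical and say nothing about how liblloyal stores these objects.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct Metrics {  // hypothetical payload
  double surprisal_sum = 0.0;
  std::uint64_t count = 0;
};

// Handle-keyed registry; 0 is never issued, so it can mean "none".
class MetricsRegistry {
 public:
  std::uint32_t create() {
    const std::uint32_t h = next_++;
    items_.emplace(h, Metrics{});
    return h;  // never 0
  }

  std::uint32_t clone(std::uint32_t src) {
    const auto it = items_.find(src);
    if (it == items_.end()) return 0;  // invalid source
    const Metrics copy = it->second;   // copy before inserting (insertion may rehash)
    const std::uint32_t h = next_++;
    items_.emplace(h, copy);           // independent state from here on
    return h;
  }

  void destroy(std::uint32_t h) { items_.erase(h); }  // unknown handles, including 0, are ignored

  Metrics* get(std::uint32_t h) {
    const auto it = items_.find(h);
    return it == items_.end() ? nullptr : &it->second;
  }

 private:
  std::uint32_t next_ = 1;
  std::unordered_map<std::uint32_t, Metrics> items_;
};
```

The deep copy on clone is what makes it suitable for fork: each child starts with the parent's running totals and then diverges independently.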

◆ clone_sampler()

SamplerChainHandle lloyal::branch::BranchStore::clone_sampler ( SamplerChainHandle  h)
inline

Clone a sampler chain (for fork)

Parameters
h  Source sampler chain handle
Returns
New handle with cloned chain, or 0 if source is invalid

Definition at line 687 of file branch.hpp.

◆ create_grammar()

GrammarHandle lloyal::branch::BranchStore::create_grammar ( const llama_model *  model,
const char *  grammar_str,
const char *  root = "root" 
)
inline

Create a grammar sampler and register it.

Parameters
model  Llama model (for vocab)
grammar_str  GBNF grammar string
root  Root rule name (default "root")
Returns
Handle to the new grammar (never 0)

Definition at line 738 of file branch.hpp.

◆ create_grammar_lazy()

GrammarHandle lloyal::branch::BranchStore::create_grammar_lazy ( const llama_model *  model,
const char *  grammar_str,
const std::vector< std::string > &  trigger_patterns,
const std::vector< llama_token > &  trigger_tokens,
const char *  root = "root" 
)
inline

Create a lazy grammar (unconstrained until trigger fires)

Parameters
model  Llama model (for vocab)
grammar_str  GBNF grammar string
trigger_patterns  Regex patterns that activate the grammar
trigger_tokens  Token IDs that activate the grammar
root  Root rule name (default "root")
Returns
Handle to the new grammar, or 0 on failure

Definition at line 757 of file branch.hpp.

◆ create_metrics()

MetricsHandle lloyal::branch::BranchStore::create_metrics ( )
inline

Create a metrics tracker and register it.

Returns
Handle to the new tracker (never 0)

Definition at line 813 of file branch.hpp.

◆ create_sampler()

template<SamplingParamsLike P>
SamplerChainHandle lloyal::branch::BranchStore::create_sampler ( const P &  params)
inline

Create a sampler chain and register it.

Parameters
params  Sampling parameters (any SamplingParamsLike type)
Returns
Handle to the new sampler chain (never 0)

Definition at line 672 of file branch.hpp.

◆ decode_each()

void lloyal::branch::BranchStore::decode_each ( std::span< const DecodeEachItem >  items)
inline

Decode one token per branch in a single GPU dispatch.

Packs N tokens (one per branch) into a single llama_batch and calls decode::each(), amortizing GPU dispatch overhead across all branches. After decode, captures logits from the batch into each branch's logits_snapshot and advances each branch's position by 1.

Note
Batch index mapping: items[i] occupies batch row i, whose logits are read with llama_get_logits_ith(ctx, i). The mapping is 1:1 because decode::each places exactly one token per batch slot.
Uses an internal scratch buffer. Because BranchStore requires external synchronization (the caller's mutex), the buffer is never accessed concurrently.
Parameters
items  Span of {handle, token} pairs (all handles must be valid)
Exceptions
std::runtime_error  if any handle is invalid, contexts don't match, or decode fails
See also
decode::each() for the underlying single-batch primitive
decode_scatter() for variable token counts per branch

Definition at line 921 of file branch.hpp.
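The 1:1 index mapping in the note above follows from filling the batch one row per item. The sketch below does that with a mock batch whose columns (token, pos, seq_id, logits flag) mirror the per-token fields of llama.cpp's llama_batch without using the real struct; MockBatch, BranchView, EachItem, pack_each and advance_each are hypothetical, and treating a branch's position as the next position to write is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the per-token columns of a llama_batch (not the real struct).
struct MockBatch {
  std::vector<std::int32_t> token, pos, seq_id;
  std::vector<std::int8_t> logits;  // 1 = request logits for this row
};

struct BranchView { std::int32_t seq_id; std::int32_t position; };  // next position to write
struct EachItem { std::size_t branch; std::int32_t token; };        // index into `branches`

// Row i carries items[i].token at that branch's next position and requests
// logits, so logits row i belongs to items[i].
MockBatch pack_each(const std::vector<BranchView>& branches, const std::vector<EachItem>& items) {
  MockBatch b;
  for (const EachItem& it : items) {
    const BranchView& br = branches[it.branch];
    b.token.push_back(it.token);
    b.pos.push_back(br.position);
    b.seq_id.push_back(br.seq_id);
    b.logits.push_back(1);
  }
  return b;
}

// After a successful decode, every listed branch advances by exactly one position.
void advance_each(std::vector<BranchView>& branches, const std::vector<EachItem>& items) {
  for (const EachItem& it : items) branches[it.branch].position += 1;
}
```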

◆ decode_scatter()

void lloyal::branch::BranchStore::decode_scatter ( std::span< const DecodeScatterItem >  items)
inline

Decode variable token counts per branch with auto-chunking.

Two-pass algorithm:

Pass 1 — Build chunks: Greedily bin-packs items into chunks up to llama_n_batch(ctx) tokens. Oversized items (tokens.size() > n_batch) get their own chunk and are dispatched via decode::many(). Zero-length items are silently skipped.

Pass 2 — Dispatch: Iterates chunks, dispatching normal chunks via decode::scatter() and oversized chunks via decode::many(). Captures logits into per-branch snapshots and advances positions.

Note
Uses an internal scratch buffer. Because BranchStore requires external synchronization (the caller's mutex), the buffer is never accessed concurrently.
Parameters
items  Span of {handle, tokens} pairs (all handles must be valid)
Exceptions
std::runtime_error  if any handle is invalid, contexts don't match, or decode fails
See also
decode::scatter() for the underlying single-batch primitive
decode::many() for the oversized-item fallback
decode_each() for the simpler one-token-per-branch variant

Definition at line 993 of file branch.hpp.
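Pass 1 can be sketched as next-fit packing over per-item token counts. This is a hypothetical illustration, not the library's code: Chunk and build_chunks are invented names, items are visited in input order, and an oversized item is emitted as its own chunk immediately, without closing the chunk currently being filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Chunk {
  std::vector<std::size_t> items;  // indices into the input span
  bool oversized = false;          // single item larger than n_batch (decode::many path)
};

// Pack item token counts into chunks of at most n_batch tokens (next-fit).
std::vector<Chunk> build_chunks(const std::vector<std::size_t>& sizes, std::size_t n_batch) {
  std::vector<Chunk> chunks;
  Chunk current;
  std::size_t current_tokens = 0;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    const std::size_t n = sizes[i];
    if (n == 0) continue;                  // zero-length items are skipped
    if (n > n_batch) {                     // oversized: a chunk of its own
      chunks.push_back(Chunk{{i}, true});
      continue;
    }
    if (current_tokens + n > n_batch) {    // item does not fit: close the open chunk
      chunks.push_back(current);
      current = Chunk{};
      current_tokens = 0;
    }
    current.items.push_back(i);
    current_tokens += n;
  }
  if (!current.items.empty()) chunks.push_back(current);
  return chunks;
}
```

With n_batch = 8 and token counts {3, 0, 4, 10, 2, 1}, this yields the oversized chunk {3}, then {0, 2} (7 tokens), then {4, 5} (3 tokens); item 1 is dropped as zero-length.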

◆ drain()

void lloyal::branch::BranchStore::drain ( )
inline

Explicit teardown — evict all leases while context is alive.

Must be called before llama_free(ctx). Idempotent and terminal: the BranchStore is not reusable after drain(), because freelist_ is not repopulated. For a new cycle, construct a fresh store and call init_tenancy() on it. After drain(), allocate() returns {INVALID_HANDLE, NO_LEASE}.


Definition at line 512 of file branch.hpp.

◆ fork_head()

llama_pos lloyal::branch::BranchStore::fork_head ( BranchHandle  h) const
inline

Get a branch's fork head (parent position at fork time)

Parameters
h  Branch handle
Returns
Fork head position (0 for root branches or invalid handles)

Definition at line 595 of file branch.hpp.

◆ free_grammar()

void lloyal::branch::BranchStore::free_grammar ( GrammarHandle  h)
inline

Free a grammar.

Parameters
h  Handle to free (0 is a safe no-op)

Definition at line 792 of file branch.hpp.

◆ free_metrics()

void lloyal::branch::BranchStore::free_metrics ( MetricsHandle  h)
inline

Free a metrics tracker.

Parameters
h  Handle to free (0 is a safe no-op)

Definition at line 837 of file branch.hpp.

◆ free_sampler()

void lloyal::branch::BranchStore::free_sampler ( SamplerChainHandle  h)
inline

Free a sampler chain.

Parameters
h  Handle to free (0 is a safe no-op)

Definition at line 703 of file branch.hpp.

◆ get() [1/2]

BranchState * lloyal::branch::BranchStore::get ( BranchHandle  handle)
inline

Look up branch state by handle.

Validates the handle's index, generation, and in-use flag. Slot 0 always returns nullptr (reserved for INVALID_HANDLE).

Parameters
handle  Branch handle to look up
Returns
Pointer to BranchState, or nullptr if handle is invalid/stale

Definition at line 640 of file branch.hpp.

◆ get() [2/2]

const BranchState * lloyal::branch::BranchStore::get ( BranchHandle  handle) const
inline

Look up branch state by handle.

Validates the handle's index, generation, and in-use flag. Slot 0 always returns nullptr (reserved for INVALID_HANDLE).

Parameters
handle  Branch handle to look up
Returns
Pointer to BranchState, or nullptr if handle is invalid/stale

Definition at line 660 of file branch.hpp.

◆ get_grammar_sampler()

llama_sampler * lloyal::branch::BranchStore::get_grammar_sampler ( GrammarHandle  h) const
inline

Dereference a grammar handle (non-owning)

Parameters
h  Grammar handle
Returns
Pointer to the grammar sampler, or nullptr if invalid

Definition at line 801 of file branch.hpp.

◆ get_model_ppl()

float lloyal::branch::BranchStore::get_model_ppl ( MetricsHandle  h) const
inline

Get model-level perplexity from a metrics tracker.

Parameters
h  Metrics handle
Returns
exp(average surprisal), INFINITY if no samples or invalid

Definition at line 874 of file branch.hpp.
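The documented return value, exp of the average surprisal, is the standard perplexity formula. A running-sum sketch that also reproduces the INFINITY-when-empty behavior (PplTracker is hypothetical, not the library's tracker):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical running tracker: perplexity = exp(mean surprisal in nats).
struct PplTracker {
  double sum = 0.0;
  std::uint64_t count = 0;

  void add(float surprisal_nats) {
    sum += surprisal_nats;
    ++count;
  }

  float ppl() const {
    if (count == 0) return std::numeric_limits<float>::infinity();  // no samples yet
    return static_cast<float>(std::exp(sum / static_cast<double>(count)));
  }
};
```

For example, a model that is uniform over four candidates contributes ln 4 nats per token, so the tracker reports a perplexity of 4.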

◆ get_sampler_chain()

llama_sampler * lloyal::branch::BranchStore::get_sampler_chain ( SamplerChainHandle  h) const
inline

Dereference a sampler chain handle (non-owning)

Parameters
h  Sampler chain handle
Returns
Pointer to the chain, or nullptr if invalid

Definition at line 712 of file branch.hpp.

◆ get_sampling_ppl()

float lloyal::branch::BranchStore::get_sampling_ppl ( MetricsHandle  h) const
inline

Get sampling-level perplexity from a metrics tracker.

Parameters
h  Metrics handle
Returns
exp(average surprisal), INFINITY if no samples or invalid

Definition at line 888 of file branch.hpp.

◆ init_tenancy()

void lloyal::branch::BranchStore::init_tenancy ( llama_context *  ctx)
inline

Initialize KV tenancy after context creation.

Parameters
ctx  Llama context (must outlive the BranchStore, or drain() must be called before the context is freed)

Definition at line 499 of file branch.hpp.

◆ isActive()

bool lloyal::branch::BranchStore::isActive ( BranchHandle  h) const
inline

Test whether a branch holds a KV lease.

Parameters
h  Branch handle
Returns
true if seq_id != NO_LEASE, false if inactive or handle is invalid

Definition at line 626 of file branch.hpp.

◆ isLeaf()

bool lloyal::branch::BranchStore::isLeaf ( BranchHandle  h) const
inline

Test whether a branch is a leaf (no children)

Parameters
h  Branch handle
Returns
true if branch has no children or handle is invalid

Definition at line 616 of file branch.hpp.

◆ kv_pressure()

KvPressure lloyal::branch::BranchStore::kv_pressure ( ) const
inline

KV cache pressure snapshot — O(1), no tree walking.

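cells_used is a single store-wide counter of unique KV cells: each branch contributes only the cells it wrote after its fork point. It is incremented by decode_each()/decode_scatter() (and add_cells_used()), decremented on release() by (position - fork_head), and reset by drain(), retainOnly(), and init_tenancy().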


Definition at line 570 of file branch.hpp.
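The counter rules above reduce to a few integer updates, which is why kv_pressure() needs no tree walk. The sketch assumes a branch's position is one past its last written cell and that cells below its fork_head are shared with its ancestors; CellCounter and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical O(1) counter mirroring the documented cells_used rules.
struct CellCounter {
  std::int64_t cells_used = 0;

  // Each decoded token writes one new KV cell for its branch.
  void on_decode(std::int64_t n_tokens) { cells_used += n_tokens; }

  // A branch forked at fork_head shares cells [0, fork_head) with its ancestors,
  // so releasing it frees only the cells it wrote itself.
  void on_release(std::int64_t position, std::int64_t fork_head) {
    cells_used -= position - fork_head;
  }

  void reset() { cells_used = 0; }
};
```

For instance, a root prefilled with 100 tokens plus a child forked at position 100 that decodes 5 more gives 105 cells; releasing the child subtracts 105 - 100 = 5.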

◆ parent()

BranchHandle lloyal::branch::BranchStore::parent ( BranchHandle  h) const
inline

Get a branch's parent handle.

Parameters
h  Branch handle
Returns
Parent handle, or INVALID_HANDLE if root or handle is invalid

Definition at line 585 of file branch.hpp.

◆ release()

void lloyal::branch::BranchStore::release ( BranchHandle  handle)
inline

Release a branch slot + evict its KV lease.

Removes parent→child edge, evicts the seq_id (stripping KV tags), frees CPU resources, and returns the slot to the freelist.

Parameters
handle  Branch handle to release (INVALID_HANDLE is a safe no-op)

Definition at line 463 of file branch.hpp.
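The CPU side of release(), unlinking the parent-to-child edge and returning the slot to the freelist, can be sketched over a flat node array. This is a hypothetical model: handles are plain slot indices with 0 as the invalid handle, Node and release are illustrative names, lease eviction is omitted, and what happens to a released branch's own children is not described on this page and is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Handle = std::uint32_t;  // hypothetical: plain slot index, 0 = invalid

struct Node {
  Handle parent = 0;
  std::vector<Handle> children;
  bool in_use = false;
};

// Unlink h from its parent, clear its slot, and recycle the slot index.
void release(std::vector<Node>& nodes, std::vector<Handle>& freelist, Handle h) {
  if (h == 0 || h >= nodes.size() || !nodes[h].in_use) return;  // invalid: no-op
  const Handle p = nodes[h].parent;
  if (p != 0) {
    auto& sib = nodes[p].children;
    sib.erase(std::remove(sib.begin(), sib.end(), h), sib.end());  // drop parent->child edge
  }
  nodes[h] = Node{};
  freelist.push_back(h);
}
```

After releasing one of two children, the parent still lists the survivor, so isLeaf() on the parent stays false until the last child is gone.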

◆ retainOnly()

void lloyal::branch::BranchStore::retainOnly ( BranchHandle  winner)
inline

Keep only the winner — nuclear KV + CPU cleanup.

Calls seq_keep(winner_seq) for a single KV pass, then releases all other slots (CPU only — KV already stripped by seq_keep).

Parameters
winner  Handle to the branch to retain (must be valid and leased)
Exceptions
std::runtime_error  if winner is invalid or has no lease

Definition at line 534 of file branch.hpp.

◆ sampler_has_dist()

bool lloyal::branch::BranchStore::sampler_has_dist ( SamplerChainHandle  h) const
inline

Check if a sampler chain ends with dist (stochastic) or greedy.

Parameters
h  Sampler chain handle
Returns
true if chain has dist sampler, false if greedy or invalid

Definition at line 723 of file branch.hpp.


The documentation for this class was generated from the following file: