liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::branch Namespace Reference

Classes

class  Branch
 
struct  BranchState
 Consolidated mutable state for a single branch. More...
 
class  BranchStore
 Handle table and batched decode orchestrator for branch management. More...
 
struct  CachedSamplingParams
 Concrete sampling params snapshot for memoization. More...
 
struct  DecodeEachItem
 Item for decode_each: one token per branch. More...
 
struct  DecodeScatterItem
 Item for decode_scatter: variable tokens per branch. More...
 
struct  GrammarEntry
 RAII entry for a grammar sampler in the registry. More...
 
struct  KvPressure
 Snapshot of KV cache pressure from BranchStore. More...
 
struct  SamplerChainEntry
 RAII entry for a sampler chain in the registry. More...
 

Typedefs

using BranchHandle = uint32_t
 Opaque handle to a branch slot.
 
using SamplerChainHandle = int32_t
 Handle to a sampler chain in BranchStore's registry (0 = invalid/none)
 
using GrammarHandle = int32_t
 Handle to a grammar sampler in BranchStore's registry (0 = invalid/none)
 
using MetricsHandle = int32_t
 Handle to a metrics tracker in BranchStore's registry (0 = invalid/none)
 

Functions

uint16_t handle_index (BranchHandle h)
 Extract slot index from a branch handle.
 
uint16_t handle_generation (BranchHandle h)
 Extract generation counter from a branch handle.
 
BranchHandle make_handle (uint16_t index, uint16_t generation)
 Construct a branch handle from index and generation.
 
template<SamplingParamsLike P>
CachedSamplingParams snapshot_params (const P &p)
 Snapshot sampling params for memoization comparison.
 
template<SamplingParamsLike P>
BranchHandle create (llama_context *ctx, const llama_model *model, BranchStore &s, llama_pos start_pos, const P &params, int n_batch=DEFAULT_N_BATCH, const char *grammar_str=nullptr, boundaries::BoundaryTracker *boundary_tracker=nullptr)
 Create a new branch with sampler chain, optional grammar, and metrics.
 
BranchHandle fork (BranchHandle source, BranchStore &s)
 Fork a branch into a new independent sequence.
 
void set_logit_bias (BranchHandle handle, const llama_logit_bias *biases, size_t n_biases, BranchStore &s)
 Set logit biases on a branch.
 
void clear_logit_bias (BranchHandle handle, BranchStore &s)
 Clear all logit biases from a branch.
 
void set_steer (BranchHandle handle, std::function< void(llama_token_data_array &)> steer_fn, BranchStore &s)
 Set the steer callback on a branch.
 
void clear_steer (BranchHandle handle, BranchStore &s)
 Clear the steer callback from a branch.
 
template<SamplingParamsLike P>
void set_sampler_params (BranchHandle handle, const P &params, BranchStore &s)
 Replace a branch's sampler chain with new parameters.
 
void set_grammar (BranchHandle handle, const llama_model *model, const char *grammar_str, BranchStore &s)
 Replace a branch's grammar constraint.
 
void set_grammar_lazy (BranchHandle handle, const llama_model *model, const char *grammar_str, const std::vector< std::string > &trigger_patterns, const std::vector< llama_token > &trigger_tokens, BranchStore &s)
 Set lazy grammar on a branch (unconstrained until trigger fires)
 
void prune (BranchHandle handle, BranchStore &s)
 Prune a leaf branch (RESTRICT — throws if children exist)
 
void pruneSubtree (BranchHandle h, BranchStore &s)
 Prune a branch and all descendants (CASCADE — iterative post-order)
 
void force_snapshot_logits (BranchHandle handle, BranchStore &s)
 Force-copy the shared llama.cpp logits buffer into this branch's private snapshot.
 
void prefill (BranchHandle handle, const llama_token *tokens, size_t n_tokens, BranchStore &s)
 Decode multiple tokens and capture logits atomically (prompt prefill)
 
void step (BranchHandle handle, llama_token token, BranchStore &s)
 Decode a single token and capture logits (generation step)
 
const float * get_logits (BranchHandle handle, BranchStore &s)
 Get the branch's captured logits snapshot.
 
llama_token sample (BranchHandle handle, BranchStore &s)
 Sample a token from the branch's captured logits.
 
void accept_token (BranchHandle handle, llama_token token, BranchStore &s)
 Accept a sampled token, advancing grammar and sampler state.
 
void apply_grammar (BranchHandle handle, float *logits, int n_vocab, BranchStore &s)
 Apply grammar constraints to an external logits buffer.
 
std::vector< std::pair< llama_token, float > > get_legal_priors (BranchHandle handle, BranchStore &s)
 Get grammar-legal tokens with renormalized probabilities.
 
float get_legal_logsumexp (BranchHandle handle, BranchStore &s)
 Compute log-sum-exp over grammar-legal logits.
 
bool is_token_legal (BranchHandle handle, llama_token token, BranchStore &s)
 Check if a token is legal under grammar constraints.
 
float get_token_prior_assume_legal (BranchHandle handle, llama_token token, float logsumexp, BranchStore &s)
 Compute prior probability for a token known to be grammar-legal.
 
float get_token_prior (BranchHandle handle, llama_token token, float logsumexp, BranchStore &s)
 Compute prior probability for a token, checking grammar legality first.
 
llama_pos get_position (BranchHandle handle, BranchStore &s)
 Get the branch's current decode position.
 
llama_pos get_fork_head (BranchHandle handle, BranchStore &s)
 Get the branch's fork head (parent position at fork time)
 
float get_perplexity (BranchHandle handle, BranchStore &s)
 Get model-level perplexity (from raw logits)
 
float get_sampling_perplexity (BranchHandle handle, BranchStore &s)
 Get sampling-level perplexity (from filtered distribution)
 
float get_last_sampling_prior (BranchHandle handle, BranchStore &s)
 Get the last sampled token's prior from the filtered distribution.
 
int get_n_vocab (BranchHandle handle, BranchStore &s)
 Get the branch's vocabulary size.
 

Variables

constexpr BranchHandle INVALID_HANDLE = 0
 Null handle sentinel.
 
constexpr llama_seq_id NO_LEASE = kv::NO_LEASE
 Branch has no KV residency.
 
constexpr int DEFAULT_N_BATCH = 512
 Default batch size for decode operations.
 
constexpr uint32_t GEN_SHIFT = 16
 Bit shift for generation field.
 
constexpr uint32_t INDEX_MASK = 0xFFFF
 Mask for slot index field.
 

Typedef Documentation

◆ BranchHandle

using lloyal::branch::BranchHandle = uint32_t

Opaque handle to a branch slot.

Encoded as (generation << 16) | index:

  • Upper 16 bits: generation counter (prevents ABA bugs on slot reuse)
  • Lower 16 bits: slot index (max 65535 branches)
  • Value 0 is reserved as the invalid/null handle
Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/branch.hpp.

Definition at line 94 of file branch.hpp.
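
The encoding above can be illustrated with a minimal standalone sketch. It mirrors the documented layout ((generation << 16) | index) and the documented constants, but it is written independently of the library source and may differ from the actual implementation. The `sketch` namespace and the `is_stale` helper are assumptions added for illustration; `is_stale` only shows how a generation counter can detect a handle that outlived its slot.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using BranchHandle = std::uint32_t;

constexpr std::uint32_t GEN_SHIFT = 16;       // bit shift for the generation field
constexpr std::uint32_t INDEX_MASK = 0xFFFF;  // mask for the slot index field
constexpr BranchHandle INVALID_HANDLE = 0;    // null handle sentinel

// Pack the generation into the upper 16 bits and the index into the lower 16.
constexpr BranchHandle make_handle(std::uint16_t index, std::uint16_t generation) {
    return (static_cast<std::uint32_t>(generation) << GEN_SHIFT) | index;
}

// Lower 16 bits: slot index.
constexpr std::uint16_t handle_index(BranchHandle h) {
    return static_cast<std::uint16_t>(h & INDEX_MASK);
}

// Upper 16 bits: generation counter.
constexpr std::uint16_t handle_generation(BranchHandle h) {
    return static_cast<std::uint16_t>(h >> GEN_SHIFT);
}

// Hypothetical helper: a handle is stale when the slot it names is currently
// at a different generation, for example because the slot was reused.
constexpr bool is_stale(BranchHandle h, std::uint16_t slot_generation) {
    return handle_generation(h) != slot_generation;
}

}  // namespace sketch
```

In this sketch, make_handle(5, 3) yields 0x00030005, and the same handle is reported stale once slot 5 has moved on to generation 4.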

◆ GrammarHandle

using lloyal::branch::GrammarHandle = int32_t

Handle to a grammar sampler in BranchStore's registry (0 = invalid/none)

Definition at line 154 of file branch.hpp.

◆ MetricsHandle

using lloyal::branch::MetricsHandle = int32_t

Handle to a metrics tracker in BranchStore's registry (0 = invalid/none)

Definition at line 157 of file branch.hpp.

◆ SamplerChainHandle

using lloyal::branch::SamplerChainHandle = int32_t

Handle to a sampler chain in BranchStore's registry (0 = invalid/none)

Definition at line 151 of file branch.hpp.

Function Documentation

◆ accept_token()

void lloyal::branch::accept_token ( BranchHandle  handle,
llama_token  token,
BranchStore &  s
)
inline

Accept a sampled token, advancing grammar and sampler state.

Updates:

  • Grammar parser state (if grammar is attached)
  • Sampler chain penalty tracking (repetition/frequency penalties)
  • Model-level perplexity (from raw logits, if available)
  • Sampling-level perplexity (from filtered candidate distribution)
Parameters
handle  Branch that produced the token
token  Token ID returned by sample()
s  Branch store
Note
Safe to call with invalid handle (silent no-op).

Definition at line 1913 of file branch.hpp.

◆ apply_grammar()

void lloyal::branch::apply_grammar ( BranchHandle  handle,
float *  logits,
int  n_vocab,
BranchStore &  s
)
inline

Apply grammar constraints to an external logits buffer.

Sets logits of grammar-illegal tokens to -INFINITY in the provided buffer. Uses the branch's internal candidates_buffer as scratch space when vocab sizes match; allocates a temporary buffer otherwise.

Parameters
handle  Branch with grammar to apply
logits  Logits buffer to modify in place (n_vocab floats)
n_vocab  Number of entries in the logits buffer
s  Branch store
Note
No-op if handle is invalid or branch has no grammar attached.

Definition at line 1981 of file branch.hpp.
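
The masking step described above can be pictured with a small standalone sketch. It does not call liblloyal or llama.cpp: `mask_illegal` and the `is_legal` predicate are hypothetical stand-ins for the branch's grammar, and the scratch-buffer handling of the real function is not modeled.

```cpp
#include <cassert>
#include <functional>
#include <limits>

namespace sketch {

// Hypothetical stand-in for apply_grammar(): overwrite the logit of every
// token rejected by is_legal with -infinity and leave legal logits untouched.
inline void mask_illegal(float* logits, int n_vocab,
                         const std::function<bool(int)>& is_legal) {
    assert(logits != nullptr || n_vocab == 0);
    for (int t = 0; t < n_vocab; ++t) {
        if (!is_legal(t)) {
            logits[t] = -std::numeric_limits<float>::infinity();
        }
    }
}

}  // namespace sketch
```

A buffer masked this way can be passed to any softmax or log-sum-exp routine that skips non-finite entries, so illegal tokens receive zero probability.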

◆ clear_logit_bias()

void lloyal::branch::clear_logit_bias ( BranchHandle  handle,
BranchStore &  s
)
inline

Clear all logit biases from a branch.

Parameters
handle  Branch to modify
s  Branch store
Exceptions
std::runtime_error  if handle is invalid

Definition at line 1421 of file branch.hpp.

◆ clear_steer()

void lloyal::branch::clear_steer ( BranchHandle  handle,
BranchStore &  s
)
inline

Clear the steer callback from a branch.

Parameters
handle  Branch to modify
s  Branch store
Exceptions
std::runtime_error  if handle is invalid

Definition at line 1485 of file branch.hpp.

◆ create()

template<SamplingParamsLike P>
BranchHandle lloyal::branch::create ( llama_context *  ctx,
const llama_model *  model,
BranchStore &  s,
llama_pos  start_pos,
const P &  params,
int  n_batch = DEFAULT_N_BATCH,
const char *  grammar_str = nullptr,
boundaries::BoundaryTracker *  boundary_tracker = nullptr
)
inline

Create a new branch with sampler chain, optional grammar, and metrics.

Allocates a slot + KV lease from the store, initializes the sampler chain from params, optionally attaches a GBNF grammar and boundary tracker, and pre-allocates logits/candidates buffers sized to the model's vocabulary.

Template Parameters
P  Any type satisfying the SamplingParamsLike concept
Parameters
ctx  Llama context (not owned, must outlive branch)
model  Llama model (not owned, used for vocab size and sampler init)
s  Branch store to allocate from
start_pos  Starting decode position (typically prompt length after prefill)
params  Sampling parameters (temperature, top_k, top_p, penalties, etc.)
n_batch  Batch size for decode operations (default 512)
grammar_str  GBNF grammar string, or nullptr for unconstrained generation
boundary_tracker  Boundary detector (ownership transferred), or nullptr
Returns
Valid BranchHandle, or INVALID_HANDLE on failure
See also
prune() to free with KV cleanup, pruneSubtree() for CASCADE

Definition at line 1225 of file branch.hpp.

◆ force_snapshot_logits()

void lloyal::branch::force_snapshot_logits ( BranchHandle  handle,
BranchStore &  s
)
inline

Force-copy the shared llama.cpp logits buffer into this branch's private snapshot.

Copies the logits from the llama_context into the branch's internal buffer without performing a decode.

Warning
The shared buffer contains logits from the LAST llama_decode() call. If another branch (or batched decode) ran since this branch's last decode, the snapshot will contain wrong logits. Only call this immediately after a single-branch decode for this handle. Prefer prefill()/step() which capture atomically.
Parameters
handle  Branch to capture logits for
s  Branch store
Exceptions
std::runtime_error  if handle is invalid, vocab size is zero, or no logits are available (no prior decode with logits enabled)
Note
Sets has_logits = true, enabling sample() and get_logits().

Definition at line 1675 of file branch.hpp.

◆ fork()

BranchHandle lloyal::branch::fork ( BranchHandle  source,
BranchStore &  s
)
inline

Fork a branch into a new independent sequence.

Allocates a slot + KV lease, deep copies source state under the new seq_id. Records parent→child topology edge.

Cloned state:

  • KV cache (via kv::seq_cp)
  • Sampler chain (penalties, PRNG, filters)
  • Grammar (parser state)
  • Boundary tracker
  • Metrics (model + sampling perplexity)
  • Logits snapshot and logit bias

NOT cloned:

  • steer_fn (may capture references — call set_steer() on the child if needed)
Parameters
source  Handle of the branch to fork from
s  Branch store
Returns
Handle to the new child branch, or INVALID_HANDLE on failure

Definition at line 1300 of file branch.hpp.

◆ get_fork_head()

llama_pos lloyal::branch::get_fork_head ( BranchHandle  handle,
BranchStore &  s
)
inline

Get the branch's fork head (parent position at fork time)

Parameters
handle  Branch handle
s  Branch store
Returns
Fork head position (0 for root branches or invalid handles)

Definition at line 2291 of file branch.hpp.

◆ get_last_sampling_prior()

float lloyal::branch::get_last_sampling_prior ( BranchHandle  handle,
BranchStore &  s
)
inline

Get the last sampled token's prior from the filtered distribution.

Returns P(token) from the post-filter sampling distribution. This is the correct prior for UCT-family algorithms since it matches what was actually sampled.

Parameters
handle  Branch handle
s  Branch store
Returns
Probability of last sampled token in [0, 1], or 0 if unavailable

Definition at line 2346 of file branch.hpp.

◆ get_legal_logsumexp()

float lloyal::branch::get_legal_logsumexp ( BranchHandle  handle,
BranchStore &  s
)
inline

Compute log-sum-exp over grammar-legal logits.

Returns log(sum(exp(logit_i))) over tokens that pass grammar constraints. Use for efficient per-token prior computation: P(token) = exp(logit[token] - logsumexp)

Numerically stable (max-subtraction trick).

Parameters
handle  Branch with captured logits and optional grammar
s  Branch store
Returns
Log-sum-exp value, or -INFINITY if no legal tokens or invalid state
See also
get_token_prior_assume_legal() for O(1) per-token prior using this value

Definition at line 2116 of file branch.hpp.
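
The max-subtraction trick and the O(1) prior formula above can be sketched as follows. This is a standalone illustration, not the library's code: `legal_logsumexp` and `token_prior` are hypothetical names, and the sketch assumes grammar-illegal tokens have already been masked to -INFINITY so that "legal" simply means "finite".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

namespace sketch {

// Log-sum-exp over finite logits. Entries equal to -infinity (assumed to be
// masked as illegal) contribute nothing. Returns -infinity if nothing is legal.
inline float legal_logsumexp(const std::vector<float>& logits) {
    float max_logit = -std::numeric_limits<float>::infinity();
    for (float x : logits) {
        if (std::isfinite(x) && x > max_logit) max_logit = x;
    }
    if (!std::isfinite(max_logit)) return max_logit;  // no legal tokens

    // Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    double sum = 0.0;
    for (float x : logits) {
        if (std::isfinite(x)) sum += std::exp(static_cast<double>(x) - max_logit);
    }
    return max_logit + static_cast<float>(std::log(sum));
}

// O(1) prior for a token the caller already knows to be legal:
// P(token) = exp(logit[token] - logsumexp).
inline float token_prior(const std::vector<float>& logits, std::size_t token,
                         float logsumexp) {
    assert(token < logits.size());
    if (!std::isfinite(logsumexp)) return 0.0f;
    return std::exp(logits[token] - logsumexp);
}

}  // namespace sketch
```

Computing the log-sum-exp once per decode step and reusing it for every candidate token is what makes the per-token prior constant-time in a search loop.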

◆ get_legal_priors()

std::vector< std::pair< llama_token, float > > lloyal::branch::get_legal_priors ( BranchHandle  handle,
BranchStore &  s
)
inline

Get grammar-legal tokens with renormalized probabilities.

Returns (token, probability) pairs for tokens that pass grammar constraints. Probabilities are softmax-normalized over the legal set only (sum to 1.0).

Essential for policy priors in tree search: priors must only cover legal moves.

Parameters
handle  Branch with captured logits and optional grammar
s  Branch store
Returns
Vector of (token_id, probability) pairs, empty if no logits or no legal tokens
Note
If no grammar is attached, all tokens with finite logits are included.

Definition at line 2037 of file branch.hpp.
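
Renormalizing over the legal set only can be sketched as below. The code is a standalone approximation of the behavior described above, not the library's implementation: `legal_priors` here takes a hypothetical `is_legal` predicate in place of a grammar, and plain `int` token IDs in place of `llama_token`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

namespace sketch {

// Softmax restricted to tokens that have a finite logit and pass is_legal.
// The returned probabilities sum to 1 over that legal set.
inline std::vector<std::pair<int, float>>
legal_priors(const std::vector<float>& logits,
             const std::function<bool(int)>& is_legal) {
    std::vector<std::pair<int, float>> out;
    float max_logit = -std::numeric_limits<float>::infinity();
    for (int t = 0; t < static_cast<int>(logits.size()); ++t) {
        if (std::isfinite(logits[t]) && is_legal(t)) {
            out.emplace_back(t, logits[t]);
            max_logit = std::max(max_logit, logits[t]);
        }
    }
    if (out.empty()) return out;  // no legal tokens

    // Exponentiate relative to the maximum for numerical stability.
    float sum = 0.0f;
    for (auto& entry : out) {
        entry.second = std::exp(entry.second - max_logit);
        sum += entry.second;
    }
    assert(sum > 0.0f);  // the maximum itself contributes exp(0) = 1
    for (auto& entry : out) entry.second /= sum;
    return out;
}

}  // namespace sketch
```

With logits {1, 1, 5} and token 2 declared illegal, the two remaining tokens each receive probability 0.5, even though token 2 would dominate an unrestricted softmax.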

◆ get_logits()

const float * lloyal::branch::get_logits ( BranchHandle  handle,
BranchStore &  s
)
inline

Get the branch's captured logits snapshot.

Returns a pointer to the internal logits buffer (n_vocab floats). Only valid after force_snapshot_logits(), prefill(), or step().

Parameters
handle  Branch to read logits from
s  Branch store
Returns
Pointer to n_vocab floats, or nullptr if no logits captured

Definition at line 1795 of file branch.hpp.

◆ get_n_vocab()

int lloyal::branch::get_n_vocab ( BranchHandle  handle,
BranchStore &  s
)
inline

Get the branch's vocabulary size.

Parameters
handle  Branch handle
s  Branch store
Returns
Vocabulary size, or 0 if handle is invalid

Definition at line 2383 of file branch.hpp.

◆ get_perplexity()

float lloyal::branch::get_perplexity ( BranchHandle  handle,
BranchStore &  s
)
inline

Get model-level perplexity (from raw logits)

Returns perplexity computed from the full logit distribution before any sampler filtering. For the distribution actually sampled from, use get_sampling_perplexity().

Parameters
handle  Branch handle
s  Branch store
Returns
Model perplexity, or INFINITY if no tokens accepted

Definition at line 2307 of file branch.hpp.
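
This page does not state the formula behind the perplexity getters. The sketch below assumes the standard definition, the exponential of the mean negative log-probability of the accepted tokens, and reproduces the documented INFINITY result for an empty history. `PerplexityTracker` is a hypothetical type written for illustration, not the library's metrics tracker.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

namespace sketch {

// Hypothetical tracker: accumulates negative log-probabilities of accepted tokens.
struct PerplexityTracker {
    double nll_sum = 0.0;
    std::size_t count = 0;

    // p is the probability the distribution assigned to the accepted token.
    void accept(double p) {
        assert(p > 0.0 && p <= 1.0);
        nll_sum += -std::log(p);
        ++count;
    }

    // exp(mean negative log-likelihood); infinity before any token is accepted.
    float value() const {
        if (count == 0) return std::numeric_limits<float>::infinity();
        return static_cast<float>(std::exp(nll_sum / static_cast<double>(count)));
    }
};

}  // namespace sketch
```

Under this assumption, feeding the tracker probabilities from the raw logit distribution gives a model-level figure, while feeding it probabilities from the filtered candidate distribution gives a sampling-level figure.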

◆ get_position()

llama_pos lloyal::branch::get_position ( BranchHandle  handle,
BranchStore &  s
)
inline

Get the branch's current decode position.

Parameters
handle  Branch handle
s  Branch store
Returns
Token position, or -1 if handle is invalid

Definition at line 2279 of file branch.hpp.

◆ get_sampling_perplexity()

float lloyal::branch::get_sampling_perplexity ( BranchHandle  handle,
BranchStore &  s
)
inline

Get sampling-level perplexity (from filtered distribution)

Returns perplexity from the distribution actually sampled from (after top-k/p/temp/penalties). Useful for policy priors and monitoring sampler chain impact.

Parameters
handle  Branch handle
s  Branch store
Returns
Sampling-level perplexity, or INFINITY if no tokens accepted

Definition at line 2327 of file branch.hpp.

◆ get_token_prior()

float lloyal::branch::get_token_prior ( BranchHandle  handle,
llama_token  token,
float  logsumexp,
BranchStore &  s
)
inline

Compute prior probability for a token, checking grammar legality first.

O(grammar_complexity) — uses is_token_legal() before computing the prior. Safe for ad-hoc callers who don't know whether the token is grammar-legal.

Parameters
handle  Branch with captured logits and optional grammar
token  Token ID to compute prior for
logsumexp  Pre-computed value from get_legal_logsumexp()
s  Branch store
Returns
Probability in [0, 1], or 0 if token is illegal
Note
For search inner loops, prefer get_token_prior_assume_legal() since sample() already enforces grammar constraints.

Definition at line 2259 of file branch.hpp.

◆ get_token_prior_assume_legal()

float lloyal::branch::get_token_prior_assume_legal ( BranchHandle  handle,
llama_token  token,
float  logsumexp,
BranchStore &  s
)
inline

Compute prior probability for a token known to be grammar-legal.

O(1) operation — use in search inner loops where sample() already enforced grammar. Does NOT validate grammar legality; caller must ensure token is legal.

Parameters
handle  Branch with captured logits
token  Token ID (must be legal under grammar)
logsumexp  Pre-computed value from get_legal_logsumexp()
s  Branch store
Returns
Probability in [0, 1], or 0 if state is invalid
See also
get_token_prior() for a safe version that checks grammar legality

Definition at line 2228 of file branch.hpp.

◆ handle_generation()

uint16_t lloyal::branch::handle_generation ( BranchHandle  h)
inline

Extract generation counter from a branch handle.

Parameters
h  Branch handle
Returns
Generation counter (upper 16 bits)

Definition at line 134 of file branch.hpp.

◆ handle_index()

uint16_t lloyal::branch::handle_index ( BranchHandle  h)
inline

Extract slot index from a branch handle.

Parameters
h  Branch handle
Returns
Slot index (lower 16 bits)

Definition at line 125 of file branch.hpp.

◆ is_token_legal()

bool lloyal::branch::is_token_legal ( BranchHandle  handle,
llama_token  token,
BranchStore &  s
)
inline

Check if a token is legal under grammar constraints.

Uses a 1-element candidate array for O(grammar_complexity) check instead of the O(n_vocab) full scan used by get_legal_priors().

Parameters
handle  Branch with optional grammar
token  Token ID to check
s  Branch store
Returns
true if token is legal (or no grammar attached), false if illegal

Definition at line 2177 of file branch.hpp.

◆ make_handle()

BranchHandle lloyal::branch::make_handle ( uint16_t  index,
uint16_t  generation 
)
inline

Construct a branch handle from index and generation.

Parameters
index  Slot index (0–65535)
generation  Generation counter
Returns
Encoded branch handle

Definition at line 144 of file branch.hpp.

◆ prefill()

void lloyal::branch::prefill ( BranchHandle  handle,
const llama_token *  tokens,
size_t  n_tokens,
BranchStore &  s
)
inline

Decode multiple tokens and capture logits atomically (prompt prefill)

Feeds tokens through the model in n_batch-sized chunks, advances the branch position, and snapshots logits. After this call, sample() and get_logits() are available.

Parameters
handle  Branch to decode into
tokens  Array of token IDs
n_tokens  Number of tokens in the array
s  Branch store
Exceptions
std::runtime_error  if handle is invalid, decode fails, or logits capture fails
Note
For single-token decode, prefer step() (zero heap allocation).

Definition at line 1711 of file branch.hpp.
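
The chunking described above, feeding tokens in n_batch-sized pieces, can be sketched without the library. `for_each_chunk` and its `decode_chunk` callback are hypothetical names; the callback stands in for whatever decode call the real prefill() issues per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Calls decode_chunk(pointer, count) on consecutive slices of at most n_batch
// tokens and returns the number of slices processed.
template <typename DecodeFn>
std::size_t for_each_chunk(const std::vector<int>& tokens, std::size_t n_batch,
                           DecodeFn decode_chunk) {
    assert(n_batch > 0);
    std::size_t chunks = 0;
    for (std::size_t off = 0; off < tokens.size(); off += n_batch) {
        // The final slice may be shorter than n_batch.
        const std::size_t n = std::min(n_batch, tokens.size() - off);
        decode_chunk(tokens.data() + off, n);
        ++chunks;
    }
    return chunks;
}

}  // namespace sketch
```

For a 1200-token prompt and the default batch size of 512, this loop issues three calls with 512, 512, and 176 tokens.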

◆ prune()

void lloyal::branch::prune ( BranchHandle  handle,
BranchStore &  s
)
inline

Prune a leaf branch (RESTRICT — throws if children exist)

Evicts the KV lease and frees all resources via BranchStore::release(). If the branch has children, throws — use pruneSubtree() for CASCADE.

Parameters
handle  Branch to prune (INVALID_HANDLE is a safe no-op)
s  Branch store
Exceptions
std::runtime_error  if branch has children

Definition at line 1627 of file branch.hpp.

◆ pruneSubtree()

void lloyal::branch::pruneSubtree ( BranchHandle  h,
BranchStore &  s
)
inline

Prune a branch and all descendants (CASCADE — iterative post-order)

Traverses the subtree rooted at h, collecting all descendants, then prunes leaves-first so RESTRICT on prune() always passes.

Parameters
h  Root of subtree to prune
s  Branch store

Definition at line 1644 of file branch.hpp.
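
The interplay between RESTRICT and CASCADE can be illustrated on a toy tree. This sketch is independent of BranchStore: the `Tree` type, integer node IDs, and both functions are hypothetical. It collects the subtree in pre-order with an explicit stack and then prunes in reverse, which visits every descendant before its ancestor, so the leaf-only check never fires.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <unordered_map>
#include <vector>

namespace sketch {

struct Tree {
    std::unordered_map<int, std::vector<int>> children;  // node -> child nodes
    std::unordered_map<int, int> parent;                 // node -> parent (absent for roots)

    void add(int node, int parent_node = -1) {
        children[node];  // register the node with no children yet
        if (parent_node >= 0) {
            parent[node] = parent_node;
            children[parent_node].push_back(node);
        }
    }
    bool contains(int node) const { return children.count(node) != 0; }
};

// RESTRICT: refuse to remove a node that still has children.
inline void prune(Tree& t, int node) {
    auto it = t.children.find(node);
    if (it == t.children.end()) return;  // unknown node: no-op
    if (!it->second.empty()) throw std::runtime_error("prune: branch has children");
    auto p = t.parent.find(node);
    if (p != t.parent.end()) {
        auto& siblings = t.children.at(p->second);
        siblings.erase(std::remove(siblings.begin(), siblings.end(), node), siblings.end());
        t.parent.erase(p);
    }
    t.children.erase(node);
}

// CASCADE: iterative traversal, then prune leaves first.
inline void prune_subtree(Tree& t, int root) {
    if (!t.contains(root)) return;
    std::vector<int> order;  // pre-order: every node appears after its ancestors
    std::vector<int> stack{root};
    while (!stack.empty()) {
        const int n = stack.back();
        stack.pop_back();
        order.push_back(n);
        for (int c : t.children.at(n)) stack.push_back(c);
    }
    for (auto it = order.rbegin(); it != order.rend(); ++it) prune(t, *it);
}

}  // namespace sketch
```

Using an explicit stack instead of recursion keeps the traversal safe for deep trees, which is presumably why the documented function is described as iterative.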

◆ sample()

llama_token lloyal::branch::sample ( BranchHandle  handle,
BranchStore &  s
)
inline

Sample a token from the branch's captured logits.

Applies modifiers in order: Grammar → Logit Bias → Steer → Sampler Chain, then selects a token. Also records filtered candidates for metrics.

Requires prior force_snapshot_logits(), prefill(), or step().

Parameters
handle  Branch to sample from
s  Branch store
Returns
Sampled token ID, or -1 if no logits captured or sampling fails
Note
Call accept_token() after sampling to advance grammar and penalty state.

Definition at line 1821 of file branch.hpp.

◆ set_grammar()

void lloyal::branch::set_grammar ( BranchHandle  handle,
const llama_model *  model,
const char *  grammar_str,
BranchStore &  s
)
inline

Replace a branch's grammar constraint.

Frees the old grammar (if any) and attaches a new one. Pass nullptr or empty string to remove grammar constraints entirely.

Parameters
handle  Branch to modify
model  Llama model (for vocab)
grammar_str  GBNF grammar string, or nullptr to remove
s  Branch store
Exceptions
std::runtime_error  if handle is invalid

Definition at line 1552 of file branch.hpp.

◆ set_grammar_lazy()

void lloyal::branch::set_grammar_lazy ( BranchHandle  handle,
const llama_model *  model,
const char *  grammar_str,
const std::vector< std::string > &  trigger_patterns,
const std::vector< llama_token > &  trigger_tokens,
BranchStore &  s
)
inline

Set lazy grammar on a branch (unconstrained until trigger fires)

Replaces any existing grammar. The lazy grammar accepts all tokens until a trigger pattern or token fires, then constrains subsequent generation.

Parameters
handle  Branch handle
model  Llama model (for vocab)
grammar_str  GBNF grammar string
trigger_patterns  Regex patterns that activate the grammar
trigger_tokens  Token IDs that activate the grammar
s  Branch store
Exceptions
std::runtime_error  if handle is invalid

Definition at line 1591 of file branch.hpp.

◆ set_logit_bias()

void lloyal::branch::set_logit_bias ( BranchHandle  handle,
const llama_logit_bias *  biases,
size_t  n_biases,
BranchStore &  s
)
inline

◆ set_sampler_params()

template<SamplingParamsLike P>
void lloyal::branch::set_sampler_params ( BranchHandle  handle,
const P &  params,
BranchStore &  s
)
inline

Replace a branch's sampler chain with new parameters.

Memoized: if the new params match the cached snapshot, this is a no-op. Otherwise frees the old chain and creates a new one.

Primary use case: Entropy-based Dynamic Temperature (EDT), where temperature changes per-token based on model uncertainty.

Template Parameters
P  Any type satisfying the SamplingParamsLike concept
Parameters
handle  Branch to modify
params  New sampling parameters
s  Branch store
Exceptions
std::runtime_error  if handle is invalid

Definition at line 1516 of file branch.hpp.
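
The memoization described above can be sketched as a compare-then-rebuild check. Everything in this block is hypothetical: the fields of `ParamsSnapshot` are illustrative and are not the fields of CachedSamplingParams, and `chain_builds` merely counts how often the rebuild path would run.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical snapshot of resolved sampling parameters.
struct ParamsSnapshot {
    float temperature = 0.8f;
    int top_k = 40;
    float top_p = 0.95f;
    std::uint32_t seed = 0;  // fixed default so two default snapshots compare equal

    bool operator==(const ParamsSnapshot& o) const {
        return temperature == o.temperature && top_k == o.top_k &&
               top_p == o.top_p && seed == o.seed;
    }
};

struct BranchSampler {
    ParamsSnapshot cached;
    bool has_chain = false;
    int chain_builds = 0;  // counts how often the rebuild path ran
};

// Returns true if the chain was rebuilt, false if the call was a memoized no-op.
inline bool set_sampler_params(BranchSampler& b, const ParamsSnapshot& p) {
    if (b.has_chain && b.cached == p) return false;
    b.cached = p;
    b.has_chain = true;
    ++b.chain_builds;
    return true;
}

}  // namespace sketch
```

In a per-token dynamic-temperature loop, this pattern means the chain is only rebuilt on steps where the computed temperature actually changes. The fixed seed default matters for the same reason given under snapshot_params(): a time-based default would make every comparison miss.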

◆ set_steer()

void lloyal::branch::set_steer ( BranchHandle  handle,
std::function< void(llama_token_data_array &)>  steer_fn,
BranchStore &  s
)
inline

◆ snapshot_params()

template<SamplingParamsLike P>
CachedSamplingParams lloyal::branch::snapshot_params ( const P &  p)
inline

Snapshot sampling params for memoization comparison.

Extracts resolved values from any SamplingParamsLike type using the same defaults as sampler::create_chain(), except seed defaults to 0 (not time) to avoid false cache misses from non-deterministic defaults.

Definition at line 237 of file branch.hpp.

◆ step()

void lloyal::branch::step ( BranchHandle  handle,
llama_token  token,
BranchStore &  s
)
inline

Decode a single token and capture logits (generation step)

Uses decode::one() which maintains a thread_local batch — heap-allocated once per thread, reused across calls. No per-call allocation.

Parameters
handle  Branch to decode into
token  Token ID to decode
s  Branch store
Exceptions
std::runtime_error  if handle is invalid, decode fails, or logits capture fails

Definition at line 1756 of file branch.hpp.
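
The allocation pattern described above, a thread_local batch created once per thread and reused, can be sketched in isolation. `OneTokenBatch` and `stage_one` are hypothetical; the real decode::one() and its batch type are not shown on this page.

```cpp
#include <cassert>
#include <vector>

namespace sketch {

// Hypothetical one-token batch.
struct OneTokenBatch {
    std::vector<int> tokens;
};

// Stages a single token in a per-thread batch. The vector's storage is
// allocated on the first call in each thread and reused afterwards, because
// clear() leaves the capacity unchanged.
inline const OneTokenBatch& stage_one(int token) {
    thread_local OneTokenBatch batch;
    batch.tokens.clear();
    batch.tokens.push_back(token);
    assert(batch.tokens.size() == 1);
    return batch;
}

}  // namespace sketch
```

Two consecutive calls on the same thread return a batch backed by the same storage, so only the first call on each thread allocates.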

Variable Documentation

◆ DEFAULT_N_BATCH

constexpr int lloyal::branch::DEFAULT_N_BATCH = 512
constexpr

Default batch size for decode operations.

Definition at line 98 of file branch.hpp.

◆ GEN_SHIFT

constexpr uint32_t lloyal::branch::GEN_SHIFT = 16
constexpr

Bit shift for generation field.

Definition at line 99 of file branch.hpp.

◆ INDEX_MASK

constexpr uint32_t lloyal::branch::INDEX_MASK = 0xFFFF
constexpr

Mask for slot index field.

Definition at line 100 of file branch.hpp.

◆ INVALID_HANDLE

constexpr BranchHandle lloyal::branch::INVALID_HANDLE = 0
constexpr

Null handle sentinel.

Definition at line 96 of file branch.hpp.

◆ NO_LEASE

constexpr llama_seq_id lloyal::branch::NO_LEASE = kv::NO_LEASE
constexpr

Branch has no KV residency.

Definition at line 97 of file branch.hpp.