liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::branch::BranchState Struct Reference

Consolidated mutable state for a single branch.

#include <lloyal/branch.hpp>

Public Attributes

llama_context * ctx = nullptr
 Llama context (not owned, must outlive branch)
 
const llama_model * model = nullptr
 Llama model (not owned, must outlive branch)
 
llama_seq_id seq_id = NO_LEASE
 KV cache sequence identifier (NO_LEASE when inactive)
 
llama_pos position = 0
 Current decode position in the sequence.
 
llama_pos fork_head = 0
 Parent's position at fork time (0 for root branches)
 
SamplerChainHandle sampler_chain = 0
 Handle into BranchStore's sampler registry.
 
GrammarHandle grammar = 0
 Handle into BranchStore's grammar registry.
 
CachedSamplingParams cached_params
 Params used to create current chain (for memoization)
 
boundaries::BoundaryTracker * boundary_tracker = nullptr
 Token boundary detector (owned, optional)
 
std::vector< llama_logit_bias > logit_bias
 Static token biases, cloned on fork.
 
std::function< void(llama_token_data_array &)> steer_fn
 Dynamic logit callback, NOT cloned on fork.
 
MetricsHandle metrics = 0
 Handle into BranchStore's metrics registry.
 
llama_token last_token = -1
 Last token returned by sample()
 
std::vector< llama_token_data > last_candidates
 Filtered candidates from last sample()
 
std::vector< float > logits_snapshot
 Captured logit distribution (n_vocab floats)
 
bool has_logits = false
 True only after force_snapshot_logits(), prefill(), or step()
 
std::vector< llama_token_data > candidates_buffer
 Reusable scratch buffer for sampling (avoids O(n_vocab) allocs per sample call).
 
int n_batch = DEFAULT_N_BATCH
 Batch size for decode operations.
 
int n_vocab = 0
 Vocabulary size (cached for buffer pre-allocation)
 
uint16_t generation = 0
 Slot generation counter (for ABA prevention)
 
bool in_use = false
 True when slot is allocated to an active branch.
 
BranchHandle parent = INVALID_HANDLE
 Parent branch (INVALID_HANDLE if root)
 
std::vector< BranchHandle > children
 Child branches forked from this one.
 

Detailed Description

Consolidated mutable state for a single branch.

Each branch encapsulates all state needed for independent generation:

Forkable state (cloned by fork()):

  • KV cache sequence (via llama_memory_seq_cp)
  • Sampler chain (handle into BranchStore registry)
  • Grammar constraints (handle into BranchStore registry)
  • Boundary tracker (token boundary detection)
  • Metrics (handle into BranchStore registry)
  • Logits snapshot (captured distribution for deferred sampling)
  • Logit bias (static token-level adjustments)

Non-forkable state (NOT cloned by fork()):

  • steer_fn (dynamic callback — may capture references, unsafe to copy)
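
The split between forkable and non-forkable state can be illustrated with a minimal sketch. It uses a hypothetical, reduced `MiniBranch` struct and a hypothetical `fork_branch()` helper rather than the real BranchState and fork(), and it assumes the child starts at the parent's position. KV cache copying and the registry handles are omitted.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Stand-in types so the sketch compiles without llama.cpp headers.
struct TokenBias { int token; float bias; };         // mirrors llama_logit_bias
struct TokenData { int id; float logit; float p; };  // mirrors llama_token_data

// Hypothetical, reduced version of BranchState (illustration only).
struct MiniBranch {
    int position = 0;
    int fork_head = 0;
    std::vector<TokenBias> logit_bias;       // forkable: copied to the child
    std::vector<float> logits_snapshot;      // forkable: copied to the child
    bool has_logits = false;
    std::function<void(std::vector<TokenData>&)> steer_fn;  // NOT forkable
};

// Assumed fork behaviour: copy the forkable fields, record the parent's
// position in fork_head, and leave steer_fn empty in the child.
inline MiniBranch fork_branch(const MiniBranch& parent) {
    MiniBranch child;
    child.position = parent.position;        // assumption: child resumes here
    child.fork_head = parent.position;       // parent's position at fork time
    child.logit_bias = parent.logit_bias;
    child.logits_snapshot = parent.logits_snapshot;
    child.has_logits = parent.has_logits;
    // child.steer_fn stays empty: the callback may capture references
    // that are only valid for the parent.
    return child;
}
```

A caller that wants steering on the child would install a fresh callback after forking instead of relying on the parent's.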

Sampling application order in sample(): Grammar → Logit Bias → Steer → Sampler Chain
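
The application order above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the library's sample() implementation: `Stage`, `TokenData`, `TokenBias`, and `sample_in_order()` are hypothetical stand-ins, and a greedy argmax takes the place of the real sampler chain.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct TokenData { int id; float logit; float p; };  // mirrors llama_token_data
struct TokenBias { int token; float bias; };         // mirrors llama_logit_bias

// Hypothetical stage type: any callable that edits the candidate list.
using Stage = std::function<void(std::vector<TokenData>&)>;

// Applies the stages in the documented order and returns the chosen token id.
inline int sample_in_order(std::vector<TokenData>& cands,
                           const Stage& grammar,
                           const std::vector<TokenBias>& logit_bias,
                           const Stage& steer_fn) {
    assert(!cands.empty());
    if (grammar) grammar(cands);                  // 1. grammar constraints
    for (const TokenBias& b : logit_bias) {       // 2. static logit bias
        for (TokenData& c : cands) {
            if (c.id == b.token) c.logit += b.bias;
        }
    }
    if (steer_fn) steer_fn(cands);                // 3. dynamic steer callback
    std::size_t best = 0;                         // 4. sampler chain stand-in:
    for (std::size_t i = 1; i < cands.size(); ++i) {  // greedy argmax
        if (cands[i].logit > cands[best].logit) best = i;
    }
    return cands[best].id;
}
```

Because the steer callback runs after the static bias, it sees the biased logits and can still override them before the sampler chain picks a token.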

Definition at line 275 of file branch.hpp.

Member Data Documentation

◆ boundary_tracker

boundaries::BoundaryTracker* lloyal::branch::BranchState::boundary_tracker = nullptr

Token boundary detector (owned, optional)

Examples
/home/runner/work/liblloyal/liblloyal/include/lloyal/branch.hpp.

Definition at line 288 of file branch.hpp.

◆ cached_params

CachedSamplingParams lloyal::branch::BranchState::cached_params

Params used to create current chain (for memoization)

Definition at line 286 of file branch.hpp.

◆ candidates_buffer

std::vector<llama_token_data> lloyal::branch::BranchState::candidates_buffer

Reusable scratch buffer for sampling (avoids O(n_vocab) allocs per sample call).

Memory footprint per branch for large vocab models (128k tokens):

  • logits_snapshot: n_vocab * 4 bytes (~512KB)
  • candidates_buffer: n_vocab * sizeof(llama_token_data) (~1.5–2MB)
  • last_candidates: typically ~40 entries with top_k=40 (~480 bytes)

    Todo:
    Move to per-thread or SessionContext scratch arena for deep search trees.

Definition at line 309 of file branch.hpp.
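
The footprint figures listed for candidates_buffer can be reproduced with a short calculation. The sketch assumes "128k" means 128 * 1024 tokens and that llama_token_data is laid out as a 32-bit id plus two floats (12 bytes on mainstream platforms). `TokenData`, `Footprint`, and `branch_footprint()` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for llama_token_data: id + logit + probability.
struct TokenData { std::int32_t id; float logit; float p; };

// Per-branch buffer sizes in bytes.
struct Footprint {
    std::size_t logits_snapshot;    // n_vocab floats
    std::size_t candidates_buffer;  // n_vocab TokenData entries
    std::size_t last_candidates;    // roughly top_k TokenData entries
};

constexpr Footprint branch_footprint(std::size_t n_vocab, std::size_t top_k) {
    return Footprint{n_vocab * sizeof(float),
                     n_vocab * sizeof(TokenData),
                     top_k * sizeof(TokenData)};
}
```

With n_vocab = 131072 and top_k = 40 this gives 524288 bytes (512 KB), 1572864 bytes (1.5 MB), and 480 bytes, so the two vocabulary-sized buffers dominate and the total grows linearly with the number of live branches.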

◆ children

std::vector<BranchHandle> lloyal::branch::BranchState::children

Child branches forked from this one.

Definition at line 319 of file branch.hpp.

◆ ctx

llama_context* lloyal::branch::BranchState::ctx = nullptr

Llama context (not owned, must outlive branch)

Definition at line 276 of file branch.hpp.

◆ fork_head

llama_pos lloyal::branch::BranchState::fork_head = 0

Parent's position at fork time (0 for root branches)

Definition at line 281 of file branch.hpp.

◆ generation

uint16_t lloyal::branch::BranchState::generation = 0

Slot generation counter (for ABA prevention)

Definition at line 314 of file branch.hpp.
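
A generation counter guards against ABA by making a stale handle distinguishable from a fresh handle to the same slot. The sketch below shows the general technique only. The handle layout (generation in the high 16 bits, slot index in the low 16 bits) and the names `Handle`, `Slot`, and `SlotTable` are assumptions for illustration, not the layout BranchStore actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using Handle = std::uint32_t;  // hypothetical: (generation << 16) | index

struct Slot {
    std::uint16_t generation = 0;  // bumped every time the slot is released
    bool in_use = false;
};

class SlotTable {
public:
    explicit SlotTable(std::size_t n) : slots_(n) {}

    // Claims the first free slot and returns a handle tagged with the
    // slot's current generation.
    Handle allocate() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i].in_use) {
                slots_[i].in_use = true;
                return (static_cast<Handle>(slots_[i].generation) << 16) |
                       static_cast<Handle>(i);
            }
        }
        throw std::runtime_error("no free slot");
    }

    // Frees the slot and bumps its generation, which invalidates every
    // handle that was issued for the previous occupant.
    void release(Handle h) {
        if (!is_valid(h)) return;
        Slot& s = slots_[h & 0xFFFFu];
        s.in_use = false;
        ++s.generation;
    }

    bool is_valid(Handle h) const {
        const std::size_t index = h & 0xFFFFu;
        const std::uint16_t gen = static_cast<std::uint16_t>(h >> 16);
        return index < slots_.size() && slots_[index].in_use &&
               slots_[index].generation == gen;
    }

private:
    std::vector<Slot> slots_;
};
```

A handle kept after release() fails is_valid() even when the same slot has been reallocated, because the stored generation no longer matches.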

◆ grammar

GrammarHandle lloyal::branch::BranchState::grammar = 0

Handle into BranchStore's grammar registry.

Definition at line 284 of file branch.hpp.

◆ has_logits

bool lloyal::branch::BranchState::has_logits = false

True only after force_snapshot_logits(), prefill(), or step()

◆ in_use

bool lloyal::branch::BranchState::in_use = false

True when slot is allocated to an active branch.

Definition at line 315 of file branch.hpp.

◆ last_candidates

std::vector<llama_token_data> lloyal::branch::BranchState::last_candidates

Filtered candidates from last sample()

Definition at line 296 of file branch.hpp.

◆ last_token

llama_token lloyal::branch::BranchState::last_token = -1

Last token returned by sample()

Definition at line 295 of file branch.hpp.

◆ logit_bias

std::vector<llama_logit_bias> lloyal::branch::BranchState::logit_bias

Static token biases, cloned on fork.

Definition at line 290 of file branch.hpp.

◆ logits_snapshot

std::vector<float> lloyal::branch::BranchState::logits_snapshot

Captured logit distribution (n_vocab floats)

Definition at line 298 of file branch.hpp.
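
How logits_snapshot, has_logits, and candidates_buffer might interact can be sketched as follows. `MiniBranch`, `capture_logits()`, and `build_candidates()` are hypothetical stand-ins. The real capture happens inside force_snapshot_logits(), prefill(), or step(), whose internals are not shown on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TokenData { int id; float logit; float p; };  // mirrors llama_token_data

// Hypothetical, reduced version of BranchState (illustration only).
struct MiniBranch {
    std::vector<float> logits_snapshot;
    bool has_logits = false;
    std::vector<TokenData> candidates_buffer;
    int n_vocab = 0;
};

// Copies n_vocab floats into the snapshot and marks it as valid.
inline void capture_logits(MiniBranch& b, const float* logits) {
    b.logits_snapshot.assign(logits, logits + b.n_vocab);
    b.has_logits = true;
}

// Rebuilds the candidate list from the snapshot. Returns false when no
// snapshot has been captured yet. After the first call, resize() reuses
// the buffer's capacity, so no O(n_vocab) allocation happens per sample.
inline bool build_candidates(MiniBranch& b) {
    if (!b.has_logits) return false;
    b.candidates_buffer.resize(b.logits_snapshot.size());
    for (std::size_t i = 0; i < b.logits_snapshot.size(); ++i) {
        b.candidates_buffer[i] =
            TokenData{static_cast<int>(i), b.logits_snapshot[i], 0.0f};
    }
    return true;
}
```

Checking has_logits first keeps a freshly created branch from sampling over an empty or stale distribution.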

◆ metrics

MetricsHandle lloyal::branch::BranchState::metrics = 0

Handle into BranchStore's metrics registry.

Definition at line 293 of file branch.hpp.

◆ model

const llama_model* lloyal::branch::BranchState::model = nullptr

Llama model (not owned, must outlive branch)

Definition at line 277 of file branch.hpp.

◆ n_batch

int lloyal::branch::BranchState::n_batch = DEFAULT_N_BATCH

Batch size for decode operations.

Definition at line 311 of file branch.hpp.

◆ n_vocab

int lloyal::branch::BranchState::n_vocab = 0

Vocabulary size (cached for buffer pre-allocation)

Definition at line 312 of file branch.hpp.

◆ parent

BranchHandle lloyal::branch::BranchState::parent = INVALID_HANDLE

Parent branch (INVALID_HANDLE if root)

Definition at line 318 of file branch.hpp.

◆ position

llama_pos lloyal::branch::BranchState::position = 0

Current decode position in the sequence.

Definition at line 280 of file branch.hpp.

◆ sampler_chain

SamplerChainHandle lloyal::branch::BranchState::sampler_chain = 0

Handle into BranchStore's sampler registry.

Definition at line 283 of file branch.hpp.

◆ seq_id

llama_seq_id lloyal::branch::BranchState::seq_id = NO_LEASE

KV cache sequence identifier (NO_LEASE when inactive)

Definition at line 279 of file branch.hpp.

◆ steer_fn

std::function<void(llama_token_data_array&)> lloyal::branch::BranchState::steer_fn

Dynamic logit callback, NOT cloned on fork.

Definition at line 291 of file branch.hpp.


The documentation for this struct was generated from the following file: