|
liblloyal 1.0.0
Branched Inference for llama.cpp
|
Branch Primitive for Tree Search and Multi-Sequence Generation. More...
#include "boundaries.hpp"#include "common.hpp"#include "decode.hpp"#include "grammar.hpp"#include "kv.hpp"#include "logits.hpp"#include "metrics.hpp"#include "sampler.hpp"#include <llama/llama.h>#include <algorithm>#include <cassert>#include <cmath>#include <cstdint>#include <cstring>#include <ctime>#include <deque>#include <functional>#include <limits>#include <mutex>#include <span>#include <stdexcept>#include <string>#include <utility>#include <vector>Go to the source code of this file.
Classes | |
| struct | lloyal::branch::KvPressure |
| Snapshot of KV cache pressure from BranchStore. More... | |
| struct | lloyal::branch::SamplerChainEntry |
| RAII entry for a sampler chain in the registry. More... | |
| struct | lloyal::branch::GrammarEntry |
| RAII entry for a grammar sampler in the registry. More... | |
| struct | lloyal::branch::CachedSamplingParams |
| Concrete sampling params snapshot for memoization. More... | |
| struct | lloyal::branch::BranchState |
| Consolidated mutable state for a single branch. More... | |
| struct | lloyal::branch::DecodeEachItem |
| Item for decode_each: one token per branch. More... | |
| struct | lloyal::branch::DecodeScatterItem |
| Item for decode_scatter: variable tokens per branch. More... | |
| class | lloyal::branch::BranchStore |
| Handle table and batched decode orchestrator for branch management. More... | |
| struct | lloyal::branch::BranchStore::Allocation |
| Result of allocate(): a slot handle + its leased seq_id. More... | |
| class | lloyal::branch::Branch |
Namespaces | |
| namespace | lloyal |
| Boundary Tracker Stub for OSS liblloyal. | |
| namespace | lloyal::branch |
Typedefs | |
| using | lloyal::branch::BranchHandle = uint32_t |
| Opaque handle to a branch slot. | |
| using | lloyal::branch::SamplerChainHandle = int32_t |
| Handle to a sampler chain in BranchStore's registry (0 = invalid/none) | |
| using | lloyal::branch::GrammarHandle = int32_t |
| Handle to a grammar sampler in BranchStore's registry (0 = invalid/none) | |
| using | lloyal::branch::MetricsHandle = int32_t |
| Handle to a metrics tracker in BranchStore's registry (0 = invalid/none) | |
Functions | |
| uint16_t | lloyal::branch::handle_index (BranchHandle h) |
| Extract slot index from a branch handle. | |
| uint16_t | lloyal::branch::handle_generation (BranchHandle h) |
| Extract generation counter from a branch handle. | |
| BranchHandle | lloyal::branch::make_handle (uint16_t index, uint16_t generation) |
| Construct a branch handle from index and generation. | |
| template<SamplingParamsLike P> | |
| CachedSamplingParams | lloyal::branch::snapshot_params (const P &p) |
| Snapshot sampling params for memoization comparison. | |
| template<SamplingParamsLike P> | |
| BranchHandle | lloyal::branch::create (llama_context *ctx, const llama_model *model, BranchStore &s, llama_pos start_pos, const P ¶ms, int n_batch=DEFAULT_N_BATCH, const char *grammar_str=nullptr, boundaries::BoundaryTracker *boundary_tracker=nullptr) |
| Create a new branch with sampler chain, optional grammar, and metrics. | |
| BranchHandle | lloyal::branch::fork (BranchHandle source, BranchStore &s) |
| Fork a branch into a new independent sequence. | |
| void | lloyal::branch::set_logit_bias (BranchHandle handle, const llama_logit_bias *biases, size_t n_biases, BranchStore &s) |
| void | lloyal::branch::clear_logit_bias (BranchHandle handle, BranchStore &s) |
| Clear all logit biases from a branch. | |
| void | lloyal::branch::set_steer (BranchHandle handle, std::function< void(llama_token_data_array &)> steer_fn, BranchStore &s) |
| void | lloyal::branch::clear_steer (BranchHandle handle, BranchStore &s) |
| Clear the steer callback from a branch. | |
| template<SamplingParamsLike P> | |
| void | lloyal::branch::set_sampler_params (BranchHandle handle, const P ¶ms, BranchStore &s) |
| Replace a branch's sampler chain with new parameters. | |
| void | lloyal::branch::set_grammar (BranchHandle handle, const llama_model *model, const char *grammar_str, BranchStore &s) |
| Replace a branch's grammar constraint. | |
| void | lloyal::branch::set_grammar_lazy (BranchHandle handle, const llama_model *model, const char *grammar_str, const std::vector< std::string > &trigger_patterns, const std::vector< llama_token > &trigger_tokens, BranchStore &s) |
| Set lazy grammar on a branch (unconstrained until trigger fires) | |
| void | lloyal::branch::prune (BranchHandle handle, BranchStore &s) |
| Prune a leaf branch (RESTRICT — throws if children exist) | |
| void | lloyal::branch::pruneSubtree (BranchHandle h, BranchStore &s) |
| Prune a branch and all descendants (CASCADE — iterative post-order) | |
| void | lloyal::branch::force_snapshot_logits (BranchHandle handle, BranchStore &s) |
| Force-copy the shared llama.cpp logits buffer into this branch's private snapshot. | |
| void | lloyal::branch::prefill (BranchHandle handle, const llama_token *tokens, size_t n_tokens, BranchStore &s) |
| Decode multiple tokens and capture logits atomically (prompt prefill) | |
| void | lloyal::branch::step (BranchHandle handle, llama_token token, BranchStore &s) |
| Decode a single token and capture logits (generation step) | |
| const float * | lloyal::branch::get_logits (BranchHandle handle, BranchStore &s) |
| Get the branch's captured logits snapshot. | |
| llama_token | lloyal::branch::sample (BranchHandle handle, BranchStore &s) |
| Sample a token from the branch's captured logits. | |
| void | lloyal::branch::accept_token (BranchHandle handle, llama_token token, BranchStore &s) |
| Accept a sampled token, advancing grammar and sampler state. | |
| void | lloyal::branch::apply_grammar (BranchHandle handle, float *logits, int n_vocab, BranchStore &s) |
| Apply grammar constraints to an external logits buffer. | |
| std::vector< std::pair< llama_token, float > > | lloyal::branch::get_legal_priors (BranchHandle handle, BranchStore &s) |
| Get grammar-legal tokens with renormalized probabilities. | |
| float | lloyal::branch::get_legal_logsumexp (BranchHandle handle, BranchStore &s) |
| Compute log-sum-exp over grammar-legal logits. | |
| bool | lloyal::branch::is_token_legal (BranchHandle handle, llama_token token, BranchStore &s) |
| Check if a token is legal under grammar constraints. | |
| float | lloyal::branch::get_token_prior_assume_legal (BranchHandle handle, llama_token token, float logsumexp, BranchStore &s) |
| Compute prior probability for a token known to be grammar-legal. | |
| float | lloyal::branch::get_token_prior (BranchHandle handle, llama_token token, float logsumexp, BranchStore &s) |
| Compute prior probability for a token, checking grammar legality first. | |
| llama_pos | lloyal::branch::get_position (BranchHandle handle, BranchStore &s) |
| Get the branch's current decode position. | |
| llama_pos | lloyal::branch::get_fork_head (BranchHandle handle, BranchStore &s) |
| Get the branch's fork head (parent position at fork time) | |
| float | lloyal::branch::get_perplexity (BranchHandle handle, BranchStore &s) |
| Get model-level perplexity (from raw logits) | |
| float | lloyal::branch::get_sampling_perplexity (BranchHandle handle, BranchStore &s) |
| Get sampling-level perplexity (from filtered distribution) | |
| float | lloyal::branch::get_last_sampling_prior (BranchHandle handle, BranchStore &s) |
| Get the last sampled token's prior from the filtered distribution. | |
| int | lloyal::branch::get_n_vocab (BranchHandle handle, BranchStore &s) |
| Get the branch's vocabulary size. | |
Variables | |
| constexpr BranchHandle | lloyal::branch::INVALID_HANDLE = 0 |
| Null handle sentinel. | |
| constexpr llama_seq_id | lloyal::branch::NO_LEASE = kv::NO_LEASE |
| Branch has no KV residency. | |
| constexpr int | lloyal::branch::DEFAULT_N_BATCH = 512 |
| Default batch size for decode operations. | |
| constexpr uint32_t | lloyal::branch::GEN_SHIFT = 16 |
| Bit shift for generation field. | |
| constexpr uint32_t | lloyal::branch::INDEX_MASK = 0xFFFF |
| Mask for slot index field. | |
Branch Primitive for Tree Search and Multi-Sequence Generation.
Consolidates all forkable state into a single handle-based API:
Handle design:
KV cache constraint:
Fork semantics:
These primitives compose into inference patterns including:
Definition in file branch.hpp.