liblloyal 1.0.0
Branched Inference for llama.cpp
Consolidated mutable state for a single branch.
#include <lloyal/branch.hpp>
Public Attributes
| llama_context * | ctx = nullptr | Llama context (not owned, must outlive branch) |
| const llama_model * | model = nullptr | Llama model (not owned, must outlive branch) |
| llama_seq_id | seq_id = NO_LEASE | KV cache sequence identifier (NO_LEASE when inactive) |
| llama_pos | position = 0 | Current decode position in the sequence. |
| llama_pos | fork_head = 0 | Parent's position at fork time (0 for root branches) |
| SamplerChainHandle | sampler_chain = 0 | Handle into BranchStore's sampler registry. |
| GrammarHandle | grammar = 0 | Handle into BranchStore's grammar registry. |
| CachedSamplingParams | cached_params | Params used to create current chain (for memoization) |
| boundaries::BoundaryTracker * | boundary_tracker = nullptr | Token boundary detector (owned, optional) |
| std::vector< llama_logit_bias > | logit_bias | Static token biases, cloned on fork. |
| std::function< void(llama_token_data_array &)> | steer_fn | Dynamic logit callback, NOT cloned on fork. |
| MetricsHandle | metrics = 0 | Handle into BranchStore's metrics registry. |
| llama_token | last_token = -1 | Last token returned by sample() |
| std::vector< llama_token_data > | last_candidates | Filtered candidates from last sample() |
| std::vector< float > | logits_snapshot | Captured logit distribution (n_vocab floats) |
| bool | has_logits = false | True only after force_snapshot_logits(), prefill(), or step() |
| std::vector< llama_token_data > | candidates_buffer | Reusable scratch buffer for sampling (avoids O(n_vocab) allocs per sample call). |
| int | n_batch = DEFAULT_N_BATCH | Batch size for decode operations. |
| int | n_vocab = 0 | Vocabulary size (cached for buffer pre-allocation) |
| uint16_t | generation = 0 | Slot generation counter (for ABA prevention) |
| bool | in_use = false | True when slot is allocated to an active branch. |
| BranchHandle | parent = INVALID_HANDLE | Parent branch (INVALID_HANDLE if root) |
| std::vector< BranchHandle > | children | Child branches forked from this one. |
Each branch encapsulates all state needed for independent generation:
Forkable state (cloned by fork()): includes logit_bias (static token biases).
Non-forkable state (NOT cloned by fork()): includes steer_fn (dynamic logit callback).
Sampling application order in sample(): Grammar → Logit Bias → Steer → Sampler Chain
Definition at line 275 of file branch.hpp.
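The application order stated above (Grammar → Logit Bias → Steer → Sampler Chain) can be illustrated with a minimal sketch. Everything in it is a stand-in: TokenData, LogitBias, SteerFn and sample_sketch() are hypothetical names, the grammar is reduced to a yes/no predicate, and a greedy argmax takes the place of the real sampler chain. It is not the library's sample() implementation, only one way the four stages could compose.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Stand-in types (assumptions); the real ones come from llama.h and branch.hpp.
struct TokenData { int id; float logit; };
struct LogitBias { int token; float bias; };
using SteerFn = std::function<void(std::vector<TokenData>&)>;

// Hypothetical pipeline that follows the documented stage order.
inline int sample_sketch(std::vector<TokenData> cands,
                         const std::function<bool(int)>& grammar_allows,
                         const std::vector<LogitBias>& logit_bias,
                         const SteerFn& steer_fn) {
    assert(!cands.empty());
    const float neg_inf = -std::numeric_limits<float>::infinity();

    // 1. Grammar: mask tokens the grammar rejects.
    if (grammar_allows) {
        for (auto& c : cands) {
            if (!grammar_allows(c.id)) c.logit = neg_inf;
        }
    }
    // 2. Logit bias: add the static per-token offsets.
    for (const auto& b : logit_bias) {
        for (auto& c : cands) {
            if (c.id == b.token) c.logit += b.bias;
        }
    }
    // 3. Steer: run the dynamic callback, if one is installed.
    if (steer_fn) steer_fn(cands);

    // 4. Sampler chain: a greedy argmax stands in for the real chain.
    std::size_t best = 0;
    for (std::size_t i = 1; i < cands.size(); ++i) {
        if (cands[i].logit > cands[best].logit) best = i;
    }
    return cands[best].id;
}
```

Because the grammar mask runs first, a positive bias cannot revive a token the grammar rejected in this sketch: adding a finite bias to negative infinity leaves it at negative infinity.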
| boundaries::BoundaryTracker* lloyal::branch::BranchState::boundary_tracker = nullptr |
Token boundary detector (owned, optional)
Definition at line 288 of file branch.hpp.
| CachedSamplingParams lloyal::branch::BranchState::cached_params |
Params used to create current chain (for memoization)
Definition at line 286 of file branch.hpp.
| std::vector<llama_token_data> lloyal::branch::BranchState::candidates_buffer |
Reusable scratch buffer for sampling (avoids O(n_vocab) allocs per sample call).
Memory footprint per branch for large vocab models (128k tokens):
last_candidates: typically ~40 entries with top_k=40 (~480 bytes)
Definition at line 309 of file branch.hpp.
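The footprint figures can be reproduced with simple arithmetic. The sketch below assumes that candidates_buffer and logits_snapshot each hold n_vocab entries, that "128k" means 128,000 tokens, and that a candidate entry occupies 12 bytes (a 32-bit id plus two floats, which matches the ~480 bytes quoted for 40 entries). TokenData, Footprint and branch_footprint() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in with three 4-byte fields, the layout assumed for a candidate entry.
struct TokenData { std::int32_t id; float logit; float p; };
static_assert(sizeof(TokenData) == 12, "sketch assumes a 12-byte candidate entry");

// Byte counts for the three per-branch buffers discussed above.
struct Footprint {
    std::size_t candidates_buffer;  // n_vocab candidate entries (assumed sizing)
    std::size_t logits_snapshot;    // n_vocab floats
    std::size_t last_candidates;    // roughly top_k candidate entries
};

inline Footprint branch_footprint(std::size_t n_vocab, std::size_t top_k) {
    assert(top_k <= n_vocab);
    return Footprint{
        n_vocab * sizeof(TokenData),
        n_vocab * sizeof(float),
        top_k * sizeof(TokenData),
    };
}
```

Under these assumptions, branch_footprint(128000, 40) gives 1,536,000 bytes for candidates_buffer, 512,000 bytes for logits_snapshot, and 480 bytes for last_candidates, so the two vocabulary-sized buffers dominate the per-branch cost.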
| std::vector<BranchHandle> lloyal::branch::BranchState::children |
Child branches forked from this one.
Definition at line 319 of file branch.hpp.
| llama_context* lloyal::branch::BranchState::ctx = nullptr |
Llama context (not owned, must outlive branch)
Definition at line 276 of file branch.hpp.
| llama_pos lloyal::branch::BranchState::fork_head = 0 |
Parent's position at fork time (0 for root branches)
Definition at line 281 of file branch.hpp.
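One use of fork_head is to tell how far a branch has advanced since it was forked. The sketch below assumes that a child starts at its parent's current position and records that position as its fork_head; BranchPos, fork_sketch() and tokens_since_fork() are hypothetical helpers, not part of the documented API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the two position fields of BranchState (llama_pos is a 32-bit int).
struct BranchPos {
    std::int32_t position;   // current decode position
    std::int32_t fork_head;  // parent's position at fork time, 0 for a root
};

// Assumed fork behaviour: the child inherits the parent's position
// and remembers it as fork_head.
inline BranchPos fork_sketch(const BranchPos& parent) {
    return BranchPos{parent.position, parent.position};
}

// Positions this branch has advanced on its own since it was forked.
inline std::int32_t tokens_since_fork(const BranchPos& b) {
    assert(b.position >= b.fork_head);
    return b.position - b.fork_head;
}
```

For a root branch fork_head is 0, so the same expression yields the full decoded length.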
| uint16_t lloyal::branch::BranchState::generation = 0 |
Slot generation counter (for ABA prevention)
Definition at line 314 of file branch.hpp.
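A generation counter guards against the ABA problem: a stale handle to a freed slot must not be mistaken for a handle to whatever branch reuses that slot later. The following sketch shows the general technique under an assumed handle layout (generation in the upper 16 bits, slot index in the lower 16). The actual BranchHandle encoding is not described here, and Slot, SlotTable and the helper functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Handle = std::uint32_t;

// Assumed packing: generation in the high half, slot index in the low half.
inline Handle make_handle(std::uint16_t index, std::uint16_t generation) {
    return (static_cast<Handle>(generation) << 16) | index;
}
inline std::uint16_t handle_index(Handle h) { return static_cast<std::uint16_t>(h & 0xFFFFu); }
inline std::uint16_t handle_generation(Handle h) { return static_cast<std::uint16_t>(h >> 16); }

// Mirrors the generation and in_use fields documented for BranchState.
struct Slot {
    std::uint16_t generation = 0;
    bool in_use = false;
};

struct SlotTable {
    std::vector<Slot> slots;
    explicit SlotTable(std::size_t n) : slots(n) {}

    // Hand out a handle stamped with the slot's current generation.
    Handle allocate(std::uint16_t index) {
        assert(index < slots.size());
        assert(!slots[index].in_use);
        slots[index].in_use = true;
        return make_handle(index, slots[index].generation);
    }

    // A handle is valid only while its generation matches the slot's.
    bool valid(Handle h) const {
        const std::uint16_t i = handle_index(h);
        return i < slots.size() && slots[i].in_use &&
               slots[i].generation == handle_generation(h);
    }

    // Freeing bumps the generation, which invalidates all older handles.
    void release(Handle h) {
        if (!valid(h)) return;
        Slot& s = slots[handle_index(h)];
        s.in_use = false;
        ++s.generation;
    }
};
```

With this scheme, a handle obtained before release() keeps failing valid() even after the same slot index has been allocated again, because the stored generation has moved on.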
| GrammarHandle lloyal::branch::BranchState::grammar = 0 |
Handle into BranchStore's grammar registry.
Definition at line 284 of file branch.hpp.
| bool lloyal::branch::BranchState::has_logits = false |
True only after force_snapshot_logits(), prefill(), or step()
Definition at line 299 of file branch.hpp.
| bool lloyal::branch::BranchState::in_use = false |
True when slot is allocated to an active branch.
Definition at line 315 of file branch.hpp.
| std::vector<llama_token_data> lloyal::branch::BranchState::last_candidates |
Filtered candidates from last sample()
Definition at line 296 of file branch.hpp.
| llama_token lloyal::branch::BranchState::last_token = -1 |
Last token returned by sample()
Definition at line 295 of file branch.hpp.
| std::vector<llama_logit_bias> lloyal::branch::BranchState::logit_bias |
Static token biases, cloned on fork.
Definition at line 290 of file branch.hpp.
| std::vector<float> lloyal::branch::BranchState::logits_snapshot |
Captured logit distribution (n_vocab floats)
Definition at line 298 of file branch.hpp.
| MetricsHandle lloyal::branch::BranchState::metrics = 0 |
Handle into BranchStore's metrics registry.
Definition at line 293 of file branch.hpp.
| const llama_model* lloyal::branch::BranchState::model = nullptr |
Llama model (not owned, must outlive branch)
Definition at line 277 of file branch.hpp.
| int lloyal::branch::BranchState::n_batch = DEFAULT_N_BATCH |
Batch size for decode operations.
Definition at line 311 of file branch.hpp.
| int lloyal::branch::BranchState::n_vocab = 0 |
Vocabulary size (cached for buffer pre-allocation)
Definition at line 312 of file branch.hpp.
| BranchHandle lloyal::branch::BranchState::parent = INVALID_HANDLE |
Parent branch (INVALID_HANDLE if root)
Definition at line 318 of file branch.hpp.
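Together, parent and children form a tree of branches. The sketch below shows how such links might be maintained and walked, using plain vector indices as handles. Node, link_child(), depth() and the kInvalid sentinel are hypothetical; the real value of INVALID_HANDLE and the BranchStore lookup API are not shown in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Handle = std::uint32_t;
constexpr Handle kInvalid = 0xFFFFFFFFu;  // stand-in for INVALID_HANDLE (assumed value)

// Mirrors the two topology fields documented for BranchState.
struct Node {
    Handle parent = kInvalid;
    std::vector<Handle> children;
};

// Record a fork: the child points at its parent, the parent lists the child.
inline void link_child(std::vector<Node>& nodes, Handle parent, Handle child) {
    assert(parent < nodes.size() && child < nodes.size());
    nodes[child].parent = parent;
    nodes[parent].children.push_back(child);
}

// Number of forks between a branch and its root (0 for a root branch).
inline std::size_t depth(const std::vector<Node>& nodes, Handle h) {
    assert(h < nodes.size());
    std::size_t d = 0;
    while (nodes[h].parent != kInvalid) {
        h = nodes[h].parent;
        ++d;
    }
    return d;
}
```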
| llama_pos lloyal::branch::BranchState::position = 0 |
Current decode position in the sequence.
Definition at line 280 of file branch.hpp.
| SamplerChainHandle lloyal::branch::BranchState::sampler_chain = 0 |
Handle into BranchStore's sampler registry.
Definition at line 283 of file branch.hpp.
| llama_seq_id lloyal::branch::BranchState::seq_id = NO_LEASE |
KV cache sequence identifier (NO_LEASE when inactive)
Definition at line 279 of file branch.hpp.
| std::function<void(llama_token_data_array&)> lloyal::branch::BranchState::steer_fn |
Dynamic logit callback, NOT cloned on fork.
Definition at line 291 of file branch.hpp.
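As an illustration of what a steer callback might do, the sketch below builds a function that bans a single token by forcing its logit to negative infinity. TokenData and TokenDataArray are simplified stand-ins for llama_token_data and llama_token_data_array (only the data pointer and size are modelled), and make_ban_token() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>

// Simplified stand-ins for the llama.h candidate types (assumption).
struct TokenData { std::int32_t id; float logit; float p; };
struct TokenDataArray { TokenData* data; std::size_t size; };

// Same shape as steer_fn, but over the stand-in array type.
using SteerFn = std::function<void(TokenDataArray&)>;

// Build a callback that removes one token from consideration.
inline SteerFn make_ban_token(std::int32_t banned) {
    assert(banned >= 0);
    return [banned](TokenDataArray& arr) {
        for (std::size_t i = 0; i < arr.size; ++i) {
            if (arr.data[i].id == banned) {
                arr.data[i].logit = -std::numeric_limits<float>::infinity();
            }
        }
    };
}
```

Since steer_fn is not cloned on fork, a child branch that needs the same steering would have to install its own callback after forking.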