liblloyal 1.0.0
Composable primitives for llama.cpp inference
Batch Decoding Operations.
#include "common.hpp"
#include "helpers.hpp"
#include <algorithm>
#include <cstdint>
#include <llama/llama.h>
#include <stdexcept>
#include <vector>
Classes
| struct | lloyal::detail::BatchGuard |
| RAII guard for automatic batch cleanup. Ensures llama_batch_free is called even if exceptions occur. | |
Namespaces
| namespace | lloyal |
| JSON Schema to Grammar Converter (Header-Only) | |
| namespace | lloyal::detail |
| namespace | lloyal::decoder |
Macros
| #define | LLOYAL_STACK_BATCH 1 |
| LLOYAL_STACK_BATCH - Controls llama_batch construction strategy. | |
Functions
| void | lloyal::detail::add_tokens_to_batch (llama_batch &batch, const llama_token *tokens, int32_t start_idx, int32_t n_eval, int32_t n_past, int32_t capacity, llama_seq_id seq_id=0) |
| Add tokens to batch with position info. | |
| void | lloyal::decoder::decode_tokens (llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_past, int32_t n_batch, llama_seq_id seq_id=0) |
| Process tokens through model to update KV cache. | |
| void | lloyal::decoder::decode_tokens (llama_context *ctx, const std::vector< llama_token > &tokens, int32_t n_past, int32_t n_batch, llama_seq_id seq_id=0) |
| Convenience overload for std::vector<llama_token> | |
| void | lloyal::decoder::decode_one (llama_context *ctx, llama_token tok, llama_pos pos, llama_seq_id seq_id=0, bool want_logits=true) |
| Decode a single token with zero heap allocation (when LLOYAL_STACK_BATCH=1) | |
Batch Decoding Operations.
Wraps llama.cpp decode APIs with batch management, chunking logic, and orchestration primitives. Provides both batched and single-token decode operations.
Uses batch utilities from helpers.hpp (batch_clear, batch_add) for token management.
Definition in file decoder.hpp.
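As a rough usage sketch, not taken from the library's own documentation: the batched entry point handles the prompt and the single-token entry point drives generation. The llama_sampler_sample call is the upstream llama.cpp sampler API, not part of liblloyal, and the sketch assumes decode_tokens leaves logits for the final prompt token available so sampling can start.

// Hypothetical usage sketch. Assumes a loaded llama_context, a configured
// llama_sampler, and a tokenized prompt; error handling is omitted.
#include "decoder.hpp"      // include path may differ in a consuming project
#include <llama/llama.h>
#include <vector>

void generate(llama_context* ctx, llama_sampler* smpl,
              const std::vector<llama_token>& prompt_tokens,
              int32_t n_batch, int max_new_tokens) {
  int32_t n_past = 0;

  // Prompt phase: decode_tokens chunks the prompt into pieces of at most
  // n_batch tokens and fills the KV cache for positions [0, prompt size).
  lloyal::decoder::decode_tokens(ctx, prompt_tokens, n_past, n_batch);
  n_past += static_cast<int32_t>(prompt_tokens.size());

  // Generation phase: sample from the most recent logits, then decode the
  // sampled token at the next position, requesting logits for the next step.
  for (int i = 0; i < max_new_tokens; ++i) {
    llama_token next = llama_sampler_sample(smpl, ctx, -1);
    lloyal::decoder::decode_one(ctx, next, /*pos=*/n_past, /*seq_id=*/0,
                                /*want_logits=*/true);
    ++n_past;
  }
}

Stopping criteria (end-of-generation detection) and sampler setup are omitted; the sketch only shows how the two decode primitives compose around n_past.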
| #define LLOYAL_STACK_BATCH 1 |
LLOYAL_STACK_BATCH - Controls llama_batch construction strategy.
When 1 (default): Use zero-allocation stack-constructed batch in decode_one()
When 0: Use thread_local batch via llama_batch_init()
If the build breaks after a llama.cpp update due to llama_batch layout changes, set LLOYAL_STACK_BATCH to 0 to fall back to the llama_batch_init() path.
Definition at line 32 of file decoder.hpp.
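For illustration only, here is a hypothetical sketch of the two strategies the macro selects between; it is not the library's actual decode_one() implementation. The llama_batch field names and the llama_batch_init()/llama_decode() calls follow the current llama.cpp API, and those struct fields are precisely what may drift after an upstream update, which is why the fallback exists.

// Hypothetical sketch of the two construction strategies selected by
// LLOYAL_STACK_BATCH; not the library's actual decode_one() implementation.
#include <llama/llama.h>

void decode_one_sketch(llama_context* ctx, llama_token tok, llama_pos pos,
                       llama_seq_id seq_id, bool want_logits) {
#if LLOYAL_STACK_BATCH
  // LLOYAL_STACK_BATCH=1: point a stack-constructed llama_batch at locals,
  // so no heap allocation happens on the hot path. Field names assume the
  // llama_batch layout at the time of writing.
  llama_seq_id* seq_ids = &seq_id;
  int32_t n_seq_id = 1;
  int8_t want = want_logits ? 1 : 0;
  llama_batch batch{};
  batch.n_tokens = 1;
  batch.token    = &tok;
  batch.pos      = &pos;
  batch.n_seq_id = &n_seq_id;
  batch.seq_id   = &seq_ids;
  batch.logits   = &want;
  llama_decode(ctx, batch);
#else
  // LLOYAL_STACK_BATCH=0: reuse a per-thread batch allocated once by
  // llama_batch_init(), so batch storage is sized and owned by llama.cpp.
  // A real implementation would pair this with llama_batch_free, e.g. via
  // the BatchGuard RAII helper.
  thread_local llama_batch batch = llama_batch_init(/*n_tokens=*/1, /*embd=*/0,
                                                    /*n_seq_max=*/1);
  batch.n_tokens     = 1;
  batch.token[0]     = tok;
  batch.pos[0]       = pos;
  batch.n_seq_id[0]  = 1;
  batch.seq_id[0][0] = seq_id;
  batch.logits[0]    = want_logits ? 1 : 0;
  llama_decode(ctx, batch);
#endif
}

The stack path avoids any allocation but hard-codes knowledge of the llama_batch layout; the llama_batch_init() path lets llama.cpp own the batch storage at the cost of a one-time per-thread allocation.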