liblloyal 1.0.0
Composable primitives for llama.cpp inference
decoder.hpp File Reference

Batch Decoding Operations. More...

#include "common.hpp"
#include "helpers.hpp"
#include <algorithm>
#include <cstdint>
#include <llama/llama.h>
#include <stdexcept>
#include <vector>

Go to the source code of this file.

Classes

struct  lloyal::detail::BatchGuard
 RAII guard for automatic batch cleanup; ensures llama_batch_free is called even if an exception occurs. More...
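 As a rough sketch of the shape of such a guard (illustrative only; the member name is an assumption and the actual definition is in decoder.hpp):

    struct BatchGuard {
        llama_batch batch;
        explicit BatchGuard(llama_batch b) : batch(b) {}
        ~BatchGuard() { llama_batch_free(batch); }   // runs even when an exception unwinds
        BatchGuard(const BatchGuard &) = delete;     // non-copyable: owns the batch
        BatchGuard &operator=(const BatchGuard &) = delete;
    };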
 

Namespaces

namespace  lloyal
 JSON Schema to Grammar Converter (Header-Only)
 
namespace  lloyal::detail
 
namespace  lloyal::decoder
 

Macros

#define LLOYAL_STACK_BATCH   1
 LLOYAL_STACK_BATCH - Controls llama_batch construction strategy.
 

Functions

void lloyal::detail::add_tokens_to_batch (llama_batch &batch, const llama_token *tokens, int32_t start_idx, int32_t n_eval, int32_t n_past, int32_t capacity, llama_seq_id seq_id=0)
 Add tokens to batch with position info.
 
void lloyal::decoder::decode_tokens (llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_past, int32_t n_batch, llama_seq_id seq_id=0)
 Process tokens through model to update KV cache.
 
void lloyal::decoder::decode_tokens (llama_context *ctx, const std::vector< llama_token > &tokens, int32_t n_past, int32_t n_batch, llama_seq_id seq_id=0)
 Convenience overload for std::vector<llama_token>.
 
void lloyal::decoder::decode_one (llama_context *ctx, llama_token tok, llama_pos pos, llama_seq_id seq_id=0, bool want_logits=true)
 Decode a single token with zero heap allocation (when LLOYAL_STACK_BATCH=1).
 

Detailed Description

Batch Decoding Operations.

Wraps llama.cpp decode APIs with batch management, chunking logic, and orchestration primitives. Provides both batched and single-token decode operations.

Uses batch utilities from helpers.hpp (batch_clear, batch_add) for token management.
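
A typical call sequence, as a minimal sketch (model and context setup are omitted, and prefill_and_step is a hypothetical wrapper, not part of the API):

    #include "decoder.hpp"
    #include <vector>

    void prefill_and_step(llama_context *ctx,
                          const std::vector<llama_token> &prompt,
                          llama_token next_tok,
                          int32_t n_batch) {
        // Prefill: decode_tokens chunks the prompt into n_batch-sized
        // llama_decode calls, starting at position n_past = 0.
        lloyal::decoder::decode_tokens(ctx, prompt, /*n_past=*/0, n_batch);

        // Incremental step: decode one sampled token at the next position,
        // requesting logits so the following token can be sampled.
        const llama_pos pos = static_cast<llama_pos>(prompt.size());
        lloyal::decoder::decode_one(ctx, next_tok, pos, /*seq_id=*/0,
                                    /*want_logits=*/true);
    }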

Definition in file decoder.hpp.

Macro Definition Documentation

◆ LLOYAL_STACK_BATCH

#define LLOYAL_STACK_BATCH   1

LLOYAL_STACK_BATCH - Controls llama_batch construction strategy.

When 1 (default): use a zero-allocation, stack-constructed batch in decode_one()

  • Fastest: no heap allocation per decode
  • Risk: breaks if the llama_batch struct layout changes

When 0: use a thread_local batch via llama_batch_init()

  • Slightly slower: one-time initialization per thread
  • Safe: uses llama.cpp's own initializer, which handles new fields
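
As a rough illustration of the difference (the field names below follow a recent llama_batch layout and may not match your llama.cpp version, which is precisely the risk noted above):

    // LLOYAL_STACK_BATCH == 1: point a zero-initialized llama_batch at
    // stack storage; no heap allocation, but tied to the struct layout.
    llama_token   tok_storage = tok;
    llama_pos     pos_storage = pos;
    int32_t       n_seq       = 1;
    llama_seq_id  seq_storage = seq_id;
    llama_seq_id *seq_ptr     = &seq_storage;
    int8_t        logits_flag = want_logits ? 1 : 0;

    llama_batch batch{};
    batch.n_tokens = 1;
    batch.token    = &tok_storage;
    batch.pos      = &pos_storage;
    batch.n_seq_id = &n_seq;
    batch.seq_id   = &seq_ptr;
    batch.logits   = &logits_flag;

    // LLOYAL_STACK_BATCH == 0: let llama.cpp allocate and lay out the
    // batch once per thread, then reuse it for every decode_one() call.
    thread_local llama_batch tl_batch =
        llama_batch_init(/*n_tokens*/ 1, /*embd*/ 0, /*n_seq_max*/ 1);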

If the build breaks after a llama.cpp update due to llama_batch changes:

  1. Set LLOYAL_STACK_BATCH=0 to unblock immediately
  2. Update decode_one() to match new struct layout
  3. Update ABI stability test assertions
  4. Re-enable LLOYAL_STACK_BATCH=1
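
For step 1, the macro can be set at build time; assuming decoder.hpp wraps the default in an #ifndef guard (the usual convention for such switches), either of the following works:

    // Define before the header is included:
    #define LLOYAL_STACK_BATCH 0
    #include "decoder.hpp"

    // ...or pass it as a compiler flag:
    //   -DLLOYAL_STACK_BATCH=0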

Definition at line 32 of file decoder.hpp.