liblloyal 1.0.0
Composable primitives for llama.cpp inference
lloyal::sampler Namespace Reference

Functions

llama_token greedy (llama_context *ctx, const llama_vocab *vocab)
 Greedy sampling: Select token with highest probability.
 
template<SamplingParamsLike P>
llama_token sample_with_params (llama_context *ctx, const llama_vocab *vocab, const P &params, llama_sampler *grammarSampler=nullptr)
 Sample with configurable parameters (template accepts any SamplingParams type)
 
llama_token greedy (llama_context *ctx, const llama_model *model)
 Greedy sampling with automatic vocab extraction.
 
template<SamplingParamsLike P>
llama_token sample_with_params (llama_context *ctx, const llama_model *model, const P &params, llama_sampler *grammarSampler=nullptr)
 Parameterized sampling with automatic vocab extraction.
 
template<SamplingParamsLike P>
llama_sampler * create_chain (const P &params)
 Create a persistent sampler chain from parameters.
 
llama_sampler * clone_chain (llama_sampler *chain)
 Clone a sampler chain.
 
void reseed_chain (llama_sampler *chain, uint32_t new_seed)
 Reseed the dist sampler in a chain.
 
void free_chain (llama_sampler *chain)
 Free a sampler chain.
 
void apply (llama_sampler *chain, llama_token_data_array *cur_p)
 Apply a sampler chain to a candidate array.
 
void accept (llama_sampler *chain, llama_token token)
 Accept a token into the sampler chain.
 

Function Documentation

◆ accept()

void lloyal::sampler::accept ( llama_sampler *  chain,
llama_token  token 
)
inline

Accept a token into the sampler chain.

Updates internal state (e.g., penalty tracking).

Parameters
chain  Sampler chain
token  Token to accept

Definition at line 595 of file sampler.hpp.
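
As a rough illustration of the kind of state accept() feeds, the sketch below records accepted tokens and dampens their logits on the next step. `PenaltyTracker`, its fields, and the exact penalty rule are assumptions made for this example; they are not liblloyal or llama.cpp types, and a real chain keeps this state internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a penalty stage inside a sampler chain.
struct PenaltyTracker {
    float repeat_penalty = 1.1f;                    // assumed default, > 1 discourages repeats
    std::unordered_map<std::int32_t, int> counts;   // token id -> times accepted

    // Plays the role of accept(): remember the token that was actually chosen.
    void accept(std::int32_t token) { ++counts[token]; }

    // Dampen the logits of every token that was accepted earlier.
    void apply(std::vector<float>& logits) const {
        assert(repeat_penalty > 0.0f);
        for (const auto& entry : counts) {
            const std::int32_t token = entry.first;
            if (token < 0 || static_cast<std::size_t>(token) >= logits.size()) continue;
            float& l = logits[static_cast<std::size_t>(token)];
            // Positive logits shrink; non-positive logits become more negative.
            l = (l > 0.0f) ? l / repeat_penalty : l * repeat_penalty;
        }
    }
};
```

The point of the split is ordering: the token is accepted after it has been chosen, so the penalty only affects later sampling steps.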

◆ apply()

void lloyal::sampler::apply ( llama_sampler *  chain,
llama_token_data_array *  cur_p 
)
inline

Apply a sampler chain to a candidate array.

Modifies the candidates in place and sets cur_p->selected to the index of the chosen candidate within the array.

Parameters
chain  Sampler chain to apply
cur_p  Candidate array (modified in-place)

Definition at line 581 of file sampler.hpp.
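
To make "modified in place" concrete, here is a minimal sketch of a toy chain that truncates a candidate array to its top k entries and then marks the best one as selected. `Candidate`, `CandidateArray`, and `apply_top_k_then_greedy` are hypothetical stand-ins so the example compiles without llama.cpp; they only imitate the shape of a candidate array with a `selected` index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for llama.cpp's candidate types.
struct Candidate {
    std::int32_t id;
    float logit;
};

struct CandidateArray {
    std::vector<Candidate> data;
    std::int64_t selected = -1;  // index into data; -1 while nothing is chosen
};

// Toy chain: keep the k best candidates, then select the top one.
void apply_top_k_then_greedy(CandidateArray& cur, std::size_t k) {
    assert(k > 0 && !cur.data.empty());
    k = std::min(k, cur.data.size());
    // Move the k highest logits to the front, in descending order.
    std::partial_sort(cur.data.begin(),
                      cur.data.begin() + static_cast<std::ptrdiff_t>(k),
                      cur.data.end(),
                      [](const Candidate& a, const Candidate& b) { return a.logit > b.logit; });
    cur.data.resize(k);  // the caller's array shrinks
    cur.selected = 0;    // an index into data, not a token id
}
```

After the call, the chosen token id is read as `cur.data[cur.selected].id`, which is why the caller has to go through the index rather than treat `selected` as a token.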

◆ clone_chain()

llama_sampler * lloyal::sampler::clone_chain ( llama_sampler *  chain)
inline

Clone a sampler chain.

Creates an independent copy of the chain with the same state. Used when forking branches in Monte Carlo tree search (MCTS).

Parameters
chain  Source chain to clone
Returns
Owned clone - caller must free with free_chain()

Definition at line 526 of file sampler.hpp.

◆ create_chain()

template<SamplingParamsLike P>
llama_sampler * lloyal::sampler::create_chain ( const P &  params)
inline

Create a persistent sampler chain from parameters.

Builds a sampler chain that can be reused across multiple sampling calls. Treats temperature <= 0 as greedy mode (deterministic argmax).

Parameters
params  Sampling parameters (any SamplingParamsLike type)
Returns
Owned sampler chain - caller must free with free_chain()

Definition at line 465 of file sampler.hpp.
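
The temperature rule can be sketched independently of the library. The function below is an illustration, not the chain that create_chain() builds: `pick` is a hypothetical helper that returns the argmax when temperature <= 0 and otherwise draws from the softmax of the temperature-scaled logits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// temperature <= 0: deterministic argmax (greedy mode).
// temperature  > 0: random draw from softmax(logits / temperature).
std::size_t pick(const std::vector<float>& logits, float temperature, std::mt19937& rng) {
    assert(!logits.empty());
    const auto best = std::max_element(logits.begin(), logits.end());
    if (temperature <= 0.0f)
        return static_cast<std::size_t>(best - logits.begin());

    std::vector<double> weights;
    weights.reserve(logits.size());
    for (float l : logits)
        weights.push_back(std::exp((l - *best) / temperature));  // shift by max for stability
    // discrete_distribution normalizes the weights itself.
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
}
```

Handling the non-positive case up front also avoids dividing by zero in the scaling step.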

◆ free_chain()

void lloyal::sampler::free_chain ( llama_sampler *  chain)
inline

Free a sampler chain.

Parameters
chain  Chain to free (safe to call with nullptr)

Definition at line 567 of file sampler.hpp.
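
Because create_chain() and clone_chain() return owned pointers, one common way to avoid leaks is a std::unique_ptr with a custom deleter. The sketch below shows the pattern with stand-in types so it compiles without llama.cpp: `FakeSampler`, `fake_create_chain`, and `fake_free_chain` are hypothetical, and with the real library the deleter would presumably forward to free_chain() instead.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for llama_sampler.
struct FakeSampler { int id; };

int g_live_chains = 0;  // chains created but not yet freed

FakeSampler* fake_create_chain() {
    ++g_live_chains;
    return new FakeSampler{0};
}

// Mirrors the documented contract of free_chain(): nullptr is a no-op.
void fake_free_chain(FakeSampler* chain) {
    if (chain == nullptr) return;
    assert(g_live_chains > 0);
    --g_live_chains;
    delete chain;
}

// Deleter that routes destruction through the library's free function.
struct ChainDeleter {
    void operator()(FakeSampler* chain) const { fake_free_chain(chain); }
};

using ChainPtr = std::unique_ptr<FakeSampler, ChainDeleter>;

ChainPtr make_chain() { return ChainPtr(fake_create_chain()); }
```

A `ChainPtr` frees its chain when it goes out of scope, including on early returns and exceptions; `get()` yields the raw pointer for calls that expect one.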

◆ greedy() [1/2]

llama_token lloyal::sampler::greedy ( llama_context *  ctx,
const llama_model *  model 
)
inline

Greedy sampling with automatic vocab extraction.

Convenience wrapper that handles vocab extraction from model. Selects the token with highest probability (argmax on logits).

Parameters
ctx    Llama context
model  Llama model
Returns
Token with highest probability

Definition at line 400 of file sampler.hpp.

◆ greedy() [2/2]

llama_token lloyal::sampler::greedy ( llama_context *  ctx,
const llama_vocab *  vocab 
)
inline

Greedy sampling: Select token with highest probability.

Uses llama_get_logits_ith(ctx, -1) to read the logits of the last decoded position (that position must have had logits=true in the batch), then takes the argmax to find the best token.

Parameters
ctx    Llama context (must have decoded at least one token with logits=true)
vocab  Vocabulary for size information
Returns
Token ID with highest probability
Exceptions
std::runtime_error  if logits retrieval fails

IMPORTANT: This only works if the decode batch had logits=true for the last token. The decoder layer sets this automatically.

Definition at line 110 of file sampler.hpp.
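
The argmax step itself is small. The sketch below is an assumption-laden illustration rather than the code in sampler.hpp: `argmax_token` is a hypothetical helper that takes a raw logits buffer and a vocabulary size, and throws std::runtime_error when no logits are available, in the spirit of the documented exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Argmax over the logits of one decoded position.
// The first maximum wins when several logits are tied.
std::int32_t argmax_token(const float* logits, std::int32_t n_vocab) {
    if (logits == nullptr || n_vocab <= 0)
        throw std::runtime_error("no logits available for this position");
    std::int32_t best = 0;
    for (std::int32_t i = 1; i < n_vocab; ++i)
        if (logits[i] > logits[best]) best = i;
    assert(best >= 0 && best < n_vocab);
    return best;
}
```

No softmax is needed here: softmax is monotonic, so the largest logit is also the highest-probability token.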

◆ reseed_chain()

void lloyal::sampler::reseed_chain ( llama_sampler *  chain,
uint32_t  new_seed 
)
inline

Reseed the dist sampler in a chain.

Used in MCTS to ensure forked branches generate unique sequences. Removes the existing dist sampler (last in chain) and adds a new one.

Parameters
chain     Chain to reseed
new_seed  New seed for randomness

Definition at line 542 of file sampler.hpp.
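
The reason forks need a new seed can be shown with a toy model. In the sketch below, `ToyChain`, `clone`, `reseed`, and `fork` are hypothetical stand-ins: the chain is reduced to a token history plus a std::mt19937 playing the role of the final dist sampler. A plain clone replays exactly the same random draws as its parent, while a reseeded fork diverges.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a sampler chain.
struct ToyChain {
    std::vector<std::int32_t> history;  // accepted tokens (penalty state, etc.)
    std::mt19937 rng;                   // plays the role of the dist sampler
    explicit ToyChain(std::uint32_t seed) : rng(seed) {}
    std::uint32_t draw() { return static_cast<std::uint32_t>(rng()); }
};

// Like clone_chain(): an independent copy carrying the same state.
ToyChain clone(const ToyChain& src) { return src; }

// Like reseed_chain(): replace the randomness source, keep everything else.
void reseed(ToyChain& chain, std::uint32_t new_seed) { chain.rng = std::mt19937(new_seed); }

// Fork a branch: copy the parent, then give the child its own seed.
ToyChain fork(const ToyChain& parent, std::uint32_t branch_seed) {
    ToyChain child = clone(parent);
    reseed(child, branch_seed);
    assert(child.history == parent.history);  // non-RNG state is preserved
    return child;
}
```

In a tree search, each child would typically get a distinct `branch_seed`, for example one derived from the parent's seed and the branch index; that derivation scheme is left open here.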

◆ sample_with_params() [1/2]

template<SamplingParamsLike P>
llama_token lloyal::sampler::sample_with_params ( llama_context *  ctx,
const llama_model *  model,
const P &  params,
llama_sampler *  grammarSampler = nullptr 
)
inline

Parameterized sampling with automatic vocab extraction.

Convenience wrapper that handles vocab extraction from model. Supports temperature, top-k, top-p, min-p, and penalty parameters.

Parameters
ctx             Llama context
model           Llama model
params          Sampling parameters (any SamplingParamsLike type)
grammarSampler  Optional grammar constraint (default: nullptr)
Returns
Sampled token ID

Definition at line 429 of file sampler.hpp.
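
For readers unfamiliar with the filtering parameters, here is a minimal sketch of top-k, top-p, and min-p using their usual textbook definitions. It is not the liblloyal or llama.cpp implementation: `Cand` and the three functions are hypothetical, each filter recomputes the softmax over whatever candidates remain, and the order in which a real chain applies them is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Cand { int id; float logit; float p; };

// Sort by descending logit and fill in softmax probabilities.
void softmax_sorted(std::vector<Cand>& c) {
    assert(!c.empty());
    std::sort(c.begin(), c.end(), [](const Cand& a, const Cand& b) { return a.logit > b.logit; });
    const float max_logit = c.front().logit;
    float sum = 0.0f;
    for (Cand& x : c) { x.p = std::exp(x.logit - max_logit); sum += x.p; }
    for (Cand& x : c) x.p /= sum;
}

// top-k: keep only the k most likely candidates (k == 0 disables the filter).
void top_k(std::vector<Cand>& c, std::size_t k) {
    softmax_sorted(c);
    if (k > 0 && k < c.size()) c.resize(k);
}

// top-p: keep the smallest prefix whose cumulative probability reaches p.
void top_p(std::vector<Cand>& c, float p) {
    softmax_sorted(c);
    float cum = 0.0f;
    std::size_t keep = c.size();
    for (std::size_t i = 0; i < c.size(); ++i) {
        cum += c[i].p;
        if (cum >= p) { keep = i + 1; break; }
    }
    c.resize(keep);
}

// min-p: drop candidates less likely than ratio times the best candidate.
void min_p(std::vector<Cand>& c, float ratio) {
    softmax_sorted(c);
    const float cutoff = c.front().p * ratio;
    c.erase(std::remove_if(c.begin(), c.end(),
                           [cutoff](const Cand& x) { return x.p < cutoff; }),
            c.end());
}
```

All three filters only shrink the candidate set; the final token is still chosen by a separate step (argmax or a random draw) over what survives.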

◆ sample_with_params() [2/2]

template<SamplingParamsLike P>
llama_token lloyal::sampler::sample_with_params ( llama_context *  ctx,
const llama_vocab *  vocab,
const P &  params,
llama_sampler *  grammarSampler = nullptr 
)
inline

Sample with configurable parameters (template accepts any SamplingParams type)

Supports full range of llama.cpp sampling strategies:

  • Temperature scaling
  • Top-k, top-p, min-p filtering
  • Repetition penalties (frequency, presence, repeat)
  • Grammar constraints (via persistent grammar sampler)
Parameters
ctx             Llama context (must have decoded at least one token with logits=true)
vocab           Vocabulary for token information
params          Sampling parameters (any type matching SamplingParamsLike concept)
grammarSampler  Optional persistent grammar sampler (managed by caller)
Returns
Sampled token ID
Exceptions
std::runtime_error  if sampling fails

TEMPLATE INSTANTIATION: Works with any parameter type that satisfies the SamplingParamsLike concept. No adapters are needed: the requirement is structural (duck typing, checked at compile time by a C++20 concept).

Definition at line 179 of file sampler.hpp.
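
The "no adapters" idea can be sketched as follows. The definition of SamplingParamsLike is not shown on this page, so the example does not reproduce it: `is_sampling_params_like` is a hypothetical detection trait written to compile as C++17, and the member names `temperature` and `top_k` are assumptions. In C++20 the same structural check would normally be written as a concept with a `requires` expression.

```cpp
#include <type_traits>
#include <utility>

// True when T exposes `temperature` and `top_k` members (assumed names).
template <typename T, typename = void>
struct is_sampling_params_like : std::false_type {};

template <typename T>
struct is_sampling_params_like<
    T, std::void_t<decltype(std::declval<const T&>().temperature),
                   decltype(std::declval<const T&>().top_k)>> : std::true_type {};

// Unrelated types with no common base class.
struct AppParams  { float temperature = 0.8f; int top_k = 40; };
struct TestParams { double temperature = 0.0; long top_k = 1; bool verbose = false; };
struct NotParams  { int seed = 0; };

// Accepts any type the trait admits; other types fail at compile time.
template <typename P>
constexpr bool is_greedy(const P& params) {
    static_assert(is_sampling_params_like<P>::value, "P must look like sampling params");
    return params.temperature <= 0;
}
```

Both `AppParams` and `TestParams` can be passed to `is_greedy` directly, while `is_greedy(NotParams{})` is rejected by the static_assert rather than failing deep inside the function body.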