| Return type | Member function | Description |
| --- | --- | --- |
| llama_token | greedy (llama_context *ctx, const llama_vocab *vocab) | Greedy sampling: Select token with highest probability. |
| template<SamplingParamsLike P> llama_token | sample_with_params (llama_context *ctx, const llama_vocab *vocab, const P &params, llama_sampler *grammarSampler=nullptr) | Sample with configurable parameters (template accepts any SamplingParams type). |
| llama_token | greedy (llama_context *ctx, const llama_model *model) | Greedy sampling with automatic vocab extraction. |
| template<SamplingParamsLike P> llama_token | sample_with_params (llama_context *ctx, const llama_model *model, const P &params, llama_sampler *grammarSampler=nullptr) | Parameterized sampling with automatic vocab extraction. |
| template<SamplingParamsLike P> llama_sampler * | create_chain (const P &params) | Create a persistent sampler chain from parameters. |
| llama_sampler * | clone_chain (llama_sampler *chain) | Clone a sampler chain. |
| void | reseed_chain (llama_sampler *chain, uint32_t new_seed) | Reseed the dist sampler in a chain. |
| void | free_chain (llama_sampler *chain) | Free a sampler chain. |
| void | apply (llama_sampler *chain, llama_token_data_array *cur_p) | Apply a sampler chain to a candidate array. |
| void | accept (llama_sampler *chain, llama_token token) | Accept a token into the sampler chain. |
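The chain helpers (create_chain, apply, accept, reseed_chain, free_chain) are only summarized above, so here is a minimal sketch of the persistent-chain lifecycle. The candidate-array field layout (id/logit/p, selected, sorted) and the vocab-size call are assumed from recent llama.cpp headers; the include path and `params` type are placeholders for anything satisfying SamplingParamsLike.

```cpp
// Sketch of the persistent-chain workflow; not a definitive implementation.
// Assumptions: llama_token_data{id, logit, p}, llama_token_data_array{data,
// size, selected, sorted}, and llama_vocab_n_tokens() from recent llama.cpp.
#include <vector>
#include "sampler.hpp"   // lloyal::sampler (include path assumed)

template <SamplingParamsLike P>
void generate_n(llama_context *ctx, const llama_vocab *vocab, const P &params, int n) {
    llama_sampler *chain = lloyal::sampler::create_chain(params);  // build once, reuse
    const int n_vocab = llama_vocab_n_tokens(vocab);               // name assumed (llama.cpp)

    for (int step = 0; step < n; ++step) {
        const float *logits = llama_get_logits_ith(ctx, -1);       // last decoded position

        // Wrap raw logits in a candidate array for the chain to filter and select.
        std::vector<llama_token_data> cand(n_vocab);
        for (int id = 0; id < n_vocab; ++id)
            cand[id] = llama_token_data{ id, logits[id], 0.0f };
        llama_token_data_array cur_p{ cand.data(), cand.size(),
                                      /*selected=*/-1, /*sorted=*/false };

        lloyal::sampler::apply(chain, &cur_p);                     // run the whole chain
        llama_token tok = cur_p.data[cur_p.selected].id;           // chain picks one token
        lloyal::sampler::accept(chain, tok);                       // update penalty/grammar state

        // ... decode 'tok' with logits=true before the next iteration ...
    }

    lloyal::sampler::reseed_chain(chain, /*new_seed=*/1234);       // optional: fresh RNG stream
    lloyal::sampler::free_chain(chain);
}
```

Creating the chain once and accepting each sampled token keeps repetition-penalty and grammar state consistent across steps, which per-call sampling (sample_with_params) does not need to track.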
inline llama_token lloyal::sampler::greedy (llama_context * ctx, const llama_vocab * vocab)
Greedy sampling: Select token with highest probability.
Uses llama_get_logits_ith(-1) to get last-step logits (requires logits=true in batch for that position). Performs argmax to find best token.
Parameters:
- ctx: Llama context (must have decoded at least one token with logits=true)
- vocab: Vocabulary for size information

Returns: Token ID with highest probability

Exceptions:
- std::runtime_error: if logits retrieval fails
IMPORTANT: Only works if decode batch had logits=true for last token. Decoder layer automatically sets this correctly.
Definition at line 110 of file sampler.hpp.
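A minimal sketch of the logits=true requirement follows, using the raw llama.cpp batch API directly rather than the library's Decoder layer. The batch field names and llama_batch_init/llama_batch_free calls are upstream llama.cpp; error handling is omitted.

```cpp
// Sketch: decode one token with logits enabled, then greedy-sample.
// Assumes the standard llama.cpp batch API; in this library the Decoder
// layer would normally set logits=true for you.
#include "sampler.hpp"   // lloyal::sampler (include path assumed)

llama_token greedy_next(llama_context *ctx, const llama_vocab *vocab,
                        llama_token last_token, llama_pos pos) {
    llama_batch batch = llama_batch_init(/*n_tokens=*/1, /*embd=*/0, /*n_seq_max=*/1);
    batch.n_tokens     = 1;
    batch.token[0]     = last_token;
    batch.pos[0]       = pos;
    batch.n_seq_id[0]  = 1;
    batch.seq_id[0][0] = 0;
    batch.logits[0]    = true;   // required: greedy() reads the last position's logits

    llama_decode(ctx, batch);    // check the return value in real code
    llama_token next = lloyal::sampler::greedy(ctx, vocab);  // argmax over last-step logits
    llama_batch_free(batch);
    return next;
}
```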
template<SamplingParamsLike P>
inline llama_token lloyal::sampler::sample_with_params (llama_context * ctx, const llama_model * model, const P & params, llama_sampler * grammarSampler = nullptr)
Parameterized sampling with automatic vocab extraction.
Convenience wrapper that handles vocab extraction from model. Supports temperature, top-k, top-p, min-p, and penalty parameters.
Parameters:
- ctx: Llama context
- model: Llama model
- params: Sampling parameters (any SamplingParamsLike type)
- grammarSampler: Optional grammar constraint (default: nullptr)

Returns: Sampled token ID
Definition at line 429 of file sampler.hpp.
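A minimal usage sketch of the model overload follows. The member names on MySamplingParams (temperature, top_k, top_p, min_p) are assumptions about what the SamplingParamsLike concept requires; they are not documented on this page.

```cpp
// Sketch: parameterized sampling via the model overload; the vocab is
// extracted from the model inside sample_with_params. The member names
// below are assumptions about SamplingParamsLike, not taken from this page.
#include <cstdint>
#include "sampler.hpp"   // include path assumed

struct MySamplingParams {
    float   temperature = 0.8f;
    int32_t top_k       = 40;
    float   top_p       = 0.95f;
    float   min_p       = 0.05f;
    // ...penalty fields, seed, etc., as the concept requires
};

llama_token next_token(llama_context *ctx, const llama_model *model) {
    MySamplingParams params;   // any type satisfying SamplingParamsLike works
    return lloyal::sampler::sample_with_params(ctx, model, params);
}
```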
template<SamplingParamsLike P>
inline llama_token lloyal::sampler::sample_with_params (llama_context * ctx, const llama_vocab * vocab, const P & params, llama_sampler * grammarSampler = nullptr)
Sample with configurable parameters (template accepts any SamplingParams type)
Supports full range of llama.cpp sampling strategies:
- Temperature scaling
- Top-k, top-p, min-p filtering
- Repetition penalties (frequency, presence, repeat)
- Grammar constraints (via persistent grammar sampler)
Parameters:
- ctx: Llama context (must have decoded at least one token with logits=true)
- vocab: Vocabulary for token information
- params: Sampling parameters (any type matching SamplingParamsLike concept)
- grammarSampler: Optional persistent grammar sampler (managed by caller)

Returns: Sampled token ID

Exceptions:
- std::runtime_error: if sampling fails
TEMPLATE INSTANTIATION: Works with any SamplingParams type matching the concept constraint. No adapters needed - uses duck typing + C++20 concepts.
Definition at line 179 of file sampler.hpp.
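As a sketch of the vocab overload with a grammar constraint: the code below reuses the hypothetical MySamplingParams type from the previous example, and the grammar-creation call shown in the trailing comment is the upstream llama.cpp API as of recent versions, whose exact signature is version-dependent; the caller owns and frees the grammar sampler, as the parameter description above requires.

```cpp
// Sketch: constrained sampling via a caller-owned, persistent grammar sampler.
// MySamplingParams is the hypothetical SamplingParamsLike type from the
// earlier example; the grammar-creation call is upstream llama.cpp and
// version-dependent (shown only as an assumption).
#include "sampler.hpp"   // include path assumed

llama_token constrained_next(llama_context *ctx, const llama_vocab *vocab,
                             llama_sampler *grammar /* caller-owned; may be nullptr */) {
    MySamplingParams params;
    params.temperature = 0.2f;   // low temperature for structured output

    // Because the grammar sampler is persistent, partial-parse state
    // (e.g. an open JSON object) carries over between successive tokens.
    return lloyal::sampler::sample_with_params(ctx, vocab, params, grammar);
}

// Caller-side construction, roughly (recent llama.cpp; signature may differ):
//   llama_sampler *grammar = llama_sampler_init_grammar(vocab, gbnf_text, "root");
```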