liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::logits Namespace Reference

Functions

float * get (llama_context *ctx, int32_t index=-1)
 Get the logits for the token at index (default -1: the last decoded token).

void process_chunks (llama_context *ctx, const std::vector< std::span< const llama_token > > &prompts, std::vector< float * > &output, int32_t n_vocab)
 Process an arbitrary number of complete prompts for logit extraction.

Function Documentation

◆ get()

float * lloyal::logits::get (llama_context * ctx, int32_t index = -1)    [inline]

◆ process_chunks()

void lloyal::logits::process_chunks (llama_context * ctx, const std::vector< std::span< const llama_token > > & prompts, std::vector< float * > & output, int32_t n_vocab)    [inline]

Process arbitrary number of complete prompts for logit extraction.

Handles prompt counts exceeding n_seq_max by processing in groups. Within each group, prompts are bin-packed via decode::bin_pack() into n_batch-sized chunks, then dispatched via scatter/many. After each group, used seq_ids are evicted from KV to make room for the next.

Parameters
    ctx      Llama context (caller must ensure exclusive access)
    prompts  Complete token arrays (any count; grouped by n_seq_max)
    output   Pre-allocated float buffers, one per prompt, each n_vocab floats
    n_vocab  Vocabulary size (used for memcpy sizing)
Examples
include/lloyal/logits.hpp.

Definition at line 108 of file logits.hpp.