liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::logits Namespace Reference

Functions

float * get (llama_context *ctx, int32_t index=-1)
 Get the logits for the token at index (default -1: the last decoded token).

void process_chunks (llama_context *ctx, const std::vector< std::span< const llama_token > > &prompts, std::vector< float * > &output, int32_t n_vocab)
 Process an arbitrary number of complete prompts for logit extraction.

Function Documentation

◆ get()

float * lloyal::logits::get (llama_context * ctx, int32_t index = -1)    [inline]

◆ process_chunks()

void lloyal::logits::process_chunks (llama_context * ctx, const std::vector< std::span< const llama_token > > & prompts, std::vector< float * > & output, int32_t n_vocab)    [inline]

Process arbitrary number of complete prompts for logit extraction.

Handles prompt counts exceeding n_seq_max by processing in groups. Within each group, prompts are bin-packed via decode::bin_pack() into n_batch-sized chunks, then dispatched via scatter/many. After each group, used seq_ids are evicted from KV to make room for the next.

Parameters
    ctx      Llama context (caller must ensure exclusive access)
    prompts  Complete token arrays (any count; grouped by n_seq_max)
    output   Pre-allocated float buffers, one per prompt, each n_vocab floats
    n_vocab  Vocabulary size (used for memcpy sizing)
Examples
include/lloyal/logits.hpp.

Definition at line 108 of file logits.hpp.