Process an arbitrary number of complete prompts for logit extraction.
Handles prompt counts exceeding n_seq_max by processing the prompts in groups. Within each group, prompts are bin-packed via decode::bin_pack() into n_batch-sized chunks, then dispatched via scatter/many. After each group, the seq_ids it used are evicted from the KV cache to make room for the next group.
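The grouping and packing described above can be sketched as follows. This is an illustrative planner only, not liblloyal's implementation: `Token`, `Chunk`, `pack_group` and `plan_groups` are hypothetical names, first-fit packing is an assumed strategy (the behaviour of decode::bin_pack() is not documented here), and the decode, logit copy and KV eviction steps are indicated only by a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; not liblloyal's actual types.
using Token = std::int32_t;

struct Chunk {
    std::vector<std::size_t> prompt_indices;  // prompts packed into this chunk
    std::size_t n_tokens = 0;                 // total tokens in this chunk
};

// First-fit packing of prompts [begin, end) into chunks of at most n_batch
// tokens. A prompt longer than n_batch ends up alone in an oversized chunk;
// how the real code handles that case is not covered by this sketch.
std::vector<Chunk> pack_group(const std::vector<std::vector<Token>>& prompts,
                              std::size_t begin, std::size_t end,
                              std::size_t n_batch) {
    std::vector<Chunk> chunks;
    for (std::size_t i = begin; i < end; ++i) {
        const std::size_t len = prompts[i].size();
        bool placed = false;
        for (Chunk& c : chunks) {
            if (c.n_tokens + len <= n_batch) {
                c.prompt_indices.push_back(i);
                c.n_tokens += len;
                placed = true;
                break;
            }
        }
        if (!placed) {
            Chunk c;
            c.prompt_indices.push_back(i);
            c.n_tokens = len;
            chunks.push_back(c);
        }
    }
    return chunks;
}

// Splits the prompts into consecutive groups of at most n_seq_max prompts and
// packs each group independently.
std::vector<std::vector<Chunk>> plan_groups(
        const std::vector<std::vector<Token>>& prompts,
        std::size_t n_seq_max, std::size_t n_batch) {
    assert(n_seq_max > 0 && n_batch > 0);
    std::vector<std::vector<Chunk>> groups;
    for (std::size_t begin = 0; begin < prompts.size(); begin += n_seq_max) {
        const std::size_t end = std::min(begin + n_seq_max, prompts.size());
        groups.push_back(pack_group(prompts, begin, end, n_batch));
        // In the documented function, each chunk of this group would be
        // decoded here, logits copied out, and the group's seq_ids evicted.
    }
    return groups;
}
```

With five prompts of lengths 3, 4, 2, 5 and 1, n_seq_max = 3 and n_batch = 6, this planner yields two groups: the first holds two chunks (prompts 0 and 2 together, prompt 1 alone) and the second holds one chunk with prompts 3 and 4.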
- Parameters
| Parameter | Description |
| --- | --- |
| ctx | Llama context (caller must ensure exclusive access) |
| prompts | Complete token arrays (any count; processed in groups of at most n_seq_max) |
| output | Pre-allocated float buffers, one per prompt, each holding n_vocab floats |
| n_vocab | Vocabulary size (for memcpy sizing) |
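One way a caller might satisfy the `output` requirement is to allocate a single contiguous block and hand out one n_vocab-sized row per prompt. The sketch below is an assumption about caller-side setup: `LogitBuffers` and `make_logit_buffers` are hypothetical helpers, and the exact container type the function expects for `output` is not stated in this documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type; not part of liblloyal.
struct LogitBuffers {
    std::vector<float> storage;  // n_prompts * n_vocab floats, contiguous
    std::vector<float*> rows;    // rows[i] points at prompt i's n_vocab floats
};

// Allocates zero-initialised storage and one row pointer per prompt.
// The row pointers stay valid when the struct is moved, but not when it is
// copied, because a copy gets its own storage.
LogitBuffers make_logit_buffers(std::size_t n_prompts, std::size_t n_vocab) {
    assert(n_vocab > 0);
    LogitBuffers buffers;
    buffers.storage.assign(n_prompts * n_vocab, 0.0f);
    buffers.rows.reserve(n_prompts);
    for (std::size_t i = 0; i < n_prompts; ++i) {
        buffers.rows.push_back(buffers.storage.data() + i * n_vocab);
    }
    return buffers;
}
```

A single allocation keeps the logits for all prompts adjacent in memory, which suits a per-prompt memcpy of n_vocab floats.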
- Examples
- include/lloyal/logits.hpp
Definition at line 108 of file logits.hpp.