liblloyal 1.0.0
Branched Inference for llama.cpp
Chat input formatting with full format awareness.
Classes

    struct FormatInputs
        Input parameters for chat formatting.
    struct FormatResult
        Result from chat template formatting with full format awareness.
Functions

    FormatResult format(const llama_model *model, const FormatInputs &inputs)
        Format chat messages using model's chat template with full format awareness.
    bool validate(const std::string &template_str)
        Validate chat template syntax.
    std::vector<llama_token> fallback_to_eog(const llama_model *model)
        Get EOG token as fallback when template parsing fails.
    std::string get_token_safe(const llama_model *model, llama_token token)
        Get token text safely.
    std::vector<llama_token> get_turn_separator(const llama_model *model)
        Get turn separator tokens for the model's chat template.
Chat input formatting with full format awareness.
Wraps llama.cpp's chat template engine to produce formatted prompts with all format-awareness metadata (grammar, triggers, parser) needed for correct output parsing via lloyal::chat_out.
fallback_to_eog()

std::vector<llama_token> fallback_to_eog(const llama_model *model)  [inline]

Get EOG token as fallback when template parsing fails.

Returns the model's end-of-generation token wrapped in a vector. Prefers the EOT (end-of-turn) token, falling back to EOS (end-of-sequence).

Parameters:
    model    Llama model pointer

Definition at line 301 of file chat_in.hpp.
format()

FormatResult format(const llama_model *model, const FormatInputs &inputs)  [inline]

Format chat messages using model's chat template with full format awareness.

Orchestrates chat template processing with graceful degradation.

Parameters:
    model     Llama model pointer (provides template and vocabulary)
    inputs    FormatInputs struct with messages, tools, and format options

Definition at line 135 of file chat_in.hpp.
get_token_safe()

std::string get_token_safe(const llama_model *model, llama_token token)  [inline]

Get token text safely.

Parameters:
    model    Llama model pointer
    token    Token ID

Definition at line 323 of file chat_in.hpp.
get_turn_separator()

std::vector<llama_token> get_turn_separator(const llama_model *model)  [inline]

Get turn separator tokens for the model's chat template.

Extracts the token sequence that closes an assistant turn and transitions to the next message. This enables exact parity between the cold-start and warm multi-turn continuation paths.

Uses a 3-message probe technique: the probe conversation [user:"X", assistant:SENTINEL, user:SENTINEL2] is formatted with the model's chat template and tokenized with parse_special=true, and the separator is read out of the tokens surrounding the sentinels.

| Template | Separator Tokens | Text Representation |
|----------|------------------|---------------------|
| ChatML   | [im_end, \n]     | <|im_end|>\n        |
| Llama-3  | [eot_id]         | <|eot_id|>          |
| Phi-3    | [end, \n]        | <|end|>\n           |
| Zephyr   | [eos, \n]        | </s>\n              |

Parameters:
    model    Llama model pointer (provides template and vocabulary)

Definition at line 370 of file chat_in.hpp.
validate()

bool validate(const std::string &template_str)  [inline]

Validate chat template syntax.

Performs syntax-only validation of a Jinja2-style chat template. Does NOT require a model, making it useful for validating user-provided templates before attempting to format messages.

Parameters:
    template_str    Jinja2-style template string to validate

Returns:
    true if the template syntax is valid, false otherwise

Definition at line 280 of file chat_in.hpp.