liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::chat_in Namespace Reference

Chat input formatting with full format awareness. More...

Classes

struct  FormatInputs
 Input parameters for chat formatting. More...
 
struct  FormatResult
 Result from chat template formatting with full format awareness. More...
 

Functions

FormatResult format (const llama_model *model, const FormatInputs &inputs)
 Format chat messages using model's chat template with full format awareness.
 
bool validate (const std::string &template_str)
 Validate chat template syntax.
 
std::vector< llama_token > fallback_to_eog (const llama_model *model)
 Get EOG token as fallback when template parsing fails.
 
std::string get_token_safe (const llama_model *model, llama_token token)
 Get token text safely.
 
std::vector< llama_token > get_turn_separator (const llama_model *model)
 Get turn separator tokens for the model's chat template.
 

Detailed Description

Chat input formatting with full format awareness.

Wraps llama.cpp's chat template engine to produce formatted prompts with all format-awareness metadata (grammar, triggers, parser) needed for correct output parsing via lloyal::chat_out.

Function Documentation

◆ fallback_to_eog()

std::vector< llama_token > lloyal::chat_in::fallback_to_eog ( const llama_model *  model)
inline

Get EOG token as fallback when template parsing fails.

Returns the model's end-of-generation token wrapped in a vector. Prefers EOT (end-of-turn) token, falling back to EOS (end-of-sequence).

Parameters
model  Llama model pointer
Returns
Vector containing a single EOG token, or an empty vector if no EOG token exists
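The documented preference order (EOT first, then EOS, else empty) can be sketched in isolation. This is a stand-in, not the real implementation: it takes the two candidate token IDs as parameters instead of querying the model's vocabulary, and uses llama.cpp's -1 "no token" convention (LLAMA_TOKEN_NULL):

```cpp
#include <cstdint>
#include <vector>

using llama_token = std::int32_t; // matches llama.cpp's token type

// Sketch of the documented fallback order. The real function reads the
// EOT/EOS IDs from the model's vocabulary; here they are parameters.
// -1 stands for "token not present" (llama.cpp's LLAMA_TOKEN_NULL).
std::vector<llama_token> fallback_to_eog_sketch(llama_token eot, llama_token eos) {
    if (eot != -1) return {eot};
    if (eos != -1) return {eos};
    return {};
}
```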

Definition at line 301 of file chat_in.hpp.

◆ format()

FormatResult lloyal::chat_in::format ( const llama_model *  model,
const FormatInputs inputs 
)
inline

Format chat messages using model's chat template with full format awareness.

Orchestrates chat template processing with graceful degradation:

  1. Parses tools and messages from JSON
  2. Applies common_chat_templates_apply() with all format-awareness fields
  3. Returns all common_chat_params fields for downstream grammar/parsing use
  4. Falls back to simple "role: content" format if template fails
  5. Returns empty result on JSON parsing errors (never throws)
Parameters
model   Llama model pointer (provides template and vocabulary)
inputs  FormatInputs struct with messages, tools, and format options
Returns
FormatResult containing prompt, format, grammar, triggers, and parser info
Note
This function never throws. On error, returns empty prompt.
See also
common_chat_templates_apply()
lloyal::chat_out::parse()
// Basic usage (no tools)
chat_in::FormatInputs inputs;
inputs.messages_json = R"([{"role":"user","content":"Hi"}])";
auto result = chat_in::format(model, inputs);
if (result.prompt.empty()) { /* formatting failed; see Note above */ }
auto tokens = tokenizer::tokenize(vocab, result.prompt, true, true);

// With tools
inputs.tools_json = R"([{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}])";
auto tool_result = chat_in::format(model, inputs);
// tool_result.format != CONTENT_ONLY when tools are active
// tool_result.grammar contains GBNF for constrained tool-call output
FormatResult format(const llama_model *model, const FormatInputs &inputs)
Format chat messages using model's chat template with full format awareness.
Definition chat_in.hpp:135
std::vector< llama_token > tokenize(const llama_vocab *vocab, const std::string &text, bool add_special, bool parse_special)
Tokenize text to token array.
Definition tokenizer.hpp:38
Input parameters for chat formatting.
Definition chat_in.hpp:64
std::string messages_json
JSON array of OpenAI-format messages (required)
Definition chat_in.hpp:65
std::string tools_json
JSON array of OpenAI-format tool definitions.
Definition chat_in.hpp:68

Definition at line 135 of file chat_in.hpp.

◆ get_token_safe()

std::string lloyal::chat_in::get_token_safe ( const llama_model *  model,
llama_token  token 
)
inline

Get token text safely.

Parameters
model  Llama model pointer
token  Token ID
Returns
Token text, or empty string if invalid
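The "safe" contract (token text or an empty string, never a throw) can be illustrated with a stand-in vocabulary. This is only a sketch: a plain map replaces the model's real llama.cpp vocabulary lookup.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using llama_token = std::int32_t;

// Sketch of the safe-lookup contract: a valid ID yields its text, any
// unknown ID yields "". The real function consults the llama.cpp
// vocabulary; a plain map stands in for it here.
std::string get_token_safe_sketch(const std::unordered_map<llama_token, std::string>& vocab,
                                  llama_token token) {
    const auto it = vocab.find(token);
    return it != vocab.end() ? it->second : std::string{};
}
```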

Definition at line 323 of file chat_in.hpp.

◆ get_turn_separator()

std::vector< llama_token > lloyal::chat_in::get_turn_separator ( const llama_model *  model)
inline

Get turn separator tokens for the model's chat template.

Extracts the token sequence that closes an assistant turn and transitions to the next message. This enables exact parity between cold-start and warm multi-turn continuation paths.

Algorithm

Uses a 3-message probe technique:

  1. Format: [user:"X", assistant:SENTINEL, user:SENTINEL2]
  2. Extract text between SENTINEL and SENTINEL2
  3. Tokenize with parse_special=true
  4. Keep tokens up to and including EOG + trailing whitespace
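Step 2 of the probe, extracting the text between the two sentinels, amounts to plain string slicing. The sketch below uses illustrative placeholder sentinel strings; step 4 would then tokenize the returned slice and trim it to EOG plus trailing whitespace.

```cpp
#include <string>

// Sketch of step 2: slice the formatted probe transcript between the
// assistant sentinel and the next-user sentinel. Returns "" if either
// sentinel is missing.
std::string extract_separator(const std::string& formatted,
                              const std::string& sentinel,
                              const std::string& sentinel2) {
    std::size_t a = formatted.find(sentinel);
    if (a == std::string::npos) return {};
    a += sentinel.size();
    const std::size_t b = formatted.find(sentinel2, a);
    if (b == std::string::npos) return {};
    return formatted.substr(a, b - a);
}
```

For a ChatML-style transcript the slice includes the next turn's opener as well; step 4's trim to EOG plus trailing whitespace is what reduces it to the separator proper.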

Template-Specific Results

| Template | Separator Tokens | Text Representation |
|----------|------------------|---------------------|
| ChatML   | [im_end, \n]     | <|im_end|>\n        |
| Llama-3  | [eot_id]         | <|eot_id|>          |
| Phi-3    | [end, \n]        | <|end|>\n           |
| Zephyr   | [eos, \n]        | </s>\n              |

Parameters
model  Llama model pointer (provides template and vocabulary)
Returns
Vector of token IDs representing the turn separator. Falls back to single EOG token if template parsing fails. Returns empty vector only if model has no EOG token.
Note
Result is typically cached by the caller (e.g., SessionContext).
// Warm multi-turn continuation: close the previous assistant turn with the
// separator, then prefill only the new (delta) portion of the prompt
auto separator = chat_in::get_turn_separator(model);
auto delta_tokens = tokenizer::tokenize(vocab, delta_prompt, false, true);
std::vector<llama_token> prefill_tokens;
prefill_tokens.insert(prefill_tokens.end(), separator.begin(), separator.end());
prefill_tokens.insert(prefill_tokens.end(), delta_tokens.begin(), delta_tokens.end());

Definition at line 370 of file chat_in.hpp.

◆ validate()

bool lloyal::chat_in::validate ( const std::string &  template_str)
inline

Validate chat template syntax.

Performs syntax-only validation of a Jinja2-style chat template. Does NOT require a model — useful for validating user-provided templates before attempting to format messages.

Parameters
template_str  Jinja2-style template string to validate
Returns
true if template syntax is valid, false otherwise
Note
This function never throws. Returns false on any error.
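The never-throw, bool-out shape can be sketched with a stand-in parser. The delimiter-balance check below is not real Jinja2 parsing (the real function drives llama.cpp's template engine); it only stands in for a parse step that may throw, so the catch-everything wrapper is the point.

```cpp
#include <string>
#include <stdexcept>

// Stand-in "parser": throws if {{ }} / {% %} delimiters do not pair up.
// Illustrative only; not real Jinja2 syntax checking.
static void parse_stub(const std::string& s) {
    long depth = 0;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == '{' && (s[i + 1] == '{' || s[i + 1] == '%')) { ++depth; ++i; }
        else if ((s[i] == '}' || s[i] == '%') && s[i + 1] == '}') { --depth; ++i; }
        if (depth < 0) throw std::runtime_error("unbalanced delimiter");
    }
    if (depth != 0) throw std::runtime_error("unbalanced delimiter");
}

// Sketch of the documented contract: true on success, false on any
// error, never a propagated exception.
bool validate_sketch(const std::string& template_str) {
    try {
        parse_stub(template_str);
        return true;
    } catch (...) {
        return false;
    }
}
```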

Definition at line 280 of file chat_in.hpp.