liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::chat_in Namespace Reference

Chat input formatting with full format awareness. More...

Classes

struct  FormatInputs
 Input parameters for chat formatting. More...
 
struct  FormatResult
 Result from chat template formatting with full format awareness. More...
 

Functions

FormatResult format (const llama_model *model, const FormatInputs &inputs)
 Format chat messages using model's chat template with full format awareness.
 
bool validate (const std::string &template_str)
 Validate chat template syntax.
 
std::vector< llama_token > fallback_to_eog (const llama_model *model)
 Get EOG token as fallback when template parsing fails.
 
std::string get_token_safe (const llama_model *model, llama_token token)
 Get token text safely.
 
std::vector< llama_token > get_turn_separator (const llama_model *model)
 Get turn separator tokens for the model's chat template.
 

Detailed Description

Chat input formatting with full format awareness.

Wraps llama.cpp's chat template engine to produce formatted prompts with all format-awareness metadata (grammar, triggers, parser) needed for correct output parsing via lloyal::chat_out.

Function Documentation

◆ fallback_to_eog()

std::vector< llama_token > lloyal::chat_in::fallback_to_eog ( const llama_model *  model)
inline

Get EOG token as fallback when template parsing fails.

Returns the model's end-of-generation token wrapped in a vector. Prefers EOT (end-of-turn) token, falling back to EOS (end-of-sequence).

Parameters
model  Llama model pointer
Returns
Vector containing a single EOG token, or an empty vector if no EOG token exists
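The documented preference order (EOT first, then EOS, else empty) can be sketched in isolation. This is a stand-in, not the real implementation: it takes the two candidate token IDs as parameters instead of querying the model's vocabulary, and uses llama.cpp's -1 "no token" convention (LLAMA_TOKEN_NULL):

```cpp
#include <cstdint>
#include <vector>

using llama_token = std::int32_t; // matches llama.cpp's token type

// Sketch of the documented fallback order. The real function reads the
// EOT/EOS IDs from the model's vocabulary; here they are parameters.
// -1 stands for "token not present" (llama.cpp's LLAMA_TOKEN_NULL).
std::vector<llama_token> fallback_to_eog_sketch(llama_token eot, llama_token eos) {
    if (eot != -1) return {eot};
    if (eos != -1) return {eos};
    return {};
}
```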

Definition at line 301 of file chat_in.hpp.

◆ format()

FormatResult lloyal::chat_in::format ( const llama_model *  model,
const FormatInputs inputs 
)
inline

Format chat messages using model's chat template with full format awareness.

Orchestrates chat template processing with graceful degradation:

  1. Parses tools and messages from JSON
  2. Applies common_chat_templates_apply() with all format-awareness fields
  3. Returns all common_chat_params fields for downstream grammar/parsing use
  4. Falls back to simple "role: content" format if template fails
  5. Returns empty result on JSON parsing errors (never throws)
Parameters
model   Llama model pointer (provides template and vocabulary)
inputs  FormatInputs struct with messages, tools, and format options
Returns
FormatResult containing prompt, format, grammar, triggers, and parser info
Note
This function never throws. On error, returns empty prompt.
See also
common_chat_templates_apply()
lloyal::chat_out::parse()
// Basic usage (no tools)
chat_in::FormatInputs inputs;
inputs.messages_json = R"([{"role":"user","content":"Hi"}])";
auto result = chat_in::format(model, inputs);
if (result.prompt.empty()) { /* formatting failed; see Note above */ }
auto tokens = tokenizer::tokenize(vocab, result.prompt, true, true);

// With tools
inputs.tools_json = R"([{"type":"function","function":{"name":"get_weather","description":"Get weather","parameters":{"type":"object","properties":{"location":{"type":"string"}}}}}])";
auto tool_result = chat_in::format(model, inputs);
// tool_result.format != CONTENT_ONLY when tools are active
// tool_result.grammar contains GBNF for constrained tool-call output
FormatResult format(const llama_model *model, const FormatInputs &inputs)
Format chat messages using model's chat template with full format awareness.
Definition chat_in.hpp:135
std::vector< llama_token > tokenize(const llama_vocab *vocab, const std::string &text, bool add_special, bool parse_special)
Tokenize text to token array.
Definition tokenizer.hpp:38
Input parameters for chat formatting.
Definition chat_in.hpp:64
std::string messages_json
JSON array of OpenAI-format messages (required)
Definition chat_in.hpp:65
std::string tools_json
JSON array of OpenAI-format tool definitions.
Definition chat_in.hpp:68

Definition at line 135 of file chat_in.hpp.

◆ get_token_safe()

std::string lloyal::chat_in::get_token_safe ( const llama_model *  model,
llama_token  token 
)
inline

Get token text safely.

Parameters
model  Llama model pointer
token  Token ID
Returns
Token text, or empty string if invalid
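The "safe" contract (token text or an empty string, never a throw) can be illustrated with a stand-in vocabulary. This is only a sketch: a plain map replaces the model's real llama.cpp vocabulary lookup.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

using llama_token = std::int32_t;

// Sketch of the safe-lookup contract: a valid ID yields its text, any
// unknown ID yields "". The real function consults the llama.cpp
// vocabulary; a plain map stands in for it here.
std::string get_token_safe_sketch(const std::unordered_map<llama_token, std::string>& vocab,
                                  llama_token token) {
    const auto it = vocab.find(token);
    return it != vocab.end() ? it->second : std::string{};
}
```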

Definition at line 323 of file chat_in.hpp.

◆ get_turn_separator()

std::vector< llama_token > lloyal::chat_in::get_turn_separator ( const llama_model *  model)
inline

Get turn separator tokens for the model's chat template.

Extracts the token sequence that closes an assistant turn and transitions to the next message. This enables exact parity between cold-start and warm multi-turn continuation paths.

Algorithm

Uses a 3-message probe technique:

  1. Format: [user:"X", assistant:SENTINEL, user:SENTINEL2]
  2. Extract text between SENTINEL and SENTINEL2
  3. Tokenize with parse_special=true
  4. Keep tokens up to and including EOG + trailing whitespace
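Step 2 of the probe, extracting the text between the two sentinels, amounts to plain string slicing. The sketch below uses illustrative placeholder sentinel strings; step 4 would then tokenize the returned slice and trim it to EOG plus trailing whitespace.

```cpp
#include <string>

// Sketch of step 2: slice the formatted probe transcript between the
// assistant sentinel and the next-user sentinel. Returns "" if either
// sentinel is missing.
std::string extract_separator(const std::string& formatted,
                              const std::string& sentinel,
                              const std::string& sentinel2) {
    std::size_t a = formatted.find(sentinel);
    if (a == std::string::npos) return {};
    a += sentinel.size();
    const std::size_t b = formatted.find(sentinel2, a);
    if (b == std::string::npos) return {};
    return formatted.substr(a, b - a);
}
```

For a ChatML-style transcript the slice includes the next turn's opener as well; step 4's trim to EOG plus trailing whitespace is what reduces it to the separator proper.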

Template-Specific Results

| Template | Separator Tokens | Text Representation |
|----------|------------------|---------------------|
| ChatML   | [im_end, \n]     | <|im_end|>\n        |
| Llama-3  | [eot_id]         | <|eot_id|>          |
| Phi-3    | [end, \n]        | <|end|>\n           |
| Zephyr   | [eos, \n]        | </s>\n              |

Parameters
model  Llama model pointer (provides template and vocabulary)
Returns
Vector of token IDs representing the turn separator. Falls back to single EOG token if template parsing fails. Returns empty vector only if model has no EOG token.
Note
Result is typically cached by the caller (e.g., SessionContext).
// Warm multi-turn continuation: close the previous assistant turn with the
// separator, then prefill only the new (delta) portion of the prompt
auto separator = chat_in::get_turn_separator(model);
auto delta_tokens = tokenizer::tokenize(vocab, delta_prompt, false, true);
std::vector<llama_token> prefill_tokens;
prefill_tokens.insert(prefill_tokens.end(), separator.begin(), separator.end());
prefill_tokens.insert(prefill_tokens.end(), delta_tokens.begin(), delta_tokens.end());

Definition at line 370 of file chat_in.hpp.

◆ validate()

bool lloyal::chat_in::validate ( const std::string &  template_str)
inline

Validate chat template syntax.

Performs syntax-only validation of a Jinja2-style chat template. Does NOT require a model — useful for validating user-provided templates before attempting to format messages.

Parameters
template_str  Jinja2-style template string to validate
Returns
true if template syntax is valid, false otherwise
Note
This function never throws. Returns false on any error.
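The never-throw, bool-out shape can be sketched with a stand-in parser. The delimiter-balance check below is not real Jinja2 parsing (the real function drives llama.cpp's template engine); it only stands in for a parse step that may throw, so the catch-everything wrapper is the point.

```cpp
#include <string>
#include <stdexcept>

// Stand-in "parser": throws if {{ }} / {% %} delimiters do not pair up.
// Illustrative only; not real Jinja2 syntax checking.
static void parse_stub(const std::string& s) {
    long depth = 0;
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == '{' && (s[i + 1] == '{' || s[i + 1] == '%')) { ++depth; ++i; }
        else if ((s[i] == '}' || s[i] == '%') && s[i + 1] == '}') { --depth; ++i; }
        if (depth < 0) throw std::runtime_error("unbalanced delimiter");
    }
    if (depth != 0) throw std::runtime_error("unbalanced delimiter");
}

// Sketch of the documented contract: true on success, false on any
// error, never a propagated exception.
bool validate_sketch(const std::string& template_str) {
    try {
        parse_stub(template_str);
        return true;
    } catch (...) {
        return false;
    }
}
```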

Definition at line 280 of file chat_in.hpp.