liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::chat_out Namespace Reference

Chat output parsing (tool calls, reasoning, content) More...

Classes

struct  ParseResult
 Result from parsing model output. More...
 
struct  ToolCall
 A single tool call extracted from model output. More...
 

Functions

ParseResult parse (const std::string &output, common_chat_format format, common_reasoning_format reasoning_format=COMMON_REASONING_FORMAT_NONE, bool is_partial=false, bool thinking_forced_open=false, const std::string &parser_data="")
 Parse model output with explicit format.
 
ParseResult parse (const llama_model *model, const std::string &output, bool is_partial=false)
 Parse model output with auto-detected format from model template.
 

Detailed Description

Chat output parsing (tool calls, reasoning, content)

Wraps llama.cpp's common_chat_parse() to extract structured content from model output. Pairs with lloyal::chat_in: use the format from chat_in::FormatResult to select the correct parser.

Function Documentation

◆ parse() [1/2]

ParseResult lloyal::chat_out::parse ( const llama_model * model,
                                      const std::string & output,
                                      bool is_partial = false )
inline

Parse model output with auto-detected format from model template.

Convenience overload that detects the format from the model's template. More expensive than the explicit-format overload since it initializes templates and applies them to detect the format.

Parameters
model	Llama model pointer
output	The raw model output text to parse
is_partial	True if output is incomplete (streaming)
Returns
ParseResult with content, reasoning_content, and tool_calls
Note
Prefer the explicit-format overload when you already have a FormatResult.
See also
lloyal::chat_in::format()

Definition at line 200 of file chat_out.hpp.

◆ parse() [2/2]

ParseResult lloyal::chat_out::parse ( const std::string & output,
                                      common_chat_format format,
                                      common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE,
                                      bool is_partial = false,
                                      bool thinking_forced_open = false,
                                      const std::string & parser_data = "" )
inline

Parse model output with explicit format.

Uses the format detected by chat_in::format() to apply the correct parser. For most formats, this delegates to common_chat_parse() which handles 25+ model-specific output formats (DeepSeek, Mistral, Hermes, etc.).

Parameters
output	The raw model output text to parse
format	The chat format (from chat_in::FormatResult.format)
reasoning_format	How to handle reasoning/thinking blocks
is_partial	True if output is incomplete (streaming)
thinking_forced_open	Whether the thinking tag was forced open
parser_data	Serialized PEG parser (from chat_in::FormatResult.parser). Required for PEG-format models; ignored for others.
Returns
ParseResult with content, reasoning_content, and tool_calls
Note
This function never throws. On error, returns raw output as content.
Warning
For PEG format models (COMMON_CHAT_FORMAT_PEG_*), the parser_data parameter must contain the serialized PEG parser from chat_in::FormatResult::parser. Omitting it will cause parse failures for these formats.
See also
lloyal::chat_in::format()
auto fmt = chat_in::format(model, inputs);
// ... generate tokens ...
auto parsed = chat_out::parse(output_text, fmt.format, fmt.reasoning_format,
                              false, fmt.thinking_forced_open, fmt.parser);
if (!parsed.tool_calls.empty()) {
    // Handle tool calls
}
Cold Restart with Thinking Models
When storing assistant messages for potential cold restart (re-formatting the full conversation from scratch), parse the output to separate reasoning from content. This matters for thinking models (Qwen3, DeepSeek-R1, etc.) whose raw output contains <think>...</think> blocks: storing the raw output as content would re-inject the thinking tags as literal text when the conversation is re-formatted.
// After generating tokens, detokenize the raw output
std::string raw_output = detokenized_text;

// Parse: separates reasoning from content
auto parsed = chat_out::parse(raw_output, fmt.format, fmt.reasoning_format,
                              false, fmt.thinking_forced_open, fmt.parser);

// Store with separate fields for correct re-formatting on cold restart
json msg = {{"role", "assistant"}, {"content", parsed.content}};
if (!parsed.reasoning_content.empty()) {
    msg["reasoning_content"] = parsed.reasoning_content;
}
messages.push_back(msg);
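For intuition, the reasoning/content separation can be approximated with a self-contained sketch. This illustrates only the leading-&lt;think&gt;-block case and is not the library's parser, which is format-aware and streaming-safe via common_chat_parse():

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified illustration: split a leading <think>...</think> block into
// (reasoning, content). Incomplete blocks are treated as plain content,
// unlike the real parser, which understands partial (streaming) output.
std::pair<std::string, std::string> split_thinking(const std::string& raw) {
    const std::string open = "<think>", close = "</think>";
    if (raw.rfind(open, 0) == 0) {  // output starts with <think>
        auto end = raw.find(close, open.size());
        if (end != std::string::npos) {
            std::string reasoning = raw.substr(open.size(), end - open.size());
            std::string content   = raw.substr(end + close.size());
            return {reasoning, content};
        }
    }
    return {"", raw};  // no complete thinking block: everything is content
}
```

For example, split_thinking("&lt;think&gt;plan&lt;/think&gt;Hello") yields {"plan", "Hello"}, mirroring how reasoning_content and content are stored separately above.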

Definition at line 142 of file chat_out.hpp.