liblloyal 1.0.0
Branched Inference for llama.cpp
lloyal::chat_out Namespace Reference

Chat output parsing (tool calls, reasoning, content) More...

Classes

struct  ParseResult
 Result from parsing model output. More...
 
struct  ToolCall
 A single tool call extracted from model output. More...
 

Functions

ParseResult parse (const std::string &output, common_chat_format format, common_reasoning_format reasoning_format=COMMON_REASONING_FORMAT_NONE, bool is_partial=false, bool thinking_forced_open=false, const std::string &parser_data="")
 Parse model output with explicit format.
 
ParseResult parse (const llama_model *model, const std::string &output, bool is_partial=false)
 Parse model output with auto-detected format from model template.
 

Detailed Description

Chat output parsing (tool calls, reasoning, content)

Wraps llama.cpp's common_chat_parse() to extract structured content from model output. Pairs with lloyal::chat_in: use the format from chat_in::FormatResult to select the correct parser.

Function Documentation

◆ parse() [1/2]

ParseResult lloyal::chat_out::parse ( const llama_model * model,
                                      const std::string & output,
                                      bool is_partial = false )
inline

Parse model output with auto-detected format from model template.

Convenience overload that detects the format from the model's template. More expensive than the explicit-format overload since it initializes templates and applies them to detect the format.

Parameters
model	Llama model pointer
output	The raw model output text to parse
is_partial	True if output is incomplete (streaming)
Returns
ParseResult with content, reasoning_content, and tool_calls
Note
Prefer the explicit-format overload when you already have a FormatResult.
See also
lloyal::chat_in::format()

Definition at line 200 of file chat_out.hpp.

◆ parse() [2/2]

ParseResult lloyal::chat_out::parse ( const std::string & output,
                                      common_chat_format format,
                                      common_reasoning_format reasoning_format = COMMON_REASONING_FORMAT_NONE,
                                      bool is_partial = false,
                                      bool thinking_forced_open = false,
                                      const std::string & parser_data = "" )
inline

Parse model output with explicit format.

Uses the format detected by chat_in::format() to apply the correct parser. For most formats, this delegates to common_chat_parse() which handles 25+ model-specific output formats (DeepSeek, Mistral, Hermes, etc.).

Parameters
output	The raw model output text to parse
format	The chat format (from chat_in::FormatResult.format)
reasoning_format	How to handle reasoning/thinking blocks
is_partial	True if output is incomplete (streaming)
thinking_forced_open	Whether the thinking tag was forced open
parser_data	Serialized PEG parser (from chat_in::FormatResult.parser). Required for PEG-format models; ignored for others.
Returns
ParseResult with content, reasoning_content, and tool_calls
Note
This function never throws. On error, returns raw output as content.
Warning
For PEG format models (COMMON_CHAT_FORMAT_PEG_*), the parser_data parameter must contain the serialized PEG parser from chat_in::FormatResult::parser. Omitting it will cause parse failures for these formats.
See also
lloyal::chat_in::format()
auto fmt = chat_in::format(model, inputs);
// ... generate tokens ...
auto parsed = chat_out::parse(output_text, fmt.format, fmt.reasoning_format,
                              false, fmt.thinking_forced_open, fmt.parser);
if (!parsed.tool_calls.empty()) {
    // Handle tool calls
}
Cold Restart with Thinking Models
When storing assistant messages for potential cold restart (re-formatting the full conversation from scratch), parse the output to separate reasoning from content. This matters for thinking models (Qwen3, DeepSeek-R1, etc.) whose raw output contains <think>...</think> blocks: storing the raw output as content would re-inject the thinking tags as literal text when the conversation is re-formatted.
// After generating tokens, detokenize the raw output
std::string raw_output = detokenized_text;

// Parse: separates reasoning from content
auto parsed = chat_out::parse(raw_output, fmt.format, fmt.reasoning_format,
                              false, fmt.thinking_forced_open, fmt.parser);

// Store with separate fields for correct re-formatting on cold restart
json msg = {{"role", "assistant"}, {"content", parsed.content}};
if (!parsed.reasoning_content.empty()) {
    msg["reasoning_content"] = parsed.reasoning_content;
}
messages.push_back(msg);
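For intuition, the reasoning/content separation can be approximated with a self-contained sketch. This illustrates only the leading-&lt;think&gt;-block case and is not the library's parser, which is format-aware and streaming-safe via common_chat_parse():

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified illustration: split a leading <think>...</think> block into
// (reasoning, content). Incomplete blocks are treated as plain content,
// unlike the real parser, which understands partial (streaming) output.
std::pair<std::string, std::string> split_thinking(const std::string& raw) {
    const std::string open = "<think>", close = "</think>";
    if (raw.rfind(open, 0) == 0) {  // output starts with <think>
        auto end = raw.find(close, open.size());
        if (end != std::string::npos) {
            std::string reasoning = raw.substr(open.size(), end - open.size());
            std::string content   = raw.substr(end + close.size());
            return {reasoning, content};
        }
    }
    return {"", raw};  // no complete thinking block: everything is content
}
```

For example, split_thinking("&lt;think&gt;plan&lt;/think&gt;Hello") yields {"plan", "Hello"}, mirroring how reasoning_content and content are stored separately above.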

Definition at line 142 of file chat_out.hpp.