liblloyal 1.0.0
Composable primitives for llama.cpp inference
JSON Schema to Grammar Converter (Header-Only)
Namespaces

- namespace chat_template
- namespace decoder
- namespace defaults
- namespace detail
- namespace embedding
- namespace grammar
- namespace kv
- namespace logits
- namespace metrics
- namespace sampler
- namespace tokenizer
Classes

- struct ChatTemplateResult
  Result from complete chat template processing.
- struct common_grammar_builder
- struct common_grammar_options
- struct ModelKey
  Model cache key combining file path and GPU configuration.
- struct ModelKeyHash
  Hash functor for ModelKey.
- class ModelRegistry
  Thread-safe registry for sharing llama_model instances.
Concepts

- concept SamplingParamsLike
  C++20 concept: any type with sampling parameter fields.
Typedefs

- using json = nlohmann::ordered_json
Functions

- void batch_clear (llama_batch &batch)
  Clear batch to empty state.
- void batch_add (llama_batch &batch, llama_token id, int32_t pos, const std::vector<llama_seq_id> &seq_ids, bool logits, int32_t capacity=-1)
  Add single token to batch with position and sequence info.
- std::string format_chat_template_from_model (const llama_model *model, const std::string &messages_json, const std::string &template_override="")
  Format chat messages using the model's built-in template.
- std::vector<std::string> extract_template_stop_tokens (const llama_model *model, const std::string &template_str)
  Dynamically detect stop tokens from a chat template.
- ChatTemplateResult format_chat_template_complete (const llama_model *model, const std::string &messages_json, const std::string &template_override="")
  Complete chat template processing with stop token detection.
- bool validate_chat_template_helper (const std::string &template_str)
  Validate chat template syntax.
- const std::vector<ggml_type> & get_kv_cache_types ()
  Get list of supported KV cache types.
- ggml_type kv_cache_type_from_str (const std::string &s)
  Convert cache type string to ggml_type enum.
- bool is_truthy (const std::string &value)
  Check if string represents a truthy value.
- bool is_falsey (const std::string &value)
  Check if string represents a falsey value.
- bool is_autoy (const std::string &value)
  Check if string represents an auto value.
- std::string string_repeat (const std::string &str, size_t n)
- std::string string_join (const std::vector<std::string> &values, const std::string &separator)
- std::vector<std::string> string_split (const std::string &str, const std::string &delimiter)
- std::string json_schema_to_grammar (const json &schema, bool force_gbnf=false)
  Convert JSON schema to GBNF grammar.
- std::string build_grammar (const std::function<void(const common_grammar_builder &)> &cb, const common_grammar_options &options={})
  Build grammar from callback.
JSON Schema to Grammar Converter (Header-Only)
Purpose: Convert a JSON schema to GBNF (GGML BNF) format for constrained generation. Vendored from llama.cpp/common/json-schema-to-grammar.{h,cpp}.
Architecture:
◆ json

using lloyal::json = nlohmann::ordered_json
Definition at line 50 of file helpers.hpp.
◆ batch_add()

void lloyal::batch_add (llama_batch &batch, llama_token id, int32_t pos, const std::vector<llama_seq_id> &seq_ids, bool logits, int32_t capacity = -1)  [inline]

Add single token to batch with position and sequence info.

Appends a token to the batch at the current n_tokens position, then increments the counter. Sets the token's position, sequence IDs, and logits flag.

Parameters

- batch: Batch to modify (appends token at batch.n_tokens)
- id: Token ID to add
- pos: Position of this token in the sequence (e.g., 0, 1, 2, ...)
- seq_ids: Sequence IDs this token belongs to (usually a single-element vector {0})
- logits: Whether to compute logits for this token
- capacity: Optional capacity check for DEBUG builds (default: -1 disables the check)
Definition at line 84 of file helpers.hpp.
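A minimal usage sketch: together with batch_clear() (documented below), batch_add() fills a llama_batch before llama_decode(). The llama.cpp calls are the upstream C API; the "helpers.hpp" include path is an assumption based on the definition locations cited on this page.

```cpp
#include <vector>
#include <llama.h>
#include "helpers.hpp"  // assumed include path, per "Definition at line 84 of file helpers.hpp"

// Sketch: feed a tokenized prompt to the model as a single batch.
void decode_prompt(llama_context * ctx, const std::vector<llama_token> & prompt) {
    llama_batch batch = llama_batch_init(/*n_tokens=*/512, /*embd=*/0, /*n_seq_max=*/1);

    lloyal::batch_clear(batch);  // reset n_tokens to 0 (buffer memory is kept)
    for (int32_t i = 0; i < (int32_t) prompt.size(); ++i) {
        // request logits only for the last token, the usual pattern before sampling
        const bool want_logits = (i == (int32_t) prompt.size() - 1);
        lloyal::batch_add(batch, prompt[i], /*pos=*/i, /*seq_ids=*/{0}, want_logits);
    }

    llama_decode(ctx, batch);  // run the forward pass
    llama_batch_free(batch);
}
```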
◆ batch_clear()

void lloyal::batch_clear (llama_batch &batch)  [inline]
Clear batch to empty state.
Resets the batch token counter to prepare for new tokens. Does not deallocate buffer memory.
Parameters

- batch: Batch to clear (modified in place)
Definition at line 64 of file helpers.hpp.
◆ build_grammar()

std::string lloyal::build_grammar (const std::function<void(const common_grammar_builder &)> &cb, const common_grammar_options &options = {})  [inline]
Build grammar from callback.
Parameters

- cb: Callback function to build grammar rules
- options: Grammar options (dotall, etc.)
Definition at line 1281 of file json-schema-to-grammar.hpp.
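A hedged sketch, assuming the vendored common_grammar_builder matches upstream llama.cpp, where add_rule(name, body) registers a GBNF rule under a (possibly uniquified) name:

```cpp
#include <string>

// Sketch: a grammar whose root accepts exactly "yes" or "no".
std::string yes_no_grammar() {
    return lloyal::build_grammar([](const lloyal::common_grammar_builder & builder) {
        builder.add_rule("root", "\"yes\" | \"no\"");  // GBNF alternation of two literals
    });
}
```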
◆ extract_template_stop_tokens()

std::vector<std::string> lloyal::extract_template_stop_tokens (const llama_model *model, const std::string &template_str)  [inline]
Dynamically detect stop tokens from chat template.
Analyzes the template string to identify template-specific stop tokens and verifies that they exist in the model's vocabulary. This vocabulary check prevents returning stop strings that would cause tokenization failures.
Supported patterns:
Parameters

- model: Llama model (may be null; an empty vector is returned)
- template_str: Jinja2 template string to analyze
Definition at line 204 of file helpers.hpp.
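A hedged sketch with a ChatML-style template; the exact set of patterns the detector recognizes is not enumerated on this page, so treat the expected output as illustrative:

```cpp
// Sketch: detect the stop token declared by a ChatML-style template.
std::vector<std::string> chatml_stops(const llama_model * model) {
    return lloyal::extract_template_stop_tokens(
        model,
        "{% for m in messages %}<|im_start|>{{ m.role }}\n"
        "{{ m.content }}<|im_end|>\n{% endfor %}");
}
// For ChatML templates the result would typically include "<|im_end|>",
// provided that token exists in the model's vocabulary.
```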
◆ format_chat_template_complete()

ChatTemplateResult lloyal::format_chat_template_complete (const llama_model *model, const std::string &messages_json, const std::string &template_override = "")  [inline]
Complete chat template processing with stop token detection.
Combines format_chat_template_from_model() and extract_template_stop_tokens() into a single call for convenience. Returns both formatted prompt and detected stop tokens.
Parameters

- model: Llama model (may be null; the ChatML fallback is used)
- messages_json: JSON array of messages: [{"role":"user","content":"..."},...]
- template_override: Optional Jinja2 template string (default: empty, uses the model's template)
Definition at line 282 of file helpers.hpp.
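A sketch of the one-call flow. The ChatTemplateResult field names used here (prompt, stop_tokens) are assumptions for illustration, not confirmed by this page; check the struct's own documentation for the real members.

```cpp
// Sketch: format a conversation and collect stop tokens in one call.
// Field names `prompt` and `stop_tokens` are assumed, not confirmed by this page.
void run_turn(const llama_model * model) {
    lloyal::ChatTemplateResult result = lloyal::format_chat_template_complete(
        model,
        R"([{"role":"user","content":"Hello!"}])");

    // result.prompt      -> formatted prompt string, ready to tokenize and decode
    // result.stop_tokens -> template-specific stop strings for the generation loop
}
```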
◆ format_chat_template_from_model()

std::string lloyal::format_chat_template_from_model (const llama_model *model, const std::string &messages_json, const std::string &template_override = "")  [inline]
Format chat messages using model's built-in template.
Applies chat template (Jinja2) to format message array into a single prompt string. Automatically queries model metadata for BOS/EOS tokens and add_bos/add_eos flags.
Template selection hierarchy:

1. template_override, when non-empty
2. the template embedded in the model's metadata
3. ChatML fallback (when the model is null or provides no template)
Parameters

- model: Llama model (may be null; the ChatML fallback is used)
- messages_json: JSON array of messages: [{"role":"user","content":"..."},...]
- template_override: Optional Jinja2 template string (default: empty, uses the model's template)

Exceptions

- std::exception: if JSON parsing fails (caught internally; an empty string is returned)
Definition at line 140 of file helpers.hpp.
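A minimal usage sketch, assuming model is a loaded llama_model pointer:

```cpp
// Sketch: render a conversation with the model's built-in template.
std::string make_prompt(const llama_model * model) {
    return lloyal::format_chat_template_from_model(
        model,
        R"([{"role":"system","content":"You are a helpful assistant."},
            {"role":"user","content":"What is GBNF?"}])");
}
// On JSON parse failure the exception is caught internally and the result is empty.
```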
◆ get_kv_cache_types()

const std::vector<ggml_type> & lloyal::get_kv_cache_types ()  [inline]
Get list of supported KV cache types.
Returns static vector of ggml_type enums representing supported quantization formats for KV cache. Includes full-precision (F32, F16, BF16) and quantized formats (Q8_0, Q4_0, Q4_1, IQ4_NL, Q5_0, Q5_1).
Definition at line 363 of file helpers.hpp.
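A small sketch that prints the supported formats; ggml_type_name() is the standard ggml helper for an enum value's name:

```cpp
#include <cstdio>
#include <ggml.h>

// Sketch: list the KV cache quantization formats this build supports.
void print_kv_cache_types() {
    for (ggml_type t : lloyal::get_kv_cache_types()) {
        std::printf("%s\n", ggml_type_name(t));  // e.g. "f16", "q8_0", ...
    }
}
```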
◆ is_autoy()

bool lloyal::is_autoy (const std::string &value)  [inline]
Check if string represents an auto value.
Parameters

- value: String to check
Definition at line 419 of file helpers.hpp.
◆ is_falsey()

bool lloyal::is_falsey (const std::string &value)  [inline]
Check if string represents a falsey value.
Parameters

- value: String to check
Definition at line 408 of file helpers.hpp.
◆ is_truthy()

bool lloyal::is_truthy (const std::string &value)  [inline]
Check if string represents a truthy value.
Parameters

- value: String to check
Definition at line 398 of file helpers.hpp.
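A hedged sketch that maps a tri-state config string using the three predicates above; the accepted spellings (e.g. "1", "on", "auto") are not enumerated on this page, so the example relies only on the documented truthy/falsey/auto split:

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Sketch: map a user-supplied flag string to a tri-state setting.
// nullopt means "auto": let the backend decide.
std::optional<bool> parse_tristate(const std::string & value) {
    if (lloyal::is_truthy(value)) return true;
    if (lloyal::is_falsey(value)) return false;
    if (lloyal::is_autoy(value))  return std::nullopt;
    throw std::invalid_argument("unrecognized flag value: " + value);
}
```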
◆ json_schema_to_grammar()

std::string lloyal::json_schema_to_grammar (const json &schema, bool force_gbnf = false)  [inline]
Convert JSON schema to GBNF grammar.
Parameters

- schema: JSON schema (nlohmann::ordered_json)
- force_gbnf: Force GBNF output (default: false allows EBNF optimization)
Definition at line 1265 of file json-schema-to-grammar.hpp.
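A usage sketch using the lloyal::json typedef documented above:

```cpp
// Sketch: constrain generation to objects like {"name": "...", "age": 42}.
lloyal::json schema = lloyal::json::parse(R"({
    "type": "object",
    "properties": {
        "name": { "type": "string" },
        "age":  { "type": "integer" }
    },
    "required": ["name", "age"]
})");

std::string gbnf = lloyal::json_schema_to_grammar(schema);
// `gbnf` can be handed to a llama.cpp grammar sampler to constrain output.
```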
◆ kv_cache_type_from_str()

ggml_type lloyal::kv_cache_type_from_str (const std::string &s)  [inline]
Convert cache type string to ggml_type enum.
Maps type name string (e.g., "f16", "q4_0") to corresponding ggml_type enum. Used for parsing user-provided cache type configuration.
Parameters

- s: Type name string (e.g., "f16", "q4_0", "q8_0")

Exceptions

- std::runtime_error: if the type name is not in the supported types list
Definition at line 382 of file helpers.hpp.
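A sketch of tolerant config parsing, falling back to F16 (one of the documented supported types) when the name is unknown:

```cpp
#include <stdexcept>
#include <string>

// Sketch: parse a user-provided KV cache type, defaulting to F16 on error.
ggml_type parse_kv_type(const std::string & name) {
    try {
        return lloyal::kv_cache_type_from_str(name);  // e.g. "q8_0" -> GGML_TYPE_Q8_0
    } catch (const std::runtime_error &) {
        return GGML_TYPE_F16;                         // full-precision default
    }
}
```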
◆ string_join()

std::string lloyal::string_join (const std::vector<std::string> &values, const std::string &separator)  [inline]

Join values into a single string, inserting separator between elements.
Definition at line 442 of file helpers.hpp.
◆ string_repeat()

std::string lloyal::string_repeat (const std::string &str, size_t n)  [inline]

Repeat str n times.
Definition at line 426 of file helpers.hpp.
◆ string_split()

std::vector<std::string> lloyal::string_split (const std::string &str, const std::string &delimiter)  [inline]

Split str on each occurrence of delimiter.
Definition at line 455 of file helpers.hpp.
◆ validate_chat_template_helper()

bool lloyal::validate_chat_template_helper (const std::string &template_str)  [inline]
Validate chat template syntax.
Attempts to parse Jinja2 template string using minja engine to check for syntax errors before usage.
Parameters

- template_str: Jinja2 template string to validate
Definition at line 341 of file helpers.hpp.
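A sketch of guarding against malformed user-supplied templates before they reach the formatter; the template source here is hypothetical:

```cpp
// Sketch: accept a custom template only if minja can parse it.
std::string safe_format(const llama_model * model,
                        const std::string & messages_json,
                        std::string user_template /* hypothetical user input */) {
    if (!lloyal::validate_chat_template_helper(user_template)) {
        user_template.clear();  // empty override falls back to the model's template
    }
    return lloyal::format_chat_template_from_model(model, messages_json, user_template);
}
```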