liblloyal 1.0.0
Composable primitives for llama.cpp inference
lloyal Namespace Reference

JSON Schema to Grammar Converter (Header-Only) More...

Namespaces

namespace  chat_template
 
namespace  decoder
 
namespace  defaults
 
namespace  detail
 
namespace  embedding
 
namespace  grammar
 
namespace  kv
 
namespace  logits
 
namespace  metrics
 
namespace  sampler
 
namespace  tokenizer
 

Classes

struct  ChatTemplateResult
 Result from complete chat template processing. More...
 
struct  common_grammar_builder
 
struct  common_grammar_options
 
struct  ModelKey
 Model cache key combining file path and GPU configuration. More...
 
struct  ModelKeyHash
 Hash functor for ModelKey. More...
 
class  ModelRegistry
 Thread-safe registry for sharing llama_model instances. More...
 

Concepts

concept  SamplingParamsLike
 C++20 concept: Any type with sampling parameter fields.
 

Typedefs

using json = nlohmann::ordered_json
 

Functions

void batch_clear (llama_batch &batch)
 Clear batch to empty state.
 
void batch_add (llama_batch &batch, llama_token id, int32_t pos, const std::vector< llama_seq_id > &seq_ids, bool logits, int32_t capacity=-1)
 Add single token to batch with position and sequence info.
 
std::string format_chat_template_from_model (const llama_model *model, const std::string &messages_json, const std::string &template_override="")
 Format chat messages using model's built-in template.
 
std::vector< std::string > extract_template_stop_tokens (const llama_model *model, const std::string &template_str)
 Dynamically detect stop tokens from chat template.
 
ChatTemplateResult format_chat_template_complete (const llama_model *model, const std::string &messages_json, const std::string &template_override="")
 Complete chat template processing with stop token detection.
 
bool validate_chat_template_helper (const std::string &template_str)
 Validate chat template syntax.
 
const std::vector< ggml_type > & get_kv_cache_types ()
 Get list of supported KV cache types.
 
ggml_type kv_cache_type_from_str (const std::string &s)
 Convert cache type string to ggml_type enum.
 
bool is_truthy (const std::string &value)
 Check if string represents a truthy value.
 
bool is_falsey (const std::string &value)
 Check if string represents a falsey value.
 
bool is_autoy (const std::string &value)
 Check if string represents an auto value.
 
std::string string_repeat (const std::string &str, size_t n)
 
std::string string_join (const std::vector< std::string > &values, const std::string &separator)
 
std::vector< std::string > string_split (const std::string &str, const std::string &delimiter)
 
std::string json_schema_to_grammar (const json &schema, bool force_gbnf=false)
 Convert JSON schema to GBNF grammar.
 
std::string build_grammar (const std::function< void(const common_grammar_builder &)> &cb, const common_grammar_options &options={})
 Build grammar from callback.
 

Detailed Description

JSON Schema to Grammar Converter (Header-Only)

Purpose: Convert a JSON schema to GBNF (GGML BNF) format for constrained generation. Vendored from llama.cpp/common/json-schema-to-grammar.{h,cpp}.

Architecture:

Typedef Documentation

◆ json

typedef nlohmann::ordered_json lloyal::json

Definition at line 50 of file helpers.hpp.

Function Documentation

◆ batch_add()

void lloyal::batch_add ( llama_batch &  batch,
llama_token  id,
int32_t  pos,
const std::vector< llama_seq_id > &  seq_ids,
bool  logits,
int32_t  capacity = -1 
)
inline

Add single token to batch with position and sequence info.

Appends a token to the batch at the current n_tokens position, then increments the counter. Assigns position embedding, sequence IDs, and logits flag.

Parameters
batch: Batch to modify (appends token at batch.n_tokens)
id: Token ID to add
pos: Position embedding for this token (e.g., 0, 1, 2...)
seq_ids: Sequence IDs this token belongs to (usually single-element vector {0})
logits: Whether to compute logits for this token
capacity: Optional capacity check for DEBUG builds (default: -1 disables check)
Warning
Caller must ensure batch has sufficient capacity (n_tokens < n_max) to avoid buffer overflow. No runtime bounds checking in release builds.
Note
DEBUG builds enable capacity assertion if capacity > 0
Examples
include/lloyal/embedding.hpp.

Definition at line 84 of file helpers.hpp.
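
Example (illustrative sketch): building a batch from a tokenized prompt and decoding it. Assumes an existing llama_context and tokenized prompt; lloyal/helpers.hpp is an assumed include path.

#include <lloyal/helpers.hpp>  // assumed include path for batch_add/batch_clear
#include <llama.h>
#include <vector>

inline void decode_prompt(llama_context * ctx, const std::vector<llama_token> & prompt_tokens) {
    // One sequence, token-id batch (embd = 0), capacity 512
    llama_batch batch = llama_batch_init(512, 0, 1);
    lloyal::batch_clear(batch);

    for (size_t i = 0; i < prompt_tokens.size(); ++i) {
        const bool want_logits = (i + 1 == prompt_tokens.size());  // logits only for the last token
        lloyal::batch_add(batch, prompt_tokens[i], static_cast<int32_t>(i), {0}, want_logits);
    }

    llama_decode(ctx, batch);  // caller should check the return code
    llama_batch_free(batch);
}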

◆ batch_clear()

void lloyal::batch_clear ( llama_batch &  batch)
inline

Clear batch to empty state.

Resets the batch token counter to prepare for new tokens. Does not deallocate buffer memory.

Parameters
batch: Batch to clear (modified in place)
Note
Only resets n_tokens counter, buffer capacity remains unchanged
Examples
include/lloyal/embedding.hpp.

Definition at line 64 of file helpers.hpp.
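
Example (illustrative sketch): reusing one batch across decode steps by clearing it between tokens, which avoids reallocating the buffer. Assumes the batch was created with llama_batch_init and that the caller tracks the running position; the include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <llama.h>

inline int decode_single_token(llama_context * ctx, llama_batch & batch, llama_token token, int32_t pos) {
    lloyal::batch_clear(batch);  // resets n_tokens only; capacity is preserved
    lloyal::batch_add(batch, token, pos, {0}, /*logits=*/true);
    return llama_decode(ctx, batch);
}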

◆ build_grammar()

std::string lloyal::build_grammar ( const std::function< void(const common_grammar_builder &)> &  cb,
const common_grammar_options &  options = {} 
)
inline

Build grammar from callback.

Parameters
cb: Callback function to build grammar rules
options: Grammar options (dotall, etc.)
Returns
Formatted GBNF grammar string

Definition at line 1281 of file json-schema-to-grammar.hpp.
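
Example (illustrative sketch): building a tiny GBNF grammar through the callback. This assumes the vendored common_grammar_builder exposes the same add_rule(name, production) callback as upstream llama.cpp, and that the include path is lloyal/json-schema-to-grammar.hpp.

#include <lloyal/json-schema-to-grammar.hpp>  // assumed include path
#include <iostream>
#include <string>

int main() {
    std::string grammar = lloyal::build_grammar([](const lloyal::common_grammar_builder & builder) {
        // Constrain output to a single yes/no answer
        builder.add_rule("root", "\"yes\" | \"no\"");
    });
    std::cout << grammar << std::endl;  // prints the generated GBNF rules
}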

◆ extract_template_stop_tokens()

std::vector< std::string > lloyal::extract_template_stop_tokens ( const llama_model *  model,
const std::string &  template_str 
)
inline

Dynamically detect stop tokens from chat template.

Analyzes template string to identify template-specific stop tokens and verifies they exist in the model's vocabulary. Prevents generating invalid tokens that would cause tokenization failures.

Supported patterns:

  • ChatML: <|im_end|>, <|endoftext|> (when template contains "im_start")
  • Llama-3: <|eom_id|>, <|eot_id|> (when template contains "eom_id" or "eot_id")
  • Fallback: Model's EOT token from vocabulary
Parameters
model: Llama model (can be null, returns empty vector)
template_str: Jinja2 template string to analyze
Returns
Vector of stop token strings that exist in model vocabulary
Note
Only returns tokens that successfully tokenize to single token IDs. Prevents returning strings that would split into multiple tokens.

Definition at line 204 of file helpers.hpp.
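
Example (illustrative sketch): detecting stop tokens for a ChatML-style template. Assumes an already-loaded llama_model; lloyal/helpers.hpp is an assumed include path.

#include <lloyal/helpers.hpp>  // assumed include path
#include <llama.h>
#include <iostream>
#include <string>
#include <vector>

inline void print_stop_tokens(const llama_model * model) {
    const std::string tmpl =
        "{% for message in messages %}<|im_start|>{{ message.role }}\n"
        "{{ message.content }}<|im_end|>\n{% endfor %}";
    std::vector<std::string> stops = lloyal::extract_template_stop_tokens(model, tmpl);
    for (const std::string & s : stops) {
        std::cout << "stop token: " << s << "\n";  // e.g. <|im_end|>, if present in the vocabulary
    }
}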

◆ format_chat_template_complete()

ChatTemplateResult lloyal::format_chat_template_complete ( const llama_model *  model,
const std::string &  messages_json,
const std::string &  template_override = "" 
)
inline

Complete chat template processing with stop token detection.

Combines format_chat_template_from_model() and extract_template_stop_tokens() into a single call for convenience. Returns both formatted prompt and detected stop tokens.

Parameters
model: Llama model (can be null, will use ChatML fallback)
messages_json: JSON array of messages: [{"role":"user","content":"..."},...]
template_override: Optional Jinja2 template string (default: empty, uses model template)
Returns
ChatTemplateResult with formatted prompt and additional_stops vector
Note
Equivalent to calling format_chat_template_from_model() followed by extract_template_stop_tokens(), but more efficient as it only queries model metadata once.

Definition at line 282 of file helpers.hpp.
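
Example (illustrative sketch): formatting a conversation and collecting stop tokens in one call. The additional_stops field is documented above; the prompt field name is an assumption about ChatTemplateResult, and the include path is assumed.

#include <lloyal/helpers.hpp>  // assumed include path
#include <llama.h>
#include <iostream>
#include <string>

inline void build_prompt(const llama_model * model) {
    const std::string messages = R"([{"role":"user","content":"Hello!"}])";
    lloyal::ChatTemplateResult result = lloyal::format_chat_template_complete(model, messages);
    std::cout << result.prompt << "\n";  // assumed field name for the formatted prompt
    for (const std::string & stop : result.additional_stops) {
        std::cout << "stop: " << stop << "\n";
    }
}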

◆ format_chat_template_from_model()

std::string lloyal::format_chat_template_from_model ( const llama_model *  model,
const std::string &  messages_json,
const std::string &  template_override = "" 
)
inline

Format chat messages using model's built-in template.

Applies chat template (Jinja2) to format message array into a single prompt string. Automatically queries model metadata for BOS/EOS tokens and add_bos/add_eos flags.

Template selection hierarchy:

  1. template_override (if provided)
  2. model's embedded template (from GGUF metadata)
  3. ChatML fallback (default)
Parameters
model: Llama model (can be null, will use ChatML fallback)
messages_json: JSON array of messages: [{"role":"user","content":"..."},...]
template_override: Optional Jinja2 template string (default: empty, uses model template)
Returns
Formatted prompt string ready for tokenization
Exceptions
std::exception: if JSON parsing fails (caught internally, returns empty string)
Note
Strips BOS/EOS wrapper tokens if model metadata indicates they're added during tokenization to prevent double-token issues

Definition at line 140 of file helpers.hpp.
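
Example (illustrative sketch): turning a JSON message array into a prompt string using the model's embedded template (or the ChatML fallback when the model has none). The include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <llama.h>
#include <string>

inline std::string make_prompt(const llama_model * model) {
    const std::string messages = R"([
        {"role":"system","content":"You are a helpful assistant."},
        {"role":"user","content":"What is GBNF?"}
    ])";
    // Empty template_override: use the model's GGUF template, else the ChatML fallback
    return lloyal::format_chat_template_from_model(model, messages);
}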

◆ get_kv_cache_types()

const std::vector< ggml_type > & lloyal::get_kv_cache_types ( )
inline

Get list of supported KV cache types.

Returns static vector of ggml_type enums representing supported quantization formats for KV cache. Includes full-precision (F32, F16, BF16) and quantized formats (Q8_0, Q4_0, Q4_1, IQ4_NL, Q5_0, Q5_1).

Returns
Reference to static vector of supported cache types
Note
Returns const reference to avoid allocation on each call

Definition at line 363 of file helpers.hpp.
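
Example (illustrative sketch): listing the supported KV cache types by name. ggml_type_name is the upstream ggml helper for printable type names; the lloyal include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <ggml.h>
#include <iostream>

int main() {
    for (ggml_type t : lloyal::get_kv_cache_types()) {
        std::cout << ggml_type_name(t) << "\n";  // e.g. f16, q8_0, q4_0, ...
    }
}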

◆ is_autoy()

bool lloyal::is_autoy ( const std::string &  value)
inline

Check if string represents an auto value.

Parameters
value: String to check
Returns
True if value is "auto" or "-1"

Definition at line 419 of file helpers.hpp.

◆ is_falsey()

bool lloyal::is_falsey ( const std::string &  value)
inline

Check if string represents a falsey value.

Parameters
value: String to check
Returns
True if value is "off", "disabled", "0", or "false"

Definition at line 408 of file helpers.hpp.

◆ is_truthy()

bool lloyal::is_truthy ( const std::string &  value)
inline

Check if string represents a truthy value.

Parameters
value: String to check
Returns
True if value is "on", "enabled", "1", or "true"

Definition at line 398 of file helpers.hpp.
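
Example (illustrative sketch): parsing a user-supplied on/off/auto setting with is_truthy, is_falsey, and is_autoy. The Tristate enum and the fallback behavior are illustrative, not part of the library.

#include <lloyal/helpers.hpp>  // assumed include path
#include <string>

enum class Tristate { On, Off, Auto };

inline Tristate parse_flag(const std::string & value) {
    if (lloyal::is_truthy(value)) return Tristate::On;   // "on", "enabled", "1", "true"
    if (lloyal::is_falsey(value)) return Tristate::Off;  // "off", "disabled", "0", "false"
    if (lloyal::is_autoy(value))  return Tristate::Auto; // "auto", "-1"
    return Tristate::Auto;  // unrecognized values treated as auto here
}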

◆ json_schema_to_grammar()

std::string lloyal::json_schema_to_grammar ( const json &  schema,
bool  force_gbnf = false 
)
inline

Convert JSON schema to GBNF grammar.

Parameters
schema: JSON schema (nlohmann::ordered_json)
force_gbnf: Force GBNF output (default: false allows EBNF optimization)
Returns
GBNF grammar string
Examples
include/lloyal/grammar.hpp.

Definition at line 1265 of file json-schema-to-grammar.hpp.
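
Example (illustrative sketch): converting a small JSON schema into a GBNF grammar string. lloyal::json is the ordered_json typedef documented above; the include path is an assumption.

#include <lloyal/json-schema-to-grammar.hpp>  // assumed include path
#include <iostream>
#include <string>

int main() {
    lloyal::json schema = lloyal::json::parse(R"({
        "type": "object",
        "properties": {
            "name": { "type": "string" },
            "age":  { "type": "integer" }
        },
        "required": ["name"]
    })");
    std::string grammar = lloyal::json_schema_to_grammar(schema);
    std::cout << grammar << std::endl;  // GBNF rules ready for a grammar-constrained sampler
}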

◆ kv_cache_type_from_str()

ggml_type lloyal::kv_cache_type_from_str ( const std::string &  s)
inline

Convert cache type string to ggml_type enum.

Maps type name string (e.g., "f16", "q4_0") to corresponding ggml_type enum. Used for parsing user-provided cache type configuration.

Parameters
s: Type name string (e.g., "f16", "q4_0", "q8_0")
Returns
Matching ggml_type enum value
Exceptions
std::runtime_error: if type name is not in supported types list

Definition at line 382 of file helpers.hpp.
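
Example (illustrative sketch): mapping a user-provided cache type string to ggml_type, with error handling for unsupported names. The include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <ggml.h>
#include <iostream>
#include <stdexcept>

int main() {
    try {
        ggml_type kv_type = lloyal::kv_cache_type_from_str("q8_0");
        std::cout << "KV cache type: " << ggml_type_name(kv_type) << "\n";
    } catch (const std::runtime_error & e) {
        std::cerr << "unsupported cache type: " << e.what() << "\n";
        return 1;
    }
    return 0;
}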

◆ string_join()

std::string lloyal::string_join ( const std::vector< std::string > &  values,
const std::string &  separator 
)
inline

Definition at line 442 of file helpers.hpp.

◆ string_repeat()

std::string lloyal::string_repeat ( const std::string &  str,
size_t  n 
)
inline

Definition at line 426 of file helpers.hpp.

◆ string_split()

std::vector< std::string > lloyal::string_split ( const std::string &  str,
const std::string &  delimiter 
)
inline

Definition at line 455 of file helpers.hpp.
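
Example (illustrative sketch): the three small string utilities together. The include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::string line = lloyal::string_repeat("-", 20);                    // "--------------------"
    std::vector<std::string> parts = lloyal::string_split("a,b,c", ",");  // {"a", "b", "c"}
    std::string joined = lloyal::string_join(parts, " | ");               // "a | b | c"
    std::cout << line << "\n" << joined << "\n";
}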

◆ validate_chat_template_helper()

bool lloyal::validate_chat_template_helper ( const std::string &  template_str)
inline

Validate chat template syntax.

Attempts to parse Jinja2 template string using minja engine to check for syntax errors before usage.

Parameters
template_str: Jinja2 template string to validate
Returns
True if template syntax is valid, false if parsing failed
Note
Uses empty BOS/EOS tokens for validation - only checks syntax, not semantics

Definition at line 341 of file helpers.hpp.
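
Example (illustrative sketch): checking a user-supplied Jinja2 template before passing it as template_override. The include path is an assumption.

#include <lloyal/helpers.hpp>  // assumed include path
#include <iostream>
#include <string>

int main() {
    const std::string tmpl =
        "{% for m in messages %}{{ m.role }}: {{ m.content }}\n{% endfor %}";
    if (lloyal::validate_chat_template_helper(tmpl)) {
        std::cout << "template parses OK\n";
    } else {
        std::cout << "template has syntax errors; falling back to the model default\n";
    }
}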