liblloyal 1.0.0
Composable primitives for llama.cpp inference
Text Tokenization Operations.
#include "common.hpp"
#include <cstdint>
#include <llama/llama.h>
#include <string>
#include <vector>
Namespaces
  namespace lloyal
      JSON Schema to Grammar Converter (Header-Only)
  namespace lloyal::tokenizer
Functions
  std::vector<llama_token> lloyal::tokenizer::tokenize(const llama_vocab *vocab, const std::string &text, bool add_special, bool parse_special)
      Tokenize text to a token array.
  std::string lloyal::tokenizer::detokenize(const llama_vocab *vocab, llama_token token, bool special)
      Detokenize a SINGLE token to text (streaming use case).
  std::string lloyal::tokenizer::detokenize_batch(const llama_vocab *vocab, const llama_token *tokens, int32_t n_tokens, bool remove_special, bool unparse_special)
      Detokenize a TOKEN ARRAY to text (reconstruction use case).
  const llama_vocab *lloyal::tokenizer::get_vocab(const llama_model *model)
      Get the vocabulary from a model.
  bool lloyal::tokenizer::is_eog(const llama_vocab *vocab, llama_token token)
      Check whether a token is an end-of-generation marker.
  int32_t lloyal::tokenizer::vocab_size(const llama_vocab *vocab)
      Get the vocabulary size (total number of tokens).
  std::vector<llama_token> lloyal::tokenizer::tokenize(const llama_model *model, const std::string &text)
      Tokenize text to a token array (model-accepting overload).
  std::string lloyal::tokenizer::detokenize(const llama_model *model, llama_token token, bool special=true)
      Detokenize a SINGLE token to text (model-accepting overload).
  std::string lloyal::tokenizer::detokenize_batch(const llama_model *model, const std::vector<llama_token> &tokens, bool remove_special=false, bool unparse_special=true)
      Detokenize a TOKEN VECTOR to text (convenience overload).
  std::string lloyal::tokenizer::detokenize_batch(const llama_model *model, const llama_token *tokens, int32_t n_tokens, bool remove_special, bool unparse_special)
      Detokenize a TOKEN ARRAY to text (model-accepting overload).
  bool lloyal::tokenizer::is_eog(const llama_model *model, llama_token token)
      Check whether a token is an end-of-generation marker (model-accepting overload).
  int32_t lloyal::tokenizer::vocab_size(const llama_model *model)
      Get the vocabulary size (model-accepting overload).
Text Tokenization Operations.
Wraps llama.cpp tokenization APIs with safe buffer management and special-token handling. Uses two-pass algorithms (query the required size, then fill) for reliable buffer sizing.
Definition in file tokenizer.hpp.
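The two-pass buffer-sizing pattern mentioned above can be sketched in isolation. llama.cpp's tokenizer reports the required token count as a negative return value when the caller's buffer is too small; `mock_tokenize` below is a hypothetical stand-in that follows the same convention (one "token" per character) so the sketch runs without a model.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Mock of a llama.cpp-style C tokenizer: writes up to n_max ids into out,
// and returns the negated required count when the buffer is too small.
int32_t mock_tokenize(const char* text, int32_t* out, int32_t n_max) {
    int32_t needed = (int32_t)std::strlen(text);
    if (needed > n_max) return -needed;          // pass 1: report size
    for (int32_t i = 0; i < needed; ++i) out[i] = text[i];
    return needed;                               // pass 2: fill buffer
}

// Two-pass pattern: try an initial guess; if the API reports a larger
// requirement, resize once and retry. No overflow, no over-allocation.
std::vector<int32_t> tokenize_two_pass(const std::string& text) {
    std::vector<int32_t> tokens(8);              // initial guess
    int32_t n = mock_tokenize(text.c_str(), tokens.data(),
                              (int32_t)tokens.size());
    if (n < 0) {                                 // buffer was too small
        tokens.resize(-n);
        n = mock_tokenize(text.c_str(), tokens.data(),
                          (int32_t)tokens.size());
    }
    tokens.resize(n);                            // trim to the actual count
    return tokens;
}
```

The same shape applies to detokenization, where the required size is a byte count rather than a token count.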