liblloyal 1.0.0
Composable primitives for llama.cpp inference
Namespaces

| Kind | Name |
|---|---|
| namespace | detail |

Enumerations

| Kind | Declaration | Description |
|---|---|---|
| enum class | Normalize : int32_t { None = 0, L2 = 1 } | Normalization modes for embedding vectors. |

Functions

| Return type | Function | Description |
|---|---|---|
| bool | has_embeddings(const llama_model *model) | Check if model supports embeddings. |
| int32_t | dimension(const llama_model *model) | Get embedding dimension for model. |
| bool | has_pooling(llama_context *ctx) | Check if context has pooling enabled. |
| int32_t | pooling_type(llama_context *ctx) | Get pooling type for context. |
| void | encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch) | Encode tokens for embedding extraction. |
| void | encode(llama_context *ctx, const std::vector<llama_token> &tokens, int32_t n_batch) | Convenience overload for std::vector<llama_token>. |
| std::vector<float> | get(llama_context *ctx, Normalize normalize = Normalize::L2) | Get embeddings for last decoded batch. |
| std::vector<float> | get_seq(llama_context *ctx, llama_seq_id seq, Normalize normalize = Normalize::L2) | Get embeddings for specific sequence. |
| std::vector<float> | get_ith(llama_context *ctx, int32_t idx, Normalize normalize = Normalize::L2) | Get embeddings for specific token index in last batch. |
| float | cosine_similarity(const std::vector<float> &a, const std::vector<float> &b) | Compute cosine similarity between two embedding vectors. |
Normalize

enum class embedding::Normalize : int32_t

Normalization modes for embedding vectors.

| Enumerator | Description |
|---|---|
| None | No normalization; raw embedding values are returned. |
| L2 | L2 (unit-length) normalization, suitable for cosine similarity. |

Definition at line 47 of file embedding.hpp.
cosine_similarity()

inline float embedding::cosine_similarity(const std::vector<float> &a, const std::vector<float> &b)

Compute cosine similarity between two embedding vectors.

Parameters:

| Parameter | Description |
|---|---|
| a | First embedding vector (should be L2-normalized) |
| b | Second embedding vector (should be L2-normalized) |

NOTE: For L2-normalized vectors, cosine similarity equals the dot product.

EXAMPLE:

```cpp
auto emb1 = embedding::get(ctx1, Normalize::L2);
auto emb2 = embedding::get(ctx2, Normalize::L2);
float sim = embedding::cosine_similarity(emb1, emb2);
```

Definition at line 451 of file embedding.hpp.
dimension()

inline int32_t embedding::dimension(const llama_model *model)

Get embedding dimension for model.

Parameters:

| Parameter | Description |
|---|---|
| model | Llama model |

Definition at line 79 of file embedding.hpp.
encode()

inline void embedding::encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch)

Encode tokens for embedding extraction.

Unlike decoder::decode_tokens(), this marks ALL tokens with logits=true, which is required for embedding extraction.

NOTE: Use this with a dedicated embedding context (embeddings=true, pooling enabled). Clear the KV cache between texts with kv::clear_all():

```cpp
// Create dedicated embedding context
ctx_params.embeddings = true;
ctx_params.pooling_type = LLAMA_POOLING_TYPE_MEAN;
auto embed_ctx = llama_init_from_model(model, ctx_params);

// Embed each text
kv::clear_all(embed_ctx);
embedding::encode(embed_ctx, tokens, 512);
auto emb = embedding::get(embed_ctx);
```

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context (must have embeddings=true and pooling enabled) |
| tokens | Token array to encode |
| n_tokens | Number of tokens in array |
| n_batch | Batch size |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if encode fails |

Definition at line 202 of file embedding.hpp.
encode()

inline void embedding::encode(llama_context *ctx, const std::vector<llama_token> &tokens, int32_t n_batch)

Convenience overload for std::vector<llama_token>.

Definition at line 249 of file embedding.hpp.
get()

inline std::vector<float> embedding::get(llama_context *ctx, Normalize normalize = Normalize::L2)

Get embeddings for last decoded batch.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context (must have pooling enabled) |
| normalize | Normalization mode (default: L2 for cosine similarity) |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

REQUIRES: Previous llama_decode() call with tokens

EXAMPLE:

```cpp
auto tokens = tokenizer::tokenize(model, "Hello world");
decoder::decode_tokens(ctx, tokens, 0, 512);
auto embedding = embedding::get(ctx, Normalize::L2);
```

Definition at line 271 of file embedding.hpp.
get_ith()

inline std::vector<float> embedding::get_ith(llama_context *ctx, int32_t idx, Normalize normalize = Normalize::L2)

Get embeddings for specific token index in last batch.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |
| idx | Token index in batch |
| normalize | Normalization mode |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

USE CASE: Per-token embeddings for token-level analysis, kNN-LM

NOTE: Per-token embeddings may work without pooling enabled.

Definition at line 400 of file embedding.hpp.
get_seq()

inline std::vector<float> embedding::get_seq(llama_context *ctx, llama_seq_id seq, Normalize normalize = Normalize::L2)

Get embeddings for specific sequence.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |
| seq | Sequence ID |
| normalize | Normalization mode |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

USE CASE: Multi-sequence embedding extraction (batch embedding different texts)

NOTE: Falls back to get() for seq=0 if the sequence-specific API is unavailable.

Definition at line 341 of file embedding.hpp.
has_embeddings()

inline bool embedding::has_embeddings(const llama_model *model)

Check if model supports embeddings.

Parameters:

| Parameter | Description |
|---|---|
| model | Llama model |

NOTE: This checks dimension only. For proper embeddings, the context must also be created with pooling enabled (LLAMA_POOLING_TYPE_MEAN, etc.).

Definition at line 63 of file embedding.hpp.
has_pooling()

inline bool embedding::has_pooling(llama_context *ctx)

Check if context has pooling enabled.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |

NOTE: Context must be created with pooling type != LLAMA_POOLING_TYPE_NONE for embeddings to work correctly.

Definition at line 99 of file embedding.hpp.
pooling_type()

inline int32_t embedding::pooling_type(llama_context *ctx)

Get pooling type for context.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |

Definition at line 120 of file embedding.hpp.