| Returns | Function | Description |
|---|---|---|
| `bool` | `has_embeddings(const llama_model *model)` | Check if model supports embeddings. |
| `int32_t` | `dimension(const llama_model *model)` | Get embedding dimension for model. |
| `bool` | `has_pooling(llama_context *ctx)` | Check if context has pooling enabled. |
| `int32_t` | `pooling_type(llama_context *ctx)` | Get pooling type for context. |
| `void` | `encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch)` | Encode tokens for embedding extraction. |
| `void` | `encode(llama_context *ctx, const std::vector<llama_token> &tokens, int32_t n_batch)` | Convenience overload for `std::vector<llama_token>`. |
| `std::vector<float>` | `get(llama_context *ctx, Normalize normalize = Normalize::L2)` | Get embeddings for the last decoded batch. |
| `std::vector<float>` | `get_seq(llama_context *ctx, llama_seq_id seq, Normalize normalize = Normalize::L2)` | Get embeddings for a specific sequence. |
| `std::vector<float>` | `get_ith(llama_context *ctx, int32_t idx, Normalize normalize = Normalize::L2)` | Get embeddings for a specific token index in the last batch. |
| `float` | `cosine_similarity(const std::vector<float> &a, const std::vector<float> &b)` | Compute cosine similarity between two embedding vectors. |
    inline float lloyal::embedding::cosine_similarity(const std::vector<float> &a, const std::vector<float> &b)

Compute cosine similarity between two embedding vectors.

Parameters:

| `a` | First embedding vector (should be L2-normalized) |
| `b` | Second embedding vector (should be L2-normalized) |

Returns: cosine similarity in the range [-1, 1].

NOTE: For L2-normalized vectors, cosine similarity equals the dot product.

EXAMPLE:

    auto emb1 = embedding::get(ctx1, Normalize::L2);
    auto emb2 = embedding::get(ctx2, Normalize::L2);
    float sim = embedding::cosine_similarity(emb1, emb2);

Definition at line 451 of file embedding.hpp.
    inline void lloyal::embedding::encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch)

Encode tokens for embedding extraction.

Unlike decode::many(), this marks ALL tokens with logits=true, which is required for embedding extraction.

NOTE: Use this with a dedicated embedding context (embeddings=true, pooling enabled), and clear the KV cache between texts with kv::clear_all():

    // Create a dedicated embedding context
    ctx_params.embeddings = true;
    ctx_params.pooling_type = LLAMA_POOLING_TYPE_MEAN;
    auto embed_ctx = llama_init_from_model(model, ctx_params);

    // Embed each text
    kv::clear_all(embed_ctx);
    embedding::encode(embed_ctx, tokens, 512);
    auto emb = embedding::get(embed_ctx);

Parameters:

| `ctx` | Llama context (must have embeddings=true and pooling enabled) |
| `tokens` | Token array to encode |
| `n_tokens` | Number of tokens in the array |
| `n_batch` | Batch size |

Exceptions:

| `std::runtime_error` | if encode fails |

Definition at line 202 of file embedding.hpp.
    inline std::vector<float> lloyal::embedding::get(llama_context *ctx, Normalize normalize = Normalize::L2)

Get embeddings for the last decoded batch.

Parameters:

| `ctx` | Llama context (must have pooling enabled) |
| `normalize` | Normalization mode (default: L2, for cosine similarity) |

Returns: embedding vector (size = embedding dimension).

Exceptions:

| `std::runtime_error` | if extraction fails |

REQUIRES: a previous llama_decode() call with tokens.

EXAMPLE:

    auto tokens = tokenizer::tokenize(model, "Hello world");
    decode::many(ctx, tokens, 0, 512);
    auto embedding = embedding::get(ctx, Normalize::L2);

Definition at line 271 of file embedding.hpp.