liblloyal 1.0.0
Composable primitives for llama.cpp inference
lloyal::embedding Namespace Reference

Namespaces

namespace  detail
 

Enumerations

enum class  Normalize : int32_t { None = 0 , L2 = 1 }
 Normalization modes for embedding vectors. More...
 

Functions

bool has_embeddings (const llama_model *model)
 Check if model supports embeddings.
 
int32_t dimension (const llama_model *model)
 Get embedding dimension for model.
 
bool has_pooling (llama_context *ctx)
 Check if context has pooling enabled.
 
int32_t pooling_type (llama_context *ctx)
 Get pooling type for context.
 
void encode (llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch)
 Encode tokens for embedding extraction.
 
void encode (llama_context *ctx, const std::vector< llama_token > &tokens, int32_t n_batch)
 Convenience overload for std::vector<llama_token>
 
std::vector< float > get (llama_context *ctx, Normalize normalize=Normalize::L2)
 Get embeddings for last decoded batch.
 
std::vector< float > get_seq (llama_context *ctx, llama_seq_id seq, Normalize normalize=Normalize::L2)
 Get embeddings for specific sequence.
 
std::vector< float > get_ith (llama_context *ctx, int32_t idx, Normalize normalize=Normalize::L2)
 Get embeddings for specific token index in last batch.
 
float cosine_similarity (const std::vector< float > &a, const std::vector< float > &b)
 Compute cosine similarity between two embedding vectors.
 

Enumeration Type Documentation

◆ Normalize

Normalization modes for embedding vectors.

Enumerator
None 
L2 

Definition at line 47 of file embedding.hpp.

Function Documentation

◆ cosine_similarity()

float lloyal::embedding::cosine_similarity ( const std::vector< float > &  a,
const std::vector< float > &  b 
)
inline

Compute cosine similarity between two embedding vectors.

Parameters
a    First embedding vector (should be L2-normalized)
b    Second embedding vector (should be L2-normalized)
Returns
Cosine similarity in range [-1, 1]

NOTE: For normalized vectors, cosine similarity = dot product

EXAMPLE:
  auto emb1 = embedding::get(ctx1, Normalize::L2);
  auto emb2 = embedding::get(ctx2, Normalize::L2);
  float sim = embedding::cosine_similarity(emb1, emb2);


Definition at line 451 of file embedding.hpp.

◆ dimension()

int32_t lloyal::embedding::dimension ( const llama_model *  model)
inline

Get embedding dimension for model.

Parameters
model    Llama model
Returns
Embedding dimension (e.g., 384, 768, 1024, 4096)

Definition at line 79 of file embedding.hpp.

◆ encode() [1/2]

void lloyal::embedding::encode ( llama_context *  ctx,
const llama_token *  tokens,
int32_t  n_tokens,
int32_t  n_batch 
)
inline

Encode tokens for embedding extraction.

Unlike decoder::decode_tokens(), this marks ALL tokens with logits=true, which is required for embedding extraction.

NOTE: Use this with a dedicated embedding context (embeddings=true, pooling enabled). Clear the KV cache between texts with kv::clear_all():

  // Create dedicated embedding context
  ctx_params.embeddings = true;
  ctx_params.pooling_type = LLAMA_POOLING_TYPE_MEAN;
  auto embed_ctx = llama_init_from_model(model, ctx_params);

  // Embed each text
  kv::clear_all(embed_ctx);
  embedding::encode(embed_ctx, tokens, 512);
  auto emb = embedding::get(embed_ctx);

Parameters
ctx    Llama context (must have embeddings=true and pooling enabled)
tokens    Token array to encode
n_tokens    Number of tokens in array
n_batch    Batch size
Exceptions
std::runtime_error    if encode fails

Definition at line 202 of file embedding.hpp.

◆ encode() [2/2]

void lloyal::embedding::encode ( llama_context *  ctx,
const std::vector< llama_token > &  tokens,
int32_t  n_batch 
)
inline

Convenience overload for std::vector<llama_token>

Definition at line 249 of file embedding.hpp.

◆ get()

std::vector< float > lloyal::embedding::get ( llama_context *  ctx,
Normalize  normalize = Normalize::L2 
)
inline

Get embeddings for last decoded batch.

Parameters
ctx    Llama context (must have pooling enabled)
normalize    Normalization mode (default: L2 for cosine similarity)
Returns
Embedding vector (size = embedding dimension)
Exceptions
std::runtime_error    if extraction fails

REQUIRES: Previous llama_decode() call with tokens

EXAMPLE:
  auto tokens = tokenizer::tokenize(model, "Hello world");
  decoder::decode_tokens(ctx, tokens, 0, 512);
  auto embedding = embedding::get(ctx, Normalize::L2);


Definition at line 271 of file embedding.hpp.

◆ get_ith()

std::vector< float > lloyal::embedding::get_ith ( llama_context *  ctx,
int32_t  idx,
Normalize  normalize = Normalize::L2 
)
inline

Get embeddings for specific token index in last batch.

Parameters
ctx    Llama context
idx    Token index in batch
normalize    Normalization mode
Returns
Embedding vector
Exceptions
std::runtime_error    if extraction fails

USE CASE: Per-token embeddings for token-level analysis, kNN-LM

NOTE: Per-token embeddings may work without pooling enabled


Definition at line 400 of file embedding.hpp.

◆ get_seq()

std::vector< float > lloyal::embedding::get_seq ( llama_context *  ctx,
llama_seq_id  seq,
Normalize  normalize = Normalize::L2 
)
inline

Get embeddings for specific sequence.

Parameters
ctx    Llama context
seq    Sequence ID
normalize    Normalization mode
Returns
Embedding vector
Exceptions
std::runtime_error    if extraction fails

USE CASE: Multi-sequence embedding extraction (batch embedding different texts)

NOTE: Falls back to get() for seq=0 if the sequence-specific API is unavailable


Definition at line 341 of file embedding.hpp.

◆ has_embeddings()

bool lloyal::embedding::has_embeddings ( const llama_model *  model)
inline

Check if model supports embeddings.

Parameters
model    Llama model
Returns
true if model has non-zero embedding dimension

NOTE: This checks dimension only. For proper embeddings, the context must also be created with pooling enabled (LLAMA_POOLING_TYPE_MEAN, etc.)


Definition at line 63 of file embedding.hpp.

◆ has_pooling()

bool lloyal::embedding::has_pooling ( llama_context *  ctx)
inline

Check if context has pooling enabled.

Parameters
ctx    Llama context
Returns
true if pooling is enabled (required for embeddings)

NOTE: Context must be created with pooling type != LLAMA_POOLING_TYPE_NONE for embeddings to work correctly.


Definition at line 99 of file embedding.hpp.

◆ pooling_type()

int32_t lloyal::embedding::pooling_type ( llama_context *  ctx)
inline

Get pooling type for context.

Parameters
ctx    Llama context
Returns
Pooling type enum value

Types:

  • LLAMA_POOLING_TYPE_NONE (0): No pooling
  • LLAMA_POOLING_TYPE_MEAN (1): Mean pooling (most common)
  • LLAMA_POOLING_TYPE_CLS (2): CLS token pooling
  • LLAMA_POOLING_TYPE_LAST (3): Last token pooling

Definition at line 120 of file embedding.hpp.