liblloyal 1.0.0
Composable primitives for llama.cpp inference
Namespaces

| Kind | Name |
|---|---|
| namespace | detail |

Enumerations

| Kind | Declaration | Description |
|---|---|---|
| enum class | Normalize : int32_t { None = 0, L2 = 1 } | Normalization modes for embedding vectors. |

Functions

| Return type | Function | Description |
|---|---|---|
| bool | has_embeddings(const llama_model *model) | Check if model supports embeddings. |
| int32_t | dimension(const llama_model *model) | Get embedding dimension for model. |
| bool | has_pooling(llama_context *ctx) | Check if context has pooling enabled. |
| int32_t | pooling_type(llama_context *ctx) | Get pooling type for context. |
| void | encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch) | Encode tokens for embedding extraction. |
| void | encode(llama_context *ctx, const std::vector<llama_token> &tokens, int32_t n_batch) | Convenience overload for std::vector<llama_token>. |
| std::vector<float> | get(llama_context *ctx, Normalize normalize = Normalize::L2) | Get embeddings for last decoded batch. |
| std::vector<float> | get_seq(llama_context *ctx, llama_seq_id seq, Normalize normalize = Normalize::L2) | Get embeddings for specific sequence. |
| std::vector<float> | get_ith(llama_context *ctx, int32_t idx, Normalize normalize = Normalize::L2) | Get embeddings for specific token index in last batch. |
| float | cosine_similarity(const std::vector<float> &a, const std::vector<float> &b) | Compute cosine similarity between two embedding vectors. |
Normalize

enum class embedding::Normalize : int32_t

Normalization modes for embedding vectors.

| Enumerator | Description |
|---|---|
| None | No normalization; raw embedding values are returned. |
| L2 | L2 (unit-length) normalization, suitable for cosine similarity. |

Definition at line 47 of file embedding.hpp.
cosine_similarity()

inline float embedding::cosine_similarity(const std::vector<float> &a, const std::vector<float> &b)

Compute cosine similarity between two embedding vectors.

Parameters:

| Parameter | Description |
|---|---|
| a | First embedding vector (should be L2-normalized) |
| b | Second embedding vector (should be L2-normalized) |

NOTE: For L2-normalized vectors, cosine similarity equals the dot product.

EXAMPLE:

```cpp
auto emb1 = embedding::get(ctx1, Normalize::L2);
auto emb2 = embedding::get(ctx2, Normalize::L2);
float sim = embedding::cosine_similarity(emb1, emb2);
```

Definition at line 451 of file embedding.hpp.
dimension()

inline int32_t embedding::dimension(const llama_model *model)

Get embedding dimension for model.

Parameters:

| Parameter | Description |
|---|---|
| model | Llama model |

Definition at line 79 of file embedding.hpp.
encode()

inline void embedding::encode(llama_context *ctx, const llama_token *tokens, int32_t n_tokens, int32_t n_batch)

Encode tokens for embedding extraction.

Unlike decoder::decode_tokens(), this marks ALL tokens with logits=true, which is required for embedding extraction.

NOTE: Use this with a dedicated embedding context (embeddings=true, pooling enabled). Clear the KV cache between texts with kv::clear_all():

```cpp
// Create dedicated embedding context
ctx_params.embeddings = true;
ctx_params.pooling_type = LLAMA_POOLING_TYPE_MEAN;
auto embed_ctx = llama_init_from_model(model, ctx_params);

// Embed each text
kv::clear_all(embed_ctx);
embedding::encode(embed_ctx, tokens, 512);
auto emb = embedding::get(embed_ctx);
```

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context (must have embeddings=true and pooling enabled) |
| tokens | Token array to encode |
| n_tokens | Number of tokens in array |
| n_batch | Batch size |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if encode fails |

Definition at line 202 of file embedding.hpp.
encode()

inline void embedding::encode(llama_context *ctx, const std::vector<llama_token> &tokens, int32_t n_batch)

Convenience overload for std::vector<llama_token>.

Definition at line 249 of file embedding.hpp.
get()

inline std::vector<float> embedding::get(llama_context *ctx, Normalize normalize = Normalize::L2)

Get embeddings for last decoded batch.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context (must have pooling enabled) |
| normalize | Normalization mode (default: L2 for cosine similarity) |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

REQUIRES: Previous llama_decode() call with tokens

EXAMPLE:

```cpp
auto tokens = tokenizer::tokenize(model, "Hello world");
decoder::decode_tokens(ctx, tokens, 0, 512);
auto embedding = embedding::get(ctx, Normalize::L2);
```

Definition at line 271 of file embedding.hpp.
get_ith()

inline std::vector<float> embedding::get_ith(llama_context *ctx, int32_t idx, Normalize normalize = Normalize::L2)

Get embeddings for specific token index in last batch.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |
| idx | Token index in batch |
| normalize | Normalization mode |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

USE CASE: Per-token embeddings for token-level analysis, kNN-LM

NOTE: Per-token embeddings may work without pooling enabled.

Definition at line 400 of file embedding.hpp.
get_seq()

inline std::vector<float> embedding::get_seq(llama_context *ctx, llama_seq_id seq, Normalize normalize = Normalize::L2)

Get embeddings for specific sequence.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |
| seq | Sequence ID |
| normalize | Normalization mode |

Exceptions:

| Exception | Condition |
|---|---|
| std::runtime_error | if extraction fails |

USE CASE: Multi-sequence embedding extraction (batch embedding different texts)

NOTE: Falls back to get() for seq=0 if the sequence-specific API is unavailable.

Definition at line 341 of file embedding.hpp.
has_embeddings()

inline bool embedding::has_embeddings(const llama_model *model)

Check if model supports embeddings.

Parameters:

| Parameter | Description |
|---|---|
| model | Llama model |

NOTE: This checks dimension only. For proper embeddings, the context must also be created with pooling enabled (LLAMA_POOLING_TYPE_MEAN, etc.).

Definition at line 63 of file embedding.hpp.
has_pooling()

inline bool embedding::has_pooling(llama_context *ctx)

Check if context has pooling enabled.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |

NOTE: Context must be created with pooling type != LLAMA_POOLING_TYPE_NONE for embeddings to work correctly.

Definition at line 99 of file embedding.hpp.
pooling_type()

inline int32_t embedding::pooling_type(llama_context *ctx)

Get pooling type for context.

Parameters:

| Parameter | Description |
|---|---|
| ctx | Llama context |

Definition at line 120 of file embedding.hpp.