Path to .gguf model file
Optional embeddings: Enable embedding extraction mode
When true, the context is optimized for embedding extraction. Use with the encode() and getEmbeddings() methods. Default: false (text generation mode)
Optional n…: Context size (default: 2048)
Optional n…: Maximum number of sequences for multi-sequence support
Set to a value greater than 1 to enable multiple independent KV cache sequences. Useful for parallel decoding or conversation branching. Default: 1 (single sequence)
Optional n…: Number of threads (default: 4)
Optional pooling: Pooling type for embedding extraction
Only relevant when embeddings=true. Default: MEAN for embedding contexts, NONE otherwise
Options for creating an inference context
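The documented defaults can be illustrated with a small sketch. This is not the library's own type or code: the interface name, the field names (`modelPath`, `contextSize`, `maxSequences`, `threads`, `pooling`), and the `resolveDefaults` helper are all hypothetical, chosen only because the property names are not fully visible in this listing. Only the two pooling values named above (MEAN and NONE) are modeled.

```typescript
// Hypothetical shape: field names are illustrative assumptions,
// not the library's actual property names.
type PoolingSketch = "NONE" | "MEAN";

interface ContextOptionsSketch {
  modelPath: string;       // path to the .gguf model file
  embeddings?: boolean;    // embedding extraction mode
  contextSize?: number;    // context size
  maxSequences?: number;   // number of independent KV cache sequences
  threads?: number;        // number of threads
  pooling?: PoolingSketch; // pooling type, only relevant for embeddings
}

// Hypothetical helper that fills in the defaults stated in the reference.
function resolveDefaults(
  opts: ContextOptionsSketch
): Required<ContextOptionsSketch> {
  const embeddings = opts.embeddings ?? false;
  return {
    modelPath: opts.modelPath,
    embeddings,
    contextSize: opts.contextSize ?? 2048,
    maxSequences: opts.maxSequences ?? 1,
    threads: opts.threads ?? 4,
    // Default pooling depends on the mode: MEAN for embedding
    // contexts, NONE for text generation.
    pooling: opts.pooling ?? (embeddings ? "MEAN" : "NONE"),
  };
}

const textGen = resolveDefaults({ modelPath: "model.gguf" });
const embed = resolveDefaults({ modelPath: "model.gguf", embeddings: true });
console.log(textGen, embed);
```

The point of the sketch is that the pooling default is the only one that depends on another option; every other default is a fixed value.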