Type Alias KvCacheType

KvCacheType:
    | "f32"
    | "f16"
    | "bf16"
    | "q8_0"
    | "q4_0"
    | "q4_1"
    | "iq4_nl"
    | "q5_0"
    | "q5_1"

Supported KV cache data types: full precision (f32), half precision (f16, bf16), and block-quantized formats (the q* values and iq4_nl).

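Values match those accepted by the llama.cpp CLI flags -ctk (--cache-type-k) and -ctv (--cache-type-v), which set the data type of the key and value caches. Lower-precision types use less GPU memory for the cache, at the cost of a slight loss in output quality.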
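
To give a rough feel for the memory tradeoff, the sketch below estimates KV cache size for a given cache type. It is not part of this library's API: `BYTES_PER_ELEMENT`, `isKvCacheType`, `KvCacheShape`, and `estimateKvCacheBytes` are hypothetical names introduced for illustration. The per-element costs assume ggml's 32-element block layouts as commonly documented (for example, q8_0 stored as 34 bytes per block), and the estimate ignores padding and any other runtime overhead, so treat the numbers as approximate.

```typescript
// Restated from the alias above so the sketch is self-contained.
type KvCacheType =
    | "f32"
    | "f16"
    | "bf16"
    | "q8_0"
    | "q4_0"
    | "q4_1"
    | "iq4_nl"
    | "q5_0"
    | "q5_1";

// ASSUMPTION: approximate bytes per cached element. Quantized entries are
// (bytes per block) / (32 elements per block), based on ggml block layouts.
const BYTES_PER_ELEMENT: Record<KvCacheType, number> = {
    f32: 4,
    f16: 2,
    bf16: 2,
    q8_0: 34 / 32,
    q5_1: 24 / 32,
    q5_0: 22 / 32,
    q4_1: 20 / 32,
    q4_0: 18 / 32,
    iq4_nl: 18 / 32,
};

// Hypothetical type guard for validating untrusted input (e.g. a config value).
function isKvCacheType(value: string): value is KvCacheType {
    return Object.prototype.hasOwnProperty.call(BYTES_PER_ELEMENT, value);
}

// Hypothetical description of the model dimensions that determine cache size.
interface KvCacheShape {
    nLayers: number;  // number of transformer layers
    nCtx: number;     // context length in tokens
    nKvHeads: number; // number of key/value heads
    headDim: number;  // dimension of each head
}

// Estimates total bytes for the K and V caches, which may use different types
// (mirroring the separate -ctk and -ctv flags).
function estimateKvCacheBytes(
    shape: KvCacheShape,
    typeK: KvCacheType,
    typeV: KvCacheType,
): number {
    const elementsPerCache =
        shape.nLayers * shape.nCtx * shape.nKvHeads * shape.headDim;
    return elementsPerCache * (BYTES_PER_ELEMENT[typeK] + BYTES_PER_ELEMENT[typeV]);
}

// Example dimensions chosen for illustration only.
const shape: KvCacheShape = { nLayers: 32, nCtx: 4096, nKvHeads: 8, headDim: 128 };
const MIB = 1024 * 1024;

console.log(estimateKvCacheBytes(shape, "f16", "f16") / MIB);   // 512
console.log(estimateKvCacheBytes(shape, "q8_0", "q8_0") / MIB); // 272
console.log(estimateKvCacheBytes(shape, "q4_0", "q4_0") / MIB); // 144
```

Under these assumptions, moving both caches from f16 to q8_0 roughly halves the cache footprint, and q4_0 brings it to a little over a quarter.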