Supported KV cache quantization types
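Matches the llama.cpp CLI -ctk / -ctv flags (--cache-type-k / --cache-type-v), which set the storage type of the K and V caches. Lower precision uses less GPU memory, at the cost of a slight loss in output quality.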
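The sketch below shows how a set of cache types might map onto the -ctk / -ctv arguments and why lower precision saves memory. It makes several assumptions. KVCacheType, BYTES_PER_ELEMENT, cli_args and kv_cache_bytes are hypothetical names, not part of llama.cpp. The type names follow values llama.cpp accepts for these flags, but the accepted set can differ between versions. The per-element sizes come from ggml's 32-element quantization blocks (for example, q8_0 stores 32 int8 values plus one fp16 scale, which is 34 bytes). Treat the size estimate as a rough lower bound, not llama.cpp's exact allocation.

```python
from enum import Enum


class KVCacheType(str, Enum):
    """Hypothetical enum; values mirror type names passed to llama.cpp -ctk / -ctv."""
    F32 = "f32"
    F16 = "f16"
    BF16 = "bf16"
    Q8_0 = "q8_0"
    Q5_1 = "q5_1"
    Q5_0 = "q5_0"
    Q4_1 = "q4_1"
    IQ4_NL = "iq4_nl"
    Q4_0 = "q4_0"


# Approximate bytes per cached element, derived from ggml block layouts
# (assumption: 32 elements per quantized block, scales stored as fp16).
BYTES_PER_ELEMENT = {
    KVCacheType.F32: 4.0,
    KVCacheType.F16: 2.0,
    KVCacheType.BF16: 2.0,
    KVCacheType.Q8_0: 34 / 32,    # 32 x int8 + fp16 scale
    KVCacheType.Q5_1: 24 / 32,    # fp16 scale + fp16 min + 4 B high bits + 16 B nibbles
    KVCacheType.Q5_0: 22 / 32,    # fp16 scale + 4 B high bits + 16 B nibbles
    KVCacheType.Q4_1: 20 / 32,    # fp16 scale + fp16 min + 16 B nibbles
    KVCacheType.IQ4_NL: 18 / 32,  # fp16 scale + 16 B nibbles
    KVCacheType.Q4_0: 18 / 32,    # fp16 scale + 16 B nibbles
}


def cli_args(k_type: KVCacheType, v_type: KVCacheType) -> list[str]:
    """Build the llama.cpp arguments selecting the K and V cache types."""
    return ["-ctk", k_type.value, "-ctv", v_type.value]


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int, head_dim: int,
                   k_type: KVCacheType, v_type: KVCacheType) -> int:
    """Rough KV cache size: one K and one V vector per layer, token and KV head."""
    elements = n_layers * n_ctx * n_kv_heads * head_dim
    per_element = BYTES_PER_ELEMENT[k_type] + BYTES_PER_ELEMENT[v_type]
    return int(elements * per_element)


if __name__ == "__main__":
    # Illustrative model shape (assumed): 32 layers, 8 KV heads of dim 128, 4096-token context.
    dims = (32, 4096, 8, 128)
    for t in (KVCacheType.F16, KVCacheType.Q8_0, KVCacheType.Q4_0):
        mib = kv_cache_bytes(*dims, t, t) / 2**20
        print(" ".join(cli_args(t, t)), f"-> ~{mib:.0f} MiB")
```

With these illustrative dimensions, the estimate drops from 512 MiB at f16 to 272 MiB at q8_0 and 144 MiB at q4_0. K and V are set independently, so the two flags can take different types.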