Optional opts: PressureThresholds
Readonly hardLimit
Crash-prevention floor: agents are killed when remaining drops below it.
Readonly remaining
KV slots remaining (nCtx - cellsUsed).
Infinity when nCtx ≤ 0 (no context limit).
Readonly softLimit
Remaining KV floor: tokens reserved for downstream work.
Static Readonly DEFAULT_
Default hardLimit: 128 tokens (crash-prevention floor).
Static Readonly DEFAULT_
Default softLimit: 1024 tokens (reserved for downstream work).
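The properties above can be illustrated with a minimal sketch. It assumes that PressureThresholds carries optional hardLimit and softLimit fields, which this page does not confirm. The names resolveThresholds, remainingSlots, DEFAULT_HARD and DEFAULT_SOFT are hypothetical helpers for illustration, not part of the documented API.

```typescript
// Assumed shape: the real PressureThresholds type may differ.
interface PressureThresholds {
  hardLimit?: number;
  softLimit?: number;
}

// Documented defaults: 128 (hard) and 1024 (soft). Constant names are hypothetical.
const DEFAULT_HARD = 128;
const DEFAULT_SOFT = 1024;

// Hypothetical helper: fall back to the documented defaults when a threshold is omitted.
function resolveThresholds(opts?: PressureThresholds): Required<PressureThresholds> {
  return {
    hardLimit: opts?.hardLimit ?? DEFAULT_HARD,
    softLimit: opts?.softLimit ?? DEFAULT_SOFT,
  };
}

// Hypothetical helper: remaining = nCtx - cellsUsed, or Infinity when nCtx <= 0 (no context limit).
function remainingSlots(nCtx: number, cellsUsed: number): number {
  return nCtx <= 0 ? Infinity : nCtx - cellsUsed;
}
```

Returning Infinity for an unlimited context keeps later comparisons against the thresholds well-defined without a special case.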
remaining < hardLimit — agent must not call produceSync().
Tokens available for new work: remaining - softLimit.
Positive means room to accept tool results or continue generating.
Negative means over budget — SETTLE rejects, PRODUCE hard-cuts.
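The two derived values described above reduce to one subtraction and one comparison. The sketch below is illustrative only: headroomOf and isCritical are hypothetical function names, and the default arguments reuse the documented defaults of 1024 and 128 tokens.

```typescript
// Hypothetical helper: tokens available for new work (remaining - softLimit).
function headroomOf(remaining: number, softLimit: number = 1024): number {
  return remaining - softLimit;
}

// Hypothetical helper: true when remaining < hardLimit, i.e. produceSync() must not be called.
function isCritical(remaining: number, hardLimit: number = 128): boolean {
  return remaining < hardLimit;
}

// Three sample snapshots with the default thresholds.
const roomy = headroomOf(3000);      // positive: room to accept tool results
const overBudget = headroomOf(500);  // negative: over budget, but not yet critical
const nearCrash = isCritical(100);   // below the hard floor
```

With the defaults, a snapshot with 500 slots remaining is over budget (negative headroom) while still above the crash-prevention floor.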
Immutable KV budget snapshot for one tick of the agent loop
Created from SessionContext._storeKvPressure(), which returns { nCtx, cellsUsed, remaining } where remaining = nCtx - cellsUsed. cellsUsed tracks unique KV cells per branch: incremented on decode_each/decode_scatter, decremented on release by position - fork_head (unique cells above the fork point), and reset on bulk ops like retainOnly and drain.

Two thresholds partition remaining into three zones:
produceSync() to prevent llama_decode crashes.
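The cellsUsed bookkeeping described above can be sketched as a small counter. This is an assumption-laden illustration, not the actual implementation: KvCellTracker and its method names are hypothetical, the increment is assumed to equal the number of tokens decoded, and a bulk reset is assumed to supply a freshly recounted value.

```typescript
// Hypothetical tracker mirroring the described accounting; the real bookkeeping may differ.
class KvCellTracker {
  private cellsUsed = 0;

  // Assumed: each decoded token occupies one new KV cell.
  onDecode(nTokens: number): void {
    this.cellsUsed += nTokens;
  }

  // Releasing a branch frees only the cells above its fork point: position - forkHead.
  onRelease(position: number, forkHead: number): void {
    this.cellsUsed = Math.max(0, this.cellsUsed - (position - forkHead));
  }

  // Assumed: bulk operations replace the counter with a recounted value.
  reset(cells: number = 0): void {
    this.cellsUsed = cells;
  }

  // Same shape as the documented return value: { nCtx, cellsUsed, remaining }.
  snapshot(nCtx: number): { nCtx: number; cellsUsed: number; remaining: number } {
    return { nCtx, cellsUsed: this.cellsUsed, remaining: nCtx - this.cellsUsed };
  }
}

const tracker = new KvCellTracker();
tracker.onDecode(100);            // root branch decodes 100 tokens
tracker.onDecode(20);             // child forked at position 100 decodes 20 more
const beforeRelease = tracker.snapshot(4096);
tracker.onRelease(120, 100);      // child released: frees 120 - 100 = 20 unique cells
const afterRelease = tracker.snapshot(4096);
```

Subtracting position - forkHead rather than the full position means the prefix shared with the parent branch is not freed or double-counted when a child is released.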