Child branch handles
Whether this branch has been disposed
Position at which this branch was forked from its parent (0 for root branches)
Internal handle (for debugging)
True if this branch holds a KV lease
True if this branch has no children
Parent branch handle, or null if root
Branch's perplexity (exp of mean surprisal)
Branch's current position (number of tokens decoded)
Sampling-level perplexity (from filtered distribution)
Returns perplexity from the distribution actually sampled from (after top-k/p/temp/penalties). Useful for policy priors and monitoring sampler chain impact.
Compare with perplexity, which is model-level (computed from the raw logits).
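A monitoring sketch, assuming `perplexity` and `samplePerplexity` are numeric getters as described above:

```typescript
// Model-level PPL (raw logits) vs sampling-level PPL (post top-k/p/temp/penalties).
// A large gap suggests the sampler chain is heavily reshaping the distribution.
const modelPpl = branch.perplexity;
const samplerPpl = branch.samplePerplexity;
console.log(`model=${modelPpl.toFixed(2)} sampler=${samplerPpl.toFixed(2)}`);
```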
Async iterator — generate tokens until EOG
Commit-before-yield semantics: every yielded token is already written to KV and accepted into the sampler. Breaking out of the loop is clean — no orphaned uncommitted tokens, perplexity reflects all yielded tokens.
For inspect-before-commit (speculative decoding, tree search), use the produce/commit protocol directly.
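A minimal consumption sketch, assuming the branch itself is async-iterable as described above (yielded tokens are already committed):

```typescript
// Stream tokens until EOG; breaking early is clean because every
// yielded token has already been written to KV and accepted.
const tokens: number[] = [];
for await (const token of branch) {
  tokens.push(token);
  if (tokens.length >= 256) break; // clean early exit — no orphaned state
}
console.log(`ppl over ${tokens.length} tokens: ${branch.perplexity}`);
```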
Record token in the sampler's repeat/presence penalty window
Token to accept
Clear all static logit biases from this branch
Clear all steer biases from this branch
Removes any dynamic logit adjustments set by steer(). Call this after
each generation step if your steer constraints are computed per-step
(e.g., N-gram blocking where the blocked set changes as text grows).
for (let i = 0; i < maxTokens; i++) {
// Compute constraints based on current state
const blocked = computeConstraints(generatedTokens);
branch.steer(blocked.map(t => ({ token: t, bias: -Infinity })));
const { token, isStop } = await branch.produce();
if (isStop) break;
await branch.commit(token);
branch.clearSteer(); // Reset for next iteration
generatedTokens.push(token);
}
Accept and decode — update branch state, then write token to KV
Accepts the token into the sampler penalty window (for correct PPL measurement), then decodes (writing to KV cache via AsyncWorker on the libuv thread pool) and captures the resulting logits for the next produce() call. Accept-first ordering with rollback: if decode throws, sampler/grammar/metrics are restored from clones.
Token to commit (from produce())
Fork this branch to a new sequence (sync)
The child shares the parent's KV prefix in memory (metadata-only under unified KV, no KV buffer copy). Logits, sampler state, and perplexity tracker are cloned so the child can diverge independently. Fork from any branch — root or intermediate — to build arbitrarily deep trees.
Call reseedSampler() on each child for stochastic diversity.
New forked Branch
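A fan-out sketch using `fork()` and `reseedSampler()` as described above; the root is assumed to have been prefilled already:

```typescript
// Fork N siblings off the root. The prompt KV prefix is shared in memory
// (metadata-only), so forking is cheap regardless of prompt length.
const children = [];
for (let i = 0; i < 4; i++) {
  const child = root.fork();
  child.reseedSampler(1000 + i); // distinct PRNG streams → diverse outputs
  children.push(child);
}
```

Without the reseed, every child would replay the same PRNG state and sample identical continuations.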
Get a copy of this branch's captured logits snapshot.
Returns n_vocab floats — the raw logit distribution from the last prefill() or commit() call.
Returns an independent copy of the branch's internal snapshot. The returned Float32Array is safe to hold across async boundaries and is not affected by subsequent decode operations.
Independent copy of the logits snapshot (n_vocab elements)
Compute entropy of the branch's logits distribution
Measures model uncertainty from the branch's captured logits snapshot:
Operates directly on state->logits_snapshot — no JS round-trip.
Logarithm base: "nats" (default) or "bits"
Entropy value in specified base
COST: O(n_vocab) - must sum over all token probabilities
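For reference, the computation is equivalent to this self-contained JS sketch (softmax over the snapshot, then Shannon entropy); the branch method does the same work natively, without the JS round-trip:

```typescript
function entropyOf(logits: Float32Array, base: "nats" | "bits" = "nats"): number {
  // Softmax with max-subtraction for numerical stability
  let max = -Infinity;
  for (const l of logits) if (l > max) max = l;
  let sum = 0;
  for (const l of logits) sum += Math.exp(l - max);
  // H = -Σ p * log(p)
  let h = 0;
  for (const l of logits) {
    const p = Math.exp(l - max) / sum;
    if (p > 0) h -= p * Math.log(p);
  }
  return base === "bits" ? h / Math.LN2 : h;
}

// Uniform distribution over 4 tokens → ln(4) nats = 2 bits
entropyOf(new Float32Array([0, 0, 0, 0]), "bits"); // → 2
```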
Compute surprisal (negative log-likelihood) for a specific token
Measures how "surprising" the model finds the given token from the branch's captured logits snapshot:
Operates directly on state->logits_snapshot — no JS round-trip.
Token ID to compute surprisal for
Logarithm base: "nats" (default) or "bits"
Surprisal value in specified base
COST: O(n_vocab) - softmax normalization required
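For reference, the computation is equivalent to this self-contained JS sketch (surprisal via log-sum-exp); the branch method does the same work natively:

```typescript
function surprisalOf(logits: Float32Array, token: number, base: "nats" | "bits" = "nats"): number {
  // -log p(token) = logsumexp(logits) - logits[token]
  let max = -Infinity;
  for (const l of logits) if (l > max) max = l;
  let sum = 0;
  for (const l of logits) sum += Math.exp(l - max);
  const s = (max + Math.log(sum)) - logits[token];
  return base === "bits" ? s / Math.LN2 : s;
}

// Uniform over 4 tokens: every token has p = 0.25 → surprisal = 2 bits
surprisalOf(new Float32Array([0, 0, 0, 0]), 0, "bits"); // → 2
```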
Bulk-decode tokens into the branch's KV cache and capture logits.
tokens.length is the total count to process; the branch's nBatch
(set at Branch.create) controls how many are sent per llama_decode
call. E.g. 500 tokens with nBatch=64 → 8 calls (7×64 + 1×52).
Advances position by tokens.length. Stores final logits into the
branch's internal snapshot — the next produce()/sample() reads
from it.
Does NOT accept tokens into the repeat-penalty window — for external
tokens (user input between turns), not model-generated tokens.
For model output, use commit() which does accept + decode.
The primary way to feed tokens into a branch's KV cache.
Token IDs to decode
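A typical flow, assuming `promptTokens` is an already-tokenized prompt:

```typescript
// Bulk-decode the prompt (or a user turn between generations) in
// nBatch-sized chunks, then sample from the captured final-position logits.
await branch.prefill(promptTokens);       // advances position by promptTokens.length
const { token } = await branch.produce(); // reads the snapshot prefill() stored
await branch.commit(token);
```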
Sample next token without advancing state (async)
Async contract: local branches resolve immediately; cloud branches may perform an HTTP round-trip. Use produceSync when you know the branch is local and want zero-overhead sampling.
Discard this branch (async)
Async contract: local branches resolve immediately; cloud branches may perform an HTTP round-trip. Use pruneSync when you know the branch is local.
RESTRICT mode: throws if children exist. Use pruneSubtree to cascade-delete an entire subtree.
Discard this branch and all its descendants (async)
Async contract: local branches resolve immediately; cloud branches may perform an HTTP round-trip. Use pruneSubtreeSync when you know the branch is local.
Discard this branch and all its descendants — CASCADE delete (sync)
Iterative post-order traversal: prunes children first, then this branch. Use when tearing down an entire subtree (e.g. abandoned search path). Sets disposed synchronously.
Discard this branch — remove its divergent KV entries and free the handle (sync)
Only removes KV entries divergent from the shared prefix; sibling branches are unaffected. The disposed flag is set synchronously — any call to produce(), commit(), etc. after prune() will throw immediately.
RESTRICT mode: throws if children exist. Use pruneSubtreeSync to cascade-delete an entire subtree.
Reseed the sampler's PRNG for diversity after fork()
CRITICAL for parallel generation: Without reseeding, all forked branches produce identical outputs because they share the same PRNG state.
Only affects stochastic samplers (temperature > 0). Greedy samplers are unchanged.
New seed for the PRNG
Sample next token from branch's logits snapshot
Applies the branch's full sampler chain (top-k, top-p, temperature, repeat/presence penalties) to the captured logits.
Sampled token ID
Replace or remove the grammar constraint
Pass a GBNF grammar string to constrain generation. Pass empty string or undefined to remove the constraint. The grammar state is cloned on fork(), so sibling branches can diverge independently after hot-swap.
grammarStr (optional): string — GBNF grammar string, or empty/undefined to remove
Set lazy grammar — unconstrained until trigger, then grammar-constrained
Generation runs freely until a trigger pattern or token fires, at which
point the grammar activates and constrains subsequent tokens. Used for
tool-call generation: model writes freely until <tool_call>, then
grammar forces valid XML structure.
The grammar state is cloned on fork(), so sibling branches can diverge independently. Call again after a tool result prefill to reset.
GBNF grammar string
Trigger conditions from formatChat().grammarTriggers
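A tool-call sketch; the method name `setLazyGrammar` and the shape of the `formatChat()` result are assumptions based on the parameter descriptions above:

```typescript
// Generation runs free until a trigger fires (e.g. "<tool_call>"),
// then the grammar constrains output to valid tool-call structure.
const { promptTokens, grammar, grammarTriggers } = formatChat(messages, { tools });
branch.setLazyGrammar(grammar, grammarTriggers); // assumed method name
await branch.prefill(promptTokens);
```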
Set static logit biases on this branch
Unlike steer (which is NOT inherited on fork), logit biases ARE cloned when forking. Use for persistent constraints that should propagate to child branches.
Applied during sample() in order: Grammar → Logit Bias → Steer → Sampler Chain
Array of token adjustments. Use -Infinity to block,
positive to boost, negative to reduce.
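A sketch of a persistent, fork-inherited constraint; the method name `setLogitBias` and the token variables are illustrative assumptions:

```typescript
// Persistent constraints: cloned into children on fork(), unlike steer().
branch.setLogitBias([
  { token: bannedToken, bias: -Infinity }, // never sample this token
  { token: favoredToken, bias: 2.0 },      // boost its probability
]);
const child = branch.fork(); // child inherits the same biases
```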
Replace the sampler chain with new parameters (memoized)
If the new params match the current chain's params, this is a no-op. Otherwise the old chain is freed and a new one is created. Use for Entropy-Driven Temperature (EDT) and other adaptive sampling strategies that adjust parameters per-step.
New sampling parameters
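An Entropy-Driven Temperature sketch; the method names `entropy` and `setSamplerParams`, the entropy accessor signature, and the schedule constants are illustrative assumptions:

```typescript
// EDT: raise temperature when the model is uncertain, lower it when confident.
// Memoization makes the per-step call cheap when entropy (and thus the
// derived temperature) hasn't changed.
for (let i = 0; i < maxTokens; i++) {
  const h = branch.entropy("nats");                // assumed entropy accessor
  const temperature = 0.4 + 0.2 * h;               // simple illustrative schedule
  branch.setSamplerParams({ ...baseParams, temperature }); // no-op if unchanged
  const { token, isStop } = await branch.produce();
  if (isStop) break;
  await branch.commit(token);
}
```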
Apply dynamic logit adjustments for this branch only
Unlike logit_bias in sampling params (which is cloned on fork), steer biases
are NOT inherited by child branches. Each branch manages its own steer state
independently. This makes steer ideal for path-dependent constraints.
Use cases:
Sampling order: Grammar → Logit Bias → Steer → Sampler Chain
Array of token adjustments. Use -Infinity to completely
block a token, positive values to boost probability, negative to reduce.
// Compute which tokens would create repeated 4-grams
const blocked = computeNgramBlocks(generatedTokens, 4); // n = 4
// Block those tokens for this sample only
branch.steer(blocked.map(t => ({ token: t, bias: -Infinity })));
const { token } = await branch.produce(); // Blocked tokens won't be sampled
await branch.commit(token);
// Clear for next iteration (recompute based on new history)
branch.clearSteer();
// Each beam penalizes tokens chosen by siblings this step
for (const beam of beams) {
// Collect tokens chosen by other beams
const siblingTokens = beams
.filter(b => b !== beam && b.lastToken !== undefined)
.map(b => b.lastToken);
// Penalize sibling choices to encourage diversity
beam.branch.steer(siblingTokens.map(t => ({ token: t, bias: -2.0 })));
const { token } = await beam.branch.produce();
await beam.branch.commit(token);
beam.lastToken = token;
beam.branch.clearSteer();
}
Static create: Create a root branch at the given position
The branch takes ownership of the sequence and creates its own sampler chain from the provided params. Call prefill() to decode prompt tokens and capture the logit distribution before forking.
SessionContext to create branch on
Starting position (typically prompt token count)
params (optional): SamplingParams — Sampling parameters (temperature, topP, etc.)
nBatch (optional): number — Per-branch batch size override (defaults to context nBatch). Controls chunk size for prefill(). Has no effect on single-token commit(), which uses a zero-allocation fast path.
grammar (optional): string — GBNF grammar string for constrained generation. When provided, sample() returns only grammar-valid tokens. The grammar state is cloned on fork(), so sibling branches can diverge independently.
New Branch instance
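A creation sketch, assuming `ctx` is a SessionContext and the positional argument order shown matches the parameter list above:

```typescript
// Root branch at position 0; prefill() decodes the prompt and captures
// the logit distribution before any forking happens.
const root = Branch.create(ctx, 0, { temperature: 0.8, topP: 0.95 });
await root.prefill(promptTokens);
// ...fork children from root here
```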
Forkable inference handle for covalent generation
A Branch owns everything needed for independent generation: a KV cache sequence, sampler chain, logits snapshot, and perplexity tracker.
Forking is cheap — the KV prefix is shared in memory (metadata-only operation under unified KV — no KV tensor buffers are copied), so sibling branches read from the same physical KV entries. Only tokens decoded after the fork point are exclusive to each branch.
Branches form trees, not just flat lists. Fork from root for best-of-N, fork from children for tree search/beam search, fork from a draft for speculative decoding.
The produce/commit protocol separates sampling from state advancement: produce() samples without writing to KV, letting you inspect the result before deciding to commit().
Example: Best-of-N with perplexity selection
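A best-of-N sketch under the API described above (`fork`, `reseedSampler`, `produce`/`commit`, `perplexity`, `prune`); the budget and seed values are illustrative:

```typescript
// Fork N candidates off a prefilled root, generate a fixed token budget
// on each, then keep the branch with the lowest perplexity.
const candidates = Array.from({ length: 4 }, (_, i) => {
  const b = root.fork();
  b.reseedSampler(1234 + i); // distinct PRNG streams → diverse candidates
  return b;
});
for (const b of candidates) {
  for (let i = 0; i < 64; i++) {
    const { token, isStop } = await b.produce();
    if (isStop) break;
    await b.commit(token);
  }
}
candidates.sort((a, b) => a.perplexity - b.perplexity);
const best = candidates[0];
for (const b of candidates.slice(1)) await b.prune(); // free losing branches
```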