lloyal.node API Reference - v1.0.7

    lloyal.node

    Covalent inference for Node.js

    Forkable inference state for llama.cpp. Branch a generation into a tree: prefix sharing is the bond between branches, while each branch owns its own machinery (sampler chain, seed, grammar, logits snapshot, perplexity tracker), which enables controlled divergence at decode time.

    Fork from root for best-of-N, fork from children for MCTS/beam search, fork from a draft for speculative decoding. The produce/commit protocol separates sampling from state advancement — sample without writing to KV, inspect the result, then decide whether to commit.

    import { createContext, Branch } from '@lloyal-labs/lloyal.node';

    const ctx = await createContext({ modelPath: './model.gguf', nSeqMax: 8 });
    const tokens = await ctx.tokenize('Once upon a time');
    await ctx.decode(tokens, 0, 0);

    // Create root branch, freeze logits from prefill
    const root = Branch.create(ctx, 0, tokens.length, { temperature: 0.8 });
    root.captureLogits();

    // Fork N candidates — KV prefix shared, sampler/grammar/logits/perplexity cloned
    const candidates = [1, 2, 3, 4, 5].map((seqId, i) => {
      const branch = root.fork(seqId);
      branch.reseedSampler(1000 + i);
      return branch;
    });

    // Generate (interleaved round-robin)
    for (let t = 0; t < 50; t++) {
      for (const branch of candidates) {
        const { token, isStop } = branch.produce(); // Sample, no KV write
        if (isStop) continue;
        branch.commit(token); // Accept + forward pass + capture
      }
    }

    // Select by perplexity, prune losers
    const best = candidates.reduce((a, b) => (a.perplexity < b.perplexity ? a : b));
    for (const c of candidates) {
      if (c !== best) c.prune();
    }
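
The loop above calls produce() again on a branch that has already reported isStop. A minimal sketch of a variant that retires finished branches follows. `generateRoundRobin` is a hypothetical helper, not part of lloyal.node, and it assumes only the `produce()` / `commit(token)` shape shown in the example.

```javascript
// Hypothetical helper, not part of lloyal.node: round-robin generation that
// stops sampling a branch once it reports isStop.
// Assumes each branch exposes produce() -> { token, isStop } and commit(token),
// as in the example above.
function generateRoundRobin(branches, maxTokens) {
  const active = new Set(branches);
  const outputs = new Map(branches.map((b) => [b, []]));

  for (let t = 0; t < maxTokens && active.size > 0; t++) {
    // Iterate over a copy so branches can be retired mid-round.
    for (const branch of [...active]) {
      const { token, isStop } = branch.produce(); // sample only, no KV write
      if (isStop) {
        active.delete(branch); // retired: never sampled again
        continue;
      }
      branch.commit(token); // advance this branch's state
      outputs.get(branch).push(token);
    }
  }
  return outputs; // Map: branch -> tokens committed by this helper
}
```

The returned map can be used alongside `perplexity` when picking a winner, and the outer loop exits early once every branch has stopped.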

    What fork() shares: KV cache prefix (metadata-only under unified KV — no tensor buffers copied).

    What fork() clones: Logits snapshot, sampler chain (penalties + PRNG), grammar state, logit bias, perplexity tracker.

    Key methods:

    • produce() / commit() — two-phase: sample without KV write, then advance
    • prune() — discard loser and its divergent KV entries
    • destroy() — release handle, keep KV (for winners continuing with raw ops)
    • reseedSampler() — unique PRNG per fork for stochastic diversity
    • perplexity — rolling PPL per branch for quality-based selection
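
The two-phase protocol listed above can be sketched as a small gate: sample, let the caller inspect the token, and advance only on acceptance. `stepIfAccepted` and the `accept` callback are hypothetical names used for illustration. The sketch assumes `produce()` returns `{ token, isStop }` and `commit(token)` advances the branch, as in the example earlier.

```javascript
// Hypothetical helper, not part of lloyal.node: sample once, let the caller
// inspect the token, and advance the branch only if it is accepted.
// Assumes produce() -> { token, isStop } and commit(token) as listed above.
function stepIfAccepted(branch, accept) {
  const { token, isStop } = branch.produce(); // no KV write yet
  if (isStop) return { status: 'stopped', token };
  if (!accept(token)) return { status: 'rejected', token }; // commit() is skipped
  branch.commit(token); // only now does the branch advance
  return { status: 'committed', token };
}
```

Because produce() does not write to KV, a rejected token leaves the KV cache untouched. What happens next (prune the branch, or try again) is left to the caller.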

    npm install @lloyal-labs/lloyal.node
    

    Prebuilt binaries are provided for 13 platform/GPU combinations. The GPU backend is selected at runtime, not at install time.

    Platform   Arch    Acceleration
    macOS      arm64   Metal
    macOS      x64     CPU
    Linux      x64     CPU / CUDA / Vulkan
    Linux      arm64   CPU / CUDA / Vulkan
    Windows    x64     CPU / CUDA / Vulkan
    Windows    arm64   CPU / Vulkan

    See distribution.md for details.


    Example        Pattern
    best-of-n/     Branch API: fork, produce/commit, perplexity selection
    speculative/   Branch API: draft/verify, fork/prune, bonus token sampling
    streaming/     Infinite context via BlinkKV reseeding with sidecar summarization
    entropy/       modelEntropy() mid-generation as control signal
    grammar/       Pull loop with generators, JSON schema constraints, KV + grammar branching
    chat/          Interactive streaming chat
    embed/         Text embeddings extraction

    node examples/best-of-n/best-of-n.mjs
    node examples/speculative/speculative.mjs

    Each example has a README explaining the pattern.


    Measuring model uncertainty mid-generation enables dynamic behavior:

    const entropy = ctx.modelEntropy('bits');

    if (entropy > 4.0) {
      // High uncertainty: the model is guessing
      // Trigger retrieval, reduce temperature, or branch
    }

    See examples/entropy/ for entropy-triggered sampling strategies.
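
To give the 4.0-bit threshold some intuition: a uniform distribution over 16 tokens has exactly 4 bits of Shannon entropy. The sketch below computes entropy in bits from raw logits via softmax. It is illustrative only and is not necessarily how `modelEntropy()` is implemented. `entropyBits` and `isHighUncertainty` are hypothetical names.

```javascript
// Shannon entropy (in bits) of the softmax distribution over raw logits.
// Illustrative only: not necessarily how modelEntropy() computes its value.
function entropyBits(logits) {
  // Subtract the max logit before exponentiating for numerical stability.
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;

  let sum = 0;
  const exps = [];
  for (const x of logits) {
    const e = Math.exp(x - max);
    exps.push(e);
    sum += e;
  }

  let h = 0;
  for (const e of exps) {
    const p = e / sum;
    if (p > 0) h -= p * Math.log2(p); // skip p == 0 to avoid NaN
  }
  return h;
}

// Hypothetical policy check around the 4.0-bit threshold used above.
function isHighUncertainty(bits, threshold = 4.0) {
  return bits > threshold;
}
```

A sharply peaked distribution scores near 0 bits, while a flat one over many tokens scores well above the threshold.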

    For fine-grained control without Branch:

    Approach           Method                          Use Case
    Sequence copy      kvSeqCopy(src, dst)             Share prefix across sequences
    Snapshot/restore   kvCacheSave() / kvCacheLoad()   Sequential exploration, return to checkpoint

    const grammar = ctx.jsonSchemaToGrammar(schema);
    const handle = ctx.createSampler(grammar);
    // Pull loop: consumer controls pace, can branch at any point

    See examples/grammar/ for the full pull loop pattern.
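
As a rough illustration of what "pull loop" means here, the sketch below wraps generation in a JavaScript generator so that nothing is sampled until the consumer asks for the next token. It is not the code in examples/grammar/. `pullTokens` is a hypothetical name, and the sketch assumes a branch-like object with `produce()` and `commit(token)` rather than the sampler handle shown above.

```javascript
// Hypothetical pull loop, not the library's implementation: a generator that
// produces one token per next() call, so the consumer controls the pace.
// Assumes a branch-like object with produce() -> { token, isStop } and
// commit(token); any grammar constraint is assumed to live inside the branch.
function* pullTokens(branch, maxTokens) {
  for (let i = 0; i < maxTokens; i++) {
    const { token, isStop } = branch.produce();
    if (isStop) return;
    branch.commit(token);
    yield token; // execution pauses here until the consumer pulls again
  }
}
```

Between two pulls the generator is suspended, so the consumer is free to inspect state, stop early, or fork the branch before asking for more tokens.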


    Full API documentation: lloyal-ai.github.io/lloyal.node

    Generated from lib/index.d.ts with TypeDoc.


    Package       Runtime        Description
    liblloyal     C++            Header-only inference kernel
    lloyal.node   Node.js        This package
    nitro-llama   React Native   Mobile bindings via Nitro Modules
    tsampler      TypeScript     Reference sampler implementation

    See CONTRIBUTING.md for development setup and release process.

    Apache 2.0 — See LICENSE for details.