UAIX.LmRuntime.Sampling

Greedy and probability sampling, logit transforms, deterministic random state, top-k/top-p filtering, stop matching, and generation control.

Overview

Samplers and logits processors for pure C# local LLM runtime generation.

Who should use it: Generation loops, model runtimes, and tests that need explicit token selection and termination behavior.
Execution status: Managed greedy and probability sampling, logit transforms, deterministic state, and stop matching are represented in the supplied source.

Install

.NET CLI
dotnet add package UAIX.LmRuntime.Sampling
Project file
<PackageReference Include="UAIX.LmRuntime.Sampling" />

Version policy: The documentation deliberately omits UAIX.LmRuntime package version numbers. Resolve and pin versions through your normal dependency-management and lock-file process.

Direct package dependencies
UAIX.LmRuntime.Abstractions

Package role and boundaries

Required for token selection and stop handling. Use this package when:

  • You need deterministic greedy argmax selection.
  • You need seeded temperature, top-k, top-p, minimum-p, repetition, frequency, presence, or logit-bias processing.
  • You need byte-safe stop-sequence matching or a bounded generation controller.

Boundary: do not rely on this package for the following.

  • Computing logits or decoding model weights.
  • Treating a seed as sufficient reproducibility when model execution, tokenizer behavior, or hardware math differs.

Selection is separate from logits

The package receives a vocabulary-sized logit span. Model execution remains outside the sampling layer.

State is explicit

SamplingState owns deterministic random state and token-frequency history, so callers control whether state persists across generation steps.

Stops are byte-aware

StopSequenceMatcher retains partial UTF-8 byte prefixes across appended chunks and distinguishes visible output from matched stop bytes.

Key types

These are the main public entry points. The generated reference below includes the documented public package surface.

Coding examples

Examples use the documented public package surface. Paths, identities, runtime identifiers, device evidence, and application policy remain host inputs.

Select the greedy token

Choose the highest logit without allocating a probability distribution.

GreedySamplingExample.cs
using System;
using UAIX.LmRuntime.Sampling;

ReadOnlySpan<float> logits = [-1.2f, 0.5f, 2.75f, 1.1f];
int tokenId = GreedySampler.Select(logits);

Console.WriteLine($"Selected token: {tokenId}");

Run reproducible probability sampling

Create SamplingState once per generation sequence so random and repetition state advance together.

ProbabilitySamplingExample.cs
using System;
using UAIX.LmRuntime.Sampling;

public static class ReproducibleSamplingExample
{
    /// <summary>
    /// Selects one token from logits using an explicitly seeded sampling state.
    /// </summary>
    /// <param name="logits">The vocabulary logits for the current decoding step.</param>
    /// <returns>The selected token and candidate evidence.</returns>
    public static SamplingDecision Select(ReadOnlySpan<float> logits)
    {
        if (logits.IsEmpty)
        {
            throw new ArgumentException("At least one logit is required.", nameof(logits));
        }

        var options = new SamplingOptions
        {
            Temperature = 0.8f,
            TopK = Math.Min(40, logits.Length),
            TopP = 0.95f,
            MinimumP = 0.05f,
            RepetitionPenalty = 1.1f,
            FrequencyPenalty = 0,
            PresencePenalty = 0,
            MinimumGeneratedTokens = 0,
            MaximumGeneratedTokens = 128,
            MaximumContextTokens = 8_192,
            Seed = 42
        };

        options.Validate(vocabularySize: logits.Length);

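        // Created inline for brevity. In a generation loop, create the state once per
        // sequence and reuse it across steps so random and history state advance together.
        var state = new SamplingState(options);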
        SamplingDecision decision =
            ProbabilitySampler.Select(logits, options, state);

        state.RecordToken(decision.TokenId);
        return decision;
    }
}

Boundary: Reproducibility also requires the same logits, tokenizer, settings, token history, and runtime math path.

Apply logit bias and history penalties

Inspect the transformed logits independently from token selection.

LogitProcessingExample.cs
using System;
using System.Collections.Generic;
using UAIX.LmRuntime.Sampling;

public static class LogitProcessingExample
{
    /// <summary>
    /// Applies caller-selected bias and history penalties to one logit vector.
    /// </summary>
    /// <param name="logits">The unmodified vocabulary logits.</param>
    /// <returns>A processed copy suitable for token selection.</returns>
    public static float[] Process(ReadOnlySpan<float> logits)
    {
        if (logits.Length < 2)
        {
            throw new ArgumentException(
                "At least two logits are required for this example.",
                nameof(logits));
        }

        var options = new SamplingOptions
        {
            Temperature = 1.0f,
            RepetitionPenalty = 1.15f,
            FrequencyPenalty = 0.2f,
            PresencePenalty = 0.1f,
            LogitBias = new Dictionary<int, float>
            {
                [0] = -2.0f,
                [1] = 1.5f
            },
            Seed = 7
        };

        options.Validate(vocabularySize: logits.Length);

        var state = new SamplingState(options);
        // Simulate history: token 0 has already been generated twice.
        state.RecordToken(0);
        state.RecordToken(0);

        return LogitProcessor.Process(logits, options, state);
    }
}

Match a stop sequence across byte chunks

Detect a stop string even when its UTF-8 bytes span multiple generated tokens.

StopSequenceExample.cs
using System;
using System.Text;
using UAIX.LmRuntime.Sampling;

var matcher = new StopSequenceMatcher(
    stopSequences: ["</answer>"],
    includeMatchedBytes: false);

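// The first chunk ends inside a possible stop prefix, so the matcher holds those bytes back.
StopSequenceMatchResult first = matcher.Append(
    Encoding.UTF8.GetBytes("Result</ans"));

// The second chunk completes the stop sequence that began in the first chunk.
StopSequenceMatchResult second = matcher.Append(
    Encoding.UTF8.GetBytes("wer>ignored"));

// Release any bytes still retained as a non-matching prefix.
byte[] remaining = matcher.Complete();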

Console.Write(Encoding.UTF8.GetString(first.VisibleBytes));
Console.Write(Encoding.UTF8.GetString(second.VisibleBytes));
Console.Write(Encoding.UTF8.GetString(remaining));
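
To make the retained-prefix idea concrete, here is a minimal, self-contained sketch of byte-level stop matching. It is not the package's StopSequenceMatcher: the StopPrefixSketch type and its members are hypothetical, and it assumes that bytes arriving after a completed match are discarded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not the UAIX.LmRuntime.Sampling implementation.
public sealed class StopPrefixSketch
{
    private readonly byte[][] _stops;
    private readonly List<byte> _pending = new();

    public bool Matched { get; private set; }

    public StopPrefixSketch(params string[] stops)
    {
        _stops = Array.ConvertAll(stops, s => Encoding.UTF8.GetBytes(s));
        foreach (byte[] stop in _stops)
        {
            if (stop.Length == 0)
                throw new ArgumentException("Stop sequences must be non-empty.", nameof(stops));
        }
    }

    // Appends a chunk and returns the bytes that can no longer belong to a stop sequence.
    public byte[] Append(ReadOnlySpan<byte> chunk)
    {
        if (Matched) return Array.Empty<byte>();

        _pending.AddRange(chunk.ToArray());
        byte[] data = _pending.ToArray();
        ReadOnlySpan<byte> span = data;

        // The earliest complete stop sequence ends the stream.
        int matchAt = -1;
        foreach (byte[] stop in _stops)
        {
            int at = span.IndexOf((ReadOnlySpan<byte>)stop);
            if (at >= 0 && (matchAt < 0 || at < matchAt)) matchAt = at;
        }
        if (matchAt >= 0)
        {
            Matched = true;
            _pending.Clear();
            return data[..matchAt];
        }

        // Hold back the longest suffix that is a proper prefix of some stop sequence.
        int keep = 0;
        foreach (byte[] stop in _stops)
        {
            ReadOnlySpan<byte> stopSpan = stop;
            int max = Math.Min(stop.Length - 1, data.Length);
            for (int len = max; len > keep; len--)
            {
                if (span.Slice(data.Length - len).SequenceEqual(stopSpan.Slice(0, len)))
                {
                    keep = len;
                    break;
                }
            }
        }

        _pending.RemoveRange(0, data.Length - keep);
        return data[..(data.Length - keep)];
    }

    // Releases whatever is still held back once the stream ends without a match.
    public byte[] Complete()
    {
        byte[] rest = _pending.ToArray();
        _pending.Clear();
        return rest;
    }
}
```

In this sketch, appending "Result</ans" releases "Result" and holds back "</ans"; appending "wer>ignored" then completes the match and releases nothing further.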

Generated API reference

Each type below lists its documented public fields, properties, constructors, methods, parameter descriptions, and return descriptions.

GenerationFinishReason (UAIX.LmRuntime.Sampling, 7 members)

Identifies the first decisive condition that ended a generation.

Field None

Generation remains active.

Field StopToken

An exact configured token identifier ended generation.

Field StopText

A configured decoded UTF-8 stop sequence ended generation.

Field TokenLimit

The configured maximum generated-token count was reached.

Field ContextLimit

The configured prompt-plus-generation context bound was reached.

Field Cancelled

Cancellation was observed before publishing another token.

Field ExecutionError

An execution error ended generation.

GenerationUsage (UAIX.LmRuntime.Sampling, 3 members)

Records tokenizer-ID-based usage accounting.

Property PromptTokens

Gets the exact number of prompt token identifiers consumed.

Property CompletionTokens

Gets the exact number of generated token identifiers accepted by the controller.

Property TotalTokens

Gets the checked sum of prompt and completion token counts.
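
A "checked sum" in C# means the addition throws instead of silently wrapping on overflow. The following is a minimal sketch of that behavior; the UsageSumSketch type is hypothetical, and the use of int for the counts is an assumption, since the reference does not state the property types.

```csharp
using System;

// Hypothetical illustration only; property types in the package may differ.
public static class UsageSumSketch
{
    // Adds the two counts and throws OverflowException instead of wrapping around.
    public static int Total(int promptTokens, int completionTokens)
    {
        if (promptTokens < 0) throw new ArgumentOutOfRangeException(nameof(promptTokens));
        if (completionTokens < 0) throw new ArgumentOutOfRangeException(nameof(completionTokens));

        return checked(promptTokens + completionTokens);
    }
}
```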

GenerationStepResult (UAIX.LmRuntime.Sampling, 3 members)

Represents the observable result of attempting to publish one generated token.

Property TokenAccepted

Gets whether the token identifier was retained in generated-token output.

Property VisibleBytes

Gets bytes newly safe to publish after stop-prefix matching.

Property FinishReason

Gets the stable finish reason after this step.

GenerationController (UAIX.LmRuntime.Sampling, 9 members)

Enforces stop, limit, cancellation, usage, and output-publication boundaries for one generation.

Method GenerationController(int,int,UAIX.LmRuntime.Sampling.SamplingOptions)

Initializes a controller from validated vocabulary and prompt-token bounds.

vocabularySize
The positive tokenizer vocabulary size used to validate token identifiers and size bounded result buffers.
promptTokenCount
The prompt token count used to bound this operation; it must be nonnegative and within the supported range.
options
The optional SamplingOptions controlling GenerationController; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Property State

Gets the session-local sampling state used by the same generation.

Property OutputTokenIds

Gets generated token identifiers retained under stop-token emission policy.

Property FinishReason

Gets the first decisive finish reason, or GenerationFinishReason.None while active.

Property Usage

Gets exact tokenizer-ID usage without deriving counts from text or bytes.

Method AcceptToken(int,System.ReadOnlySpan<byte>,System.Threading.CancellationToken)

Attempts to accept and publish one generated token at a bounded cancellation point.

tokenId
The token identifier to process; it must fall within the validated vocabulary and operation-specific range.
decodedBytes
The decoded byte sequence for this token; its required length, ordering, and element bounds are validated before access.
cancellationToken
Cancellation observed before the token is recorded or published.

Returns: The token/output/finish transition produced by this step.

Method ObserveCancellation(System.Threading.CancellationToken)

Observes cancellation between decode steps without publishing another token.

cancellationToken
The caller-provided token used to cancel the operation before additional work or results are published.

Returns: The resulting GenerationFinishReason. It is published only after all documented validation and ownership transitions succeed.

Method Fail(System.Exception)

Records an execution failure without exposing exception details through the stable finish reason.

exception
The non-null execution failure observed by the caller.

Returns: The resulting GenerationFinishReason. It is published only after all documented validation and ownership transitions succeed.

Method CompleteVisibleBytes

Completes an otherwise active stream and flushes any bytes retained as a possible stop prefix.

Returns: Remaining visible bytes; the finish reason remains GenerationFinishReason.None.

GreedySampler (UAIX.LmRuntime.Sampling, 1 member)

Provides deterministic greedy token selection with explicit non-finite input policy.

Method Select(System.ReadOnlySpan<float>)

Selects the highest logit index with deterministic lower-index tie-breaking.

logits
The non-empty source logits. NaN is rejected; infinities compare normally.

Returns: The index of the highest logit, with ties resolved to the lower index. Range, finite-value, and overflow checks are completed before the value is returned.

LogitProcessor (UAIX.LmRuntime.Sampling, 4 members)

Applies validated, deterministic token-history and bias policies to logits.

Method Process(System.ReadOnlySpan<float>,UAIX.LmRuntime.Sampling.SamplingOptions,UAIX.LmRuntime.Sampling.SamplingState)

Produces a processed copy of the source logits without exposing partially mutated caller data on validation failure.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
options
The optional SamplingOptions controlling Process; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
state
The validated state value consumed by the operation; mutations, when applicable, are limited to the explicitly documented state owner.

Returns: A newly allocated float[] containing the processed logits in order. The caller owns the returned array, and later mutation cannot alter the source object.

Method ApplyHistoryPenalties(System.Span<float>,UAIX.LmRuntime.Sampling.SamplingOptions,System.Collections.Generic.IReadOnlyDictionary<int,int>)

Applies sign-aware repetition, frequency, and presence penalties in one deterministic pass.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
options
The optional SamplingOptions controlling ApplyHistoryPenalties; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
tokenCounts
The ordered token counts collection of type IReadOnlyDictionary<int, int>; LogitProcessor.ApplyHistoryPenalties validates nullability, count, and element constraints before consuming or snapshotting it and does not retain a mutable caller alias.

Method ApplyBias(System.Span<float>,System.Collections.Generic.IReadOnlyDictionary<int,float>)

Adds all validated per-token biases exactly once.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
biases
The ordered biases collection of type IReadOnlyDictionary<int, float>; LogitProcessor.ApplyBias validates nullability, count, and element constraints before consuming or snapshotting it and does not retain a mutable caller alias.

Method SuppressEarlyStopTokens(System.Span<float>,UAIX.LmRuntime.Sampling.SamplingOptions,int)

Marks configured stop tokens ineligible before the exact minimum-token boundary.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
options
The optional SamplingOptions controlling SuppressEarlyStopTokens; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
generatedTokenCount
The generated token count used to bound this operation; it must be nonnegative and within the supported range.

LogitScore (UAIX.LmRuntime.Sampling, 2 members)

Represents a scored token candidate.

Property TokenIndex

Gets the token index.

Property Score

Gets the logit score.

SamplingCandidate (UAIX.LmRuntime.Sampling, 2 members)

Represents one normalized token candidate retained after all filters.

Property TokenId

Gets the token identifier.

Property Probability

Gets the normalized candidate probability.

SamplingDecision (UAIX.LmRuntime.Sampling, 3 members)

Describes one deterministic or stochastic sampling decision.

Property TokenId

Gets the selected token identifier.

Property IsGreedy

Gets whether the zero-temperature greedy path made the decision.

Property Candidates

Gets the candidate distribution used for selection, ordered by probability and token identifier.
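
One common way to draw a token from an ordered candidate list is inverse-CDF sampling: walk the cumulative probabilities until they exceed a uniform draw. The sketch below illustrates that technique under assumptions; InverseCdfSketch and its tuple shape are hypothetical, and the package may select tokens differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the UAIX.LmRuntime.Sampling implementation.
public static class InverseCdfSketch
{
    // Returns the first candidate whose cumulative probability exceeds u, with u in [0, 1).
    public static int Pick(IReadOnlyList<(int TokenId, double Probability)> candidates, double u)
    {
        if (candidates.Count == 0)
            throw new ArgumentException("At least one candidate is required.", nameof(candidates));
        if (!(u >= 0.0 && u < 1.0))
            throw new ArgumentOutOfRangeException(nameof(u));

        double cumulative = 0.0;
        foreach (var (tokenId, probability) in candidates)
        {
            cumulative += probability;
            if (u < cumulative) return tokenId;
        }

        // Rounding can leave the total slightly below 1, so fall back to the last candidate.
        return candidates[candidates.Count - 1].TokenId;
    }
}
```

Because the candidate order is fixed, the same uniform draw always maps to the same token, which is what makes a seeded generator sufficient for repeatable selection over identical distributions.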

ProbabilitySampler (UAIX.LmRuntime.Sampling, 2 members)

Builds stable normalized distributions and samples them with session-local deterministic state.

Method Select(System.ReadOnlySpan<float>,UAIX.LmRuntime.Sampling.SamplingOptions,UAIX.LmRuntime.Sampling.SamplingState)

Processes logits and selects one token under the supplied deterministic policy.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
options
The optional SamplingOptions controlling Select; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
state
The validated state value consumed by the operation; mutations, when applicable, are limited to the explicitly documented state owner.

Returns: The SamplingDecision for the selected token. It is published only after all documented validation and ownership transitions succeed.

Method BuildDistribution(System.ReadOnlySpan<float>,UAIX.LmRuntime.Sampling.SamplingOptions)

Builds a stable, filtered, and normalized probability distribution without consuming random state.

processedLogits
The processed logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
options
The optional SamplingOptions controlling BuildDistribution; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: Candidates ordered by descending probability and ascending token identifier for ties.
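
For orientation, here is a minimal sketch of how temperature, top-k, minimum-p, and top-p filtering can combine into a normalized, stably ordered distribution. DistributionSketch is hypothetical, and the order in which the filters run (top-k, then minimum-p, then top-p) is an assumption rather than documented package behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration only; the filter order is an assumption.
public static class DistributionSketch
{
    public static List<(int TokenId, double Probability)> Build(
        ReadOnlySpan<float> logits, double temperature, int topK, double topP, double minP)
    {
        if (logits.IsEmpty)
            throw new ArgumentException("At least one logit is required.", nameof(logits));
        if (!(temperature > 0.0))
            throw new ArgumentOutOfRangeException(nameof(temperature));

        // Softmax with the maximum subtracted for numerical stability.
        float max = float.NegativeInfinity;
        foreach (float logit in logits) max = Math.Max(max, logit);

        var all = new List<(int TokenId, double Probability)>(logits.Length);
        double sum = 0.0;
        for (int i = 0; i < logits.Length; i++)
        {
            double weight = Math.Exp((logits[i] - max) / temperature);
            all.Add((i, weight));
            sum += weight;
        }

        // Descending probability, ascending token identifier for ties.
        var ordered = all
            .Select(c => (c.TokenId, Probability: c.Probability / sum))
            .OrderByDescending(c => c.Probability)
            .ThenBy(c => c.TokenId)
            .ToList();

        // Top-k: zero disables the filter.
        if (topK > 0 && topK < ordered.Count) ordered = ordered.Take(topK).ToList();

        // Minimum-p: drop candidates below a fraction of the best probability.
        double floor = minP * ordered[0].Probability;
        ordered = ordered.Where(c => c.Probability >= floor).ToList();

        // Top-p: keep the smallest prefix whose cumulative mass reaches topP.
        var kept = new List<(int TokenId, double Probability)>();
        double mass = 0.0;
        foreach (var candidate in ordered)
        {
            kept.Add(candidate);
            mass += candidate.Probability;
            if (mass >= topP) break;
        }

        // Renormalize so the surviving candidates sum to one.
        double keptSum = kept.Sum(c => c.Probability);
        return kept.Select(c => (c.TokenId, Probability: c.Probability / keptSum)).ToList();
    }
}
```

No random state is consumed here, which mirrors the documented separation between building a distribution and sampling from it.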

SamplingOptions (UAIX.LmRuntime.Sampling, 17 members)

Defines deterministic logit-processing, candidate-selection, and generation-stop policies for one sampling session.

Instances are treated as immutable configuration once a sampling session has been created from them. Validation is intentionally performed before a logit buffer is modified, so an invalid option cannot leave partially adjusted data.

Property Temperature

Gets the non-negative temperature. Zero selects the deterministic greedy path.

Property TopK

Gets the maximum candidate count. Zero disables top-k filtering.

Property TopP

Gets the normalized nucleus probability threshold in the inclusive range zero through one.

Property MinimumP

Gets the minimum probability relative to the highest candidate probability.

Property RepetitionPenalty

Gets the positive sign-aware repetition penalty.

Property FrequencyPenalty

Gets the amount subtracted for each prior occurrence of a token.

Property PresencePenalty

Gets the amount subtracted once from any token that has previously appeared.
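
The three history penalties described above can be sketched in one pass over the tokens that have already appeared. This is an illustration under assumptions, not the package's code: HistoryPenaltySketch is hypothetical, "sign-aware" is taken to mean the widely used convention of dividing positive logits and multiplying non-positive ones, and the order of the three adjustments is assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; the exact formula and order are assumptions.
public static class HistoryPenaltySketch
{
    public static void Apply(
        Span<float> logits,
        IReadOnlyDictionary<int, int> tokenCounts,
        float repetitionPenalty,
        float frequencyPenalty,
        float presencePenalty)
    {
        if (!(repetitionPenalty > 0f))
            throw new ArgumentOutOfRangeException(nameof(repetitionPenalty));

        foreach (KeyValuePair<int, int> entry in tokenCounts)
        {
            int tokenId = entry.Key;
            int count = entry.Value;
            if (count <= 0) continue;
            if ((uint)tokenId >= (uint)logits.Length)
                throw new ArgumentOutOfRangeException(nameof(tokenCounts));

            float value = logits[tokenId];

            // Sign-aware: with a penalty above 1, both branches lower the logit.
            value = value > 0f ? value / repetitionPenalty : value * repetitionPenalty;

            // Frequency scales with the number of prior occurrences; presence applies once.
            value -= frequencyPenalty * count;
            value -= presencePenalty;

            logits[tokenId] = value;
        }
    }
}
```

With logits [2, -2, 1], counts {0: 2, 1: 1}, a repetition penalty of 2, a frequency penalty of 0.5, and a presence penalty of 0.25, this sketch yields [-0.25, -4.75, 1].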

Property LogitBias

Gets finite per-token additive logit biases.

Property StopTokenIds

Gets exact token identifiers that terminate generation after the minimum-token boundary.

Property StopSequences

Gets UTF-8 stop strings matched across decoded token boundaries.

Property IncludeStopToken

Gets whether a matched stop token is retained in emitted token identifiers.

Property IncludeStopSequence

Gets whether matched stop-sequence bytes are included in visible output.

Property MinimumGeneratedTokens

Gets the minimum generated-token count before stop-token or stop-text policies become eligible.
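
A minimum-token boundary is often enforced by making stop tokens unselectable until enough tokens exist. The sketch below shows that idea; EarlyStopSuppressionSketch is hypothetical, and using negative infinity as the "ineligible" marker is an assumption about how suppression could be represented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; the package may mark ineligible tokens differently.
public static class EarlyStopSuppressionSketch
{
    public static void Suppress(
        Span<float> logits,
        IReadOnlyCollection<int> stopTokenIds,
        int generatedTokenCount,
        int minimumGeneratedTokens)
    {
        // Once the minimum has been reached, stop tokens compete normally.
        if (generatedTokenCount >= minimumGeneratedTokens) return;

        foreach (int id in stopTokenIds)
        {
            if ((uint)id >= (uint)logits.Length)
                throw new ArgumentOutOfRangeException(nameof(stopTokenIds));

            // Negative infinity can never win an argmax and gets zero softmax weight.
            logits[id] = float.NegativeInfinity;
        }
    }
}
```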

Property MaximumGeneratedTokens

Gets the maximum number of generated tokens. Zero permits no generated tokens.

Property MaximumContextTokens

Gets the maximum prompt-plus-generation token count.

Property Seed

Gets the deterministic per-session pseudo-random generator seed.

Method Validate(int)

Validates every option and token-indexed policy against a vocabulary size without mutating caller data.

vocabularySize
The positive number of logits accepted by the model.

SamplingState (UAIX.LmRuntime.Sampling, 6 members)

Stores token history and deterministic pseudo-random state for exactly one generation session.

A state instance must not be shared by independent requests. Keeping history and random state together makes session isolation explicit and prevents interleaved requests from consuming one another's random sequence.

Method SamplingState(UAIX.LmRuntime.Sampling.SamplingOptions)

Initializes isolated state from the immutable session options.

options
The options whose seed initializes the session generator.

Property Random

Gets the session-local deterministic random generator.

Property GeneratedTokenCount

Gets the number of generated token identifiers recorded by this session.

Property TokenCounts

Gets the session-owned prior-token counts through a read-only interface.

Method RecordToken(int)

Records one generated token for repetition, frequency, presence, and usage policies.

tokenId
The non-negative tokenizer identifier generated by the model.

Method GetTokenCount(int)

Gets the prior count for one token without adding it to history.

tokenId
The token identifier to process; it must fall within the validated vocabulary and operation-specific range.

Returns: The recorded count, or zero when the token has not appeared.

Xoshiro256StarStar (UAIX.LmRuntime.Sampling, 3 members)

Implements the xoshiro256** generator with SplitMix64 seed expansion and deterministic unsigned arithmetic.

The generator is intentionally session-local and not thread-safe. Its transition follows the published xoshiro256** reference algorithm; seed expansion prevents the prohibited all-zero state for a zero seed.
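
For readers unfamiliar with the algorithm, the following sketch follows the published xoshiro256** and SplitMix64 reference routines. XoshiroSketch is a hypothetical stand-in written for illustration; it is not the package's type, and the unit-double conversion shown (top 53 bits scaled by 2^-53) is one standard choice that the package is not confirmed to use.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration only; session-local and not thread-safe.
public sealed class XoshiroSketch
{
    private ulong _s0, _s1, _s2, _s3;

    public XoshiroSketch(ulong seed)
    {
        // SplitMix64 expands one seed into four state words, so a zero seed
        // does not produce the forbidden all-zero state.
        _s0 = SplitMix64(ref seed);
        _s1 = SplitMix64(ref seed);
        _s2 = SplitMix64(ref seed);
        _s3 = SplitMix64(ref seed);
    }

    private static ulong SplitMix64(ref ulong x)
    {
        unchecked
        {
            ulong z = x += 0x9E3779B97F4A7C15UL;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    // Produces one output and advances the state exactly once.
    public ulong NextUInt64()
    {
        unchecked
        {
            ulong result = BitOperations.RotateLeft(_s1 * 5UL, 7) * 9UL;
            ulong t = _s1 << 17;

            _s2 ^= _s0;
            _s3 ^= _s1;
            _s1 ^= _s2;
            _s0 ^= _s3;
            _s2 ^= t;
            _s3 = BitOperations.RotateLeft(_s3, 45);

            return result;
        }
    }

    // Uses the top 53 bits so every result is an exact double in [0, 1).
    public double NextUnitDouble() => (NextUInt64() >> 11) * (1.0 / (1UL << 53));
}
```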

Method Xoshiro256StarStar(ulong)

Initializes the generator from one deterministic 64-bit seed.

seed
The 64-bit seed, expanded with SplitMix64 into the initial generator state.

Method NextUInt64

Returns the next 64-bit output and advances the generator exactly once.

Returns: The next 64-bit generator output as a ulong. Range, finite-value, and overflow checks are completed before the value is returned.

Method NextUnitDouble

Returns a uniformly distributed value in the half-open interval [0, 1).

Returns: A double in the half-open interval [0, 1). Range, finite-value, and overflow checks are completed before the value is returned.

StopSequenceMatchResult (UAIX.LmRuntime.Sampling, 3 members)

Represents visible bytes released by one bounded stop-sequence matching step.

Property VisibleBytes

Gets newly visible bytes that cannot participate in a future stop match.

Property MatchedStopSequence

Gets the exact stop string that completed during this step, if any.

Property Matched

Gets whether a terminal stop sequence has matched.

StopSequenceMatcher (UAIX.LmRuntime.Sampling, 5 members)

Matches UTF-8 stop sequences across arbitrary decoded-byte boundaries while retaining only a bounded possible prefix.

Method StopSequenceMatcher(System.Collections.Generic.IEnumerable<string>,bool)

Initializes a matcher from non-empty stop strings.

stopSequences
The exact Unicode strings encoded as UTF-8 for byte matching.
includeMatchedBytes
Whether a terminal match is included in visible output.

Property MaximumRetainedBytes

Gets the maximum retained prefix bytes, bounded by the longest configured stop sequence.

Property RetainedByteCount

Gets the current possible stop-prefix byte count.

Method Append(System.ReadOnlySpan<byte>)

Appends one decoded byte chunk and releases bytes that can no longer participate in a stop match.

bytes
The next decoded UTF-8 bytes; chunks may split a code point or stop sequence.

Returns: The newly visible bytes and optional terminal match.

Method Complete

Completes matching and releases any retained non-matching prefix bytes.

Returns: A newly allocated byte[] containing the retained non-matching prefix bytes in order. The caller owns the returned array, and later mutation cannot alter the source object.

TopKSelector (UAIX.LmRuntime.Sampling, 1 member)

Provides partial top-k selection for logit arrays.

Method SelectTopK(System.ReadOnlySpan<float>,int)

Selects the highest scoring token candidates without sorting the full input.

logits
The logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
k
The number of candidates to select; it must satisfy the member's documented range requirements.

Returns: The selected candidates in descending score order with deterministic index tie-breaking.
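
Partial selection can be sketched with a small sorted buffer of at most k entries, so the full input is scanned once but never sorted. TopKSketch below is hypothetical and uses simple insertion; the package's selection strategy is not documented here and may differ. NaN handling is deliberately left out of this sketch.

```csharp
using System;

// Hypothetical illustration only; not the UAIX.LmRuntime.Sampling implementation.
public static class TopKSketch
{
    public static (int TokenIndex, float Score)[] Select(ReadOnlySpan<float> logits, int k)
    {
        if (k < 0 || k > logits.Length)
            throw new ArgumentOutOfRangeException(nameof(k));

        var top = new (int TokenIndex, float Score)[k];
        if (k == 0) return top;

        int count = 0;
        for (int i = 0; i < logits.Length; i++)
        {
            float score = logits[i];

            // When the buffer is full, only a strictly higher score displaces the last entry,
            // so an earlier index wins any tie.
            if (count == k && !(score > top[k - 1].Score)) continue;

            int pos = count < k ? count++ : k - 1;

            // Shift strictly lower scores down; equal scores keep their earlier position.
            while (pos > 0 && top[pos - 1].Score < score)
            {
                top[pos] = top[pos - 1];
                pos--;
            }

            top[pos] = (i, score);
        }

        return top;
    }
}
```

For the input [0.1, 0.9, 0.9, 0.5, 0.2] with k = 3, this sketch returns indices 1, 2, and 3 in that order.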

Frequently asked questions

When should I use GreedySampler?

Use it for deterministic argmax generation, parity tests, and local facade paths that intentionally do not expose probabilistic settings.
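
As a rough illustration of the documented contract (highest logit, lower index on ties, NaN rejected, infinities compared normally), a deterministic argmax can be sketched as follows. GreedyArgmaxSketch is a hypothetical stand-in, not the package's GreedySampler.

```csharp
using System;

// Hypothetical illustration only; not the UAIX.LmRuntime.Sampling implementation.
public static class GreedyArgmaxSketch
{
    // Returns the index of the largest logit; the earliest index wins ties.
    public static int Argmax(ReadOnlySpan<float> logits)
    {
        if (logits.IsEmpty)
            throw new ArgumentException("At least one logit is required.", nameof(logits));

        int best = -1;
        for (int i = 0; i < logits.Length; i++)
        {
            float value = logits[i];
            if (float.IsNaN(value))
                throw new ArgumentException($"Logit {i} is NaN.", nameof(logits));

            // Strict '>' keeps the lower index when two logits are equal.
            if (best < 0 || value > logits[best])
                best = i;
        }

        return best;
    }
}
```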

Where should SamplingState live?

Create one state per generation sequence and retain it across steps. Starting a new state resets random progression and token-frequency history.
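
The effect of recreating state on every step can be shown with any seeded generator. The sketch below uses System.Random purely as a stand-in for the package's session generator; StateLifetimeSketch and its methods are hypothetical.

```csharp
using System;

// Hypothetical illustration only; System.Random stands in for the session generator.
public static class StateLifetimeSketch
{
    // Draws from one generator that persists across all steps.
    public static int[] DrawWithPersistentState(int seed, int steps)
    {
        var random = new Random(seed);
        var draws = new int[steps];
        for (int i = 0; i < steps; i++) draws[i] = random.Next();
        return draws;
    }

    // Recreates the generator before every step, which restarts the sequence each time.
    public static int[] DrawWithFreshStatePerStep(int seed, int steps)
    {
        var draws = new int[steps];
        for (int i = 0; i < steps; i++) draws[i] = new Random(seed).Next();
        return draws;
    }
}
```

With fresh state per step, every draw equals the first draw of the persistent sequence, so a probabilistic sampler would keep reusing the same random value instead of advancing.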

Does a fixed seed guarantee identical text everywhere?

No. It controls the package random generator. Identical output still depends on identical logits, processing settings, token history, tokenizer, floating-point behavior, and model execution.
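
One concrete reason floating-point behavior matters: single-precision addition is not associative, so two runtimes that accumulate the same values in a different order can produce different logits. FloatOrderSketch below is a hypothetical demonstration of that effect and is unrelated to the package's API.

```csharp
using System;

// Hypothetical illustration only; explicit casts force single-precision rounding at each step.
public static class FloatOrderSketch
{
    // Computes (a + b) + c.
    public static float SumLeftToRight(float a, float b, float c) =>
        (float)((float)(a + b) + c);

    // Computes a + (b + c).
    public static float SumRightToLeft(float a, float b, float c) =>
        (float)(a + (float)(b + c));
}
```

For a = 1e8f, b = -1e8f, and c = 1f, the left-to-right sum is 1 while the right-to-left sum is 0, because -1e8f + 1f rounds back to -1e8f in single precision.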

Why match stop sequences as bytes?

Generated tokens may split one Unicode scalar or stop string across token boundaries. Byte-level retention avoids decoding incomplete fragments prematurely.
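
The decoding hazard can be reproduced with the standard library alone. In the sketch below, Utf8ChunkSketch is a hypothetical helper that uses a stateful System.Text.Decoder, which buffers an incomplete code point until the next chunk arrives; decoding each chunk independently would instead emit a replacement character.

```csharp
using System;
using System.Text;

// Hypothetical illustration only; shows why incomplete byte fragments must be retained.
public static class Utf8ChunkSketch
{
    // Decodes chunks with a stateful decoder that buffers incomplete code points.
    public static string DecodeChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            char[] buffer = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, buffer, 0);
            text.Append(buffer, 0, written);
        }

        return text.ToString();
    }
}
```

For example, "café" ends with the two bytes 0xC3 0xA9. Splitting them across chunks still decodes to "café" with the stateful decoder, whereas Encoding.UTF8.GetString on the lone 0xC3 byte yields U+FFFD.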