UAIX.LmRuntime.Models.Llama

LLaMA configuration, tensor binding, mapped weights, reference sessions, KV cache, generation, and persistence.

Required For LLaMA graph/session internals

LLaMA-family configuration, tensor binding, mapped weight sources, reference forward execution, sessions, KV cache, generation, persistence, and parity evidence.

Overview

LLaMA-family graph configuration and reference forward-pass primitives for pure C# local LLM runtime inference.

Who should use it: Runtime developers and advanced applications that need direct control of the LLaMA-family model graph and deterministic reference sessions.

Execution status: Managed LLaMA-family configuration, tensor binding, reference forward execution, sessions, KV cache, generation, and persistence are represented in the supplied source.

Install

.NET CLI

dotnet add package UAIX.LmRuntime.Models.Llama

Project file

<PackageReference Include="UAIX.LmRuntime.Models.Llama" />

Version policy: The documentation deliberately omits UAIX.LmRuntime package version numbers. Resolve and pin versions through your normal dependency-management and lock-file process.

Direct package dependencies

UAIX.LmRuntime.Abstractions

UAIX.LmRuntime.Gguf

UAIX.LmRuntime.Kernels.Cpu

UAIX.LmRuntime.Sampling

UAIX.LmRuntime.Tensors

UAIX.LmRuntime.Tokenization

Package role and boundaries

Required For LLaMA graph/session internals

You need to derive LLaMA configuration from GGUF metadata and validate architecture invariants.
You need required-tensor registries, binding manifests, mapped/array weight sources, reference forward operations, or storage parity.
You need direct mapped/reference sessions, deterministic greedy generation, KV-cache state, or session artifacts.

Boundary

Do not claim that every LLaMA-named or derivative architecture is supported without model-specific validation.
Do not use a lower-level session when the LocalEndpoint facade already provides the bounded application behavior you need.

Architecture validation

LlamaModelConfig derives graph dimensions from GGUF metadata and validates head counts, dimensions, context, vocabulary, RoPE, and normalization requirements before execution.
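To make the idea of architecture invariants concrete, here is a minimal sketch of the kind of dimension checks such validation might include. It does not use the package: `CheckLlamaDimensions` and its rules are assumptions for illustration, and the actual checks performed by LlamaModelConfig are not shown in this guide. In particular, requiring the embedding length to be divisible by the head count assumes the head dimension is derived as embedding length divided by head count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical checker (not part of the package): returns one message per violated invariant.
static List<string> CheckLlamaDimensions(
    int embeddingLength, int headCount, int keyValueHeadCount,
    int contextLength, int vocabularySize)
{
    var problems = new List<string>();

    if (embeddingLength <= 0) problems.Add("embedding length must be positive");
    if (headCount <= 0) problems.Add("attention head count must be positive");
    if (keyValueHeadCount <= 0) problems.Add("KV head count must be positive");
    if (contextLength <= 0) problems.Add("context length must be positive");
    if (vocabularySize <= 0) problems.Add("vocabulary size must be positive");

    // Relational checks are only meaningful once the individual counts are positive.
    if (embeddingLength > 0 && headCount > 0 && embeddingLength % headCount != 0)
        problems.Add("embedding length must be divisible by the head count");

    // Grouped-query attention shares each KV head across a whole number of query heads.
    if (headCount > 0 && keyValueHeadCount > 0 && headCount % keyValueHeadCount != 0)
        problems.Add("head count must be a multiple of the KV head count");

    return problems;
}

List<string> ok = CheckLlamaDimensions(4096, 32, 8, 4096, 32000);
Console.WriteLine(ok.Count == 0 ? "valid" : string.Join("; ", ok));

List<string> bad = CheckLlamaDimensions(4096, 32, 5, 4096, 32000);
Console.WriteLine(string.Join("; ", bad));
```

Collecting every problem instead of throwing on the first one lets a host report all configuration defects in a single pass.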

Binding is evidence

Required tensor roles, storage kinds, ownership, diagnostics, and manifests make missing, duplicate, incompatible, or unexpectedly materialized weights observable.

Session state is explicit

Reference sessions own position, logits, and KV-cache state. Callers choose reset behavior and can capture, serialize, fingerprint, restore, or discard state under bounded policies.

Key types

These are the main public entry points. The generated reference below includes the documented public package surface.

LlamaModelConfig
LlamaMappedModelLoader
LlamaMappedModel
LlamaMappedReferenceSession
LlamaReferenceSession
LlamaTensorBinder
TensorBindingManifest
ReferenceKvCache
ReferenceKvCacheSerializer
LlamaSessionArtifactSerializer
LlamaStorageParityRunner
RealModelSmokeRunner

Coding examples

Examples use the documented public package surface. Paths, identities, runtime identifiers, device evidence, and application policy remain host inputs.

Validate LLaMA configuration from GGUF

Separate container parsing from architecture-specific configuration checks.

LlamaConfigurationExample.cs

using UAIX.LmRuntime.Gguf;
using UAIX.LmRuntime.Models.Llama;

GgufModel gguf = GgufReader.Read(
    "models/model.gguf",
    new GgufParseOptions());

LlamaModelConfig configuration =
    LlamaModelConfig.FromGguf(gguf);

configuration.Validate();

Console.WriteLine($"Model: {configuration.ModelName}");
Console.WriteLine($"Layers: {configuration.BlockCount}");
Console.WriteLine($"Embedding: {configuration.EmbeddingLength}");
Console.WriteLine($"Heads: {configuration.AttentionHeadCount}");
Console.WriteLine($"KV heads: {configuration.AttentionKeyValueHeadCount}");
Console.WriteLine($"Context: {configuration.ContextLength}");

Load a mapped model and decode one greedy token

Use direct mapped execution for diagnostics, model validation, and deterministic one-token evidence.

MappedOneTokenExample.cs

using UAIX.LmRuntime.Models.Llama;

var loader = new LlamaMappedModelLoader();

using LlamaMappedModel model = loader.Load(
    "models/model.gguf",
    new LlamaMappedModelLoadOptions
    {
        RuntimeMode = LlamaRuntimeMode.DeterministicParity,
        ComputeModelSha256 = true
    });

using LlamaMappedReferenceSession session =
    model.CreateReferenceSession();

LlamaMappedGreedyTokenResult result =
    session.DecodeOneGreedy(
        "Hello",
        new LlamaOneTokenOptions
        {
            ResetSession = true,
            ParseSpecialTokens = false,
            AddSpecialTokens = true,
            EmitTokenizerTrace = false
        });

Console.WriteLine($"{result.TokenId}: {result.TokenText}");
Console.WriteLine($"Selected logit: {result.SelectedLogit}");
Console.WriteLine($"Position: {result.Position}");

Generate several greedy tokens with caller-owned buffers

Bound output allocation and observe each committed token.

MappedGenerationExample.cs

using System;
using System.Threading;
using UAIX.LmRuntime.Models.Llama;
using UAIX.LmRuntime.Tokenization;

public static class MappedGenerationExample
{
    /// <summary>
    /// Generates greedy tokens into caller-owned buffers and observes each committed selection.
    /// </summary>
    /// <param name="model">The loaded mapped model that defines vocabulary capacity.</param>
    /// <param name="session">The isolated mapped reference session.</param>
    /// <param name="prompt">The prompt to tokenize and prefill.</param>
    /// <param name="maximumTokens">The maximum number of output tokens.</param>
    /// <param name="cancellationToken">A token observed between committed model steps.</param>
    /// <returns>The bounded greedy-generation result.</returns>
    public static LlamaGreedyGenerationResult Generate(
        LlamaMappedModel model,
        LlamaMappedReferenceSession session,
        string prompt,
        int maximumTokens,
        CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(model);
        ArgumentNullException.ThrowIfNull(session);
        ArgumentException.ThrowIfNullOrWhiteSpace(prompt);
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(maximumTokens);

        int[] generatedTokenIds = new int[maximumTokens];
        float[] finalLogits = new float[model.Configuration.VocabularySize];

        return session.GenerateGreedy(
            prompt,
            generatedTokenIds,
            finalLogits,
            new LlamaGreedyGenerationOptions
            {
                MaximumTokens = maximumTokens,
                ResetSession = true,
                EndOfSequenceTokenId = null,
                StopTokenIds = Array.Empty<int>()
            },
            new TokenizationOptions
            {
                AddSpecialTokens = true,
                ParseSpecialTokens = false
            },
            token => Console.WriteLine(
                $"{token.Sequence}: {token.TokenId} ({token.SelectedLogit})"),
            cancellationToken);
    }
}

Use the deterministic in-memory fixture

Exercise reference execution without depending on an external model artifact.

ReferenceFixtureExample.cs

using UAIX.LmRuntime.Models.Llama;

LlamaReferenceFixture fixture =
    LlamaReferenceFixtureFactory.CreateDeterministic();

LlamaReferenceSession session = fixture.CreateSession();

LlamaGreedyTokenResult result = session.DecodeOneGreedy(
    fixture.PromptTokenIds,
    resetSession: true);

Console.WriteLine($"Token: {result.TokenId}");
Console.WriteLine($"Position: {result.Position}");

Save and restore session state

Bind persisted state to model, configuration, tokenizer, and cache-layout fingerprints, and enforce a maximum artifact size.

SessionPersistenceExample.cs

using System;
using System.IO;
using UAIX.LmRuntime.Models.Llama;

public static class SessionPersistenceExample
{
    /// <summary>
    /// Saves a mapped reference session and immediately reloads the authenticated artifact.
    /// </summary>
    /// <param name="model">The mapped model that supplies model identity evidence.</param>
    /// <param name="session">The session whose deterministic state will be persisted.</param>
    /// <param name="statePath">The destination path for the session artifact.</param>
    /// <param name="configurationFingerprint">The host-computed configuration fingerprint.</param>
    /// <param name="tokenizerFingerprint">The host-computed tokenizer fingerprint.</param>
    /// <param name="cacheLayoutFingerprint">The host-computed cache-layout fingerprint.</param>
    /// <returns>The authenticated artifact loaded from disk.</returns>
    public static LlamaSessionArtifact SaveAndReload(
        LlamaMappedModel model,
        LlamaMappedReferenceSession session,
        string statePath,
        string configurationFingerprint,
        string tokenizerFingerprint,
        string cacheLayoutFingerprint)
    {
        ArgumentNullException.ThrowIfNull(model);
        ArgumentNullException.ThrowIfNull(session);
        ArgumentException.ThrowIfNullOrWhiteSpace(statePath);

        string? directory = Path.GetDirectoryName(
            Path.GetFullPath(statePath));

        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory);
        }

        var persistence = new LlamaSessionPersistenceOptions
        {
            ModelSha256 = model.Manifest.ModelSha256,
            ConfigurationFingerprint = configurationFingerprint,
            TokenizerFingerprint = tokenizerFingerprint,
            CacheLayoutFingerprint = cacheLayoutFingerprint,
            SamplerMode = "greedy",
            GeneratedUtc = DateTimeOffset.UtcNow,
            ClaimStatus = "local-evidence",
            MaximumByteCount = 64 * 1024 * 1024
        };

        session.SaveState(statePath, persistence);

        return session.LoadState(
            statePath,
            maximumByteCount: persistence.MaximumByteCount);
    }
}

Boundary: The caller supplies and validates compatibility fingerprints; persisted state should be treated as model-bound untrusted input.

Generated API reference

Expand a type to review its documented public fields, properties, constructors, methods, parameter descriptions, and return descriptions.

LlamaReferenceSessionSnapshot (UAIX.LmRuntime.Models.Llama, 5 members)

Captures complete deterministic reference-session state without retaining live model pointers.

Property SchemaVersion

Gets the in-memory snapshot schema version.

Property Position

Gets the next sequence position.

Property TokenHistory

Gets committed input token identifiers in sequence order.

Property LastLogits

Gets the most recently computed logits.

Property KeyValueCache

Gets complete capacity-shaped key/value state.
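As a rough illustration of why snapshot and artifact sizes need bounding, the sketch below estimates the size of a key/value cache allocated for full context capacity. It is an assumption-laden estimate, not the package's layout: `EstimateKvCacheBytes` is a hypothetical helper, and it assumes float32 storage with one key tensor and one value tensor per block.

```csharp
using System;

// Hypothetical helper (not part of the package): bytes for a capacity-shaped float32 K/V cache.
static long EstimateKvCacheBytes(
    int blockCount, int contextLength, int keyValueHeadCount, int headDimension)
{
    // checked(...) turns silent arithmetic overflow into an exception for very large shapes.
    long elementsPerTensor = checked(
        (long)blockCount * contextLength * keyValueHeadCount * headDimension);

    // Two tensors (keys and values), four bytes per float32 element.
    return checked(2L * elementsPerTensor * sizeof(float));
}

long bytes = EstimateKvCacheBytes(
    blockCount: 32, contextLength: 4096, keyValueHeadCount: 8, headDimension: 128);

Console.WriteLine($"{bytes} bytes");
```

An estimate like this can help a host choose a maximum artifact byte count before saving or loading session state.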

LlamaSessionPersistenceOptions (UAIX.LmRuntime.Models.Llama, 13 members)

Configures digest-bound complete session serialization.

Property PackageVersion

Gets the package version that emitted the artifact.

Property MinimumCompatiblePackageVersion

Gets the oldest supported package version.

Property MaximumCompatiblePackageVersion

Gets the newest supported package version.

Property ModelSha256

Gets the complete model artifact SHA-256.

Property ConfigurationFingerprint

Gets the LLaMA configuration fingerprint.

Property TokenizerFingerprint

Gets the GGUF tokenizer fingerprint.

Property CacheLayoutFingerprint

Gets the persistent cache-layout identity.

Property SamplerMode

Gets the deterministic sampler mode.

Property EndOfSequenceTokenId

Gets the optional end-of-sequence token identifier.

Property StopTokenIds

Gets configured stop-token identifiers.

Property GeneratedUtc

Gets the UTC generation time.

Property ClaimStatus

Gets the evidence claim status.

Property MaximumByteCount

Gets the maximum accepted artifact byte count.

LlamaSessionArtifact (UAIX.LmRuntime.Models.Llama, 15 members)

Carries verified complete deterministic session state and compatibility identities.

Property SchemaVersion

Gets the portable schema version.

Property PackageVersion

Gets the package version that emitted the artifact.

Property MinimumCompatiblePackageVersion

Gets the oldest supported package version.

Property MaximumCompatiblePackageVersion

Gets the newest supported package version.

Property ModelSha256

Gets the complete model artifact SHA-256.

Property ConfigurationFingerprint

Gets the model configuration fingerprint.

Property TokenizerFingerprint

Gets the tokenizer fingerprint.

Property CacheLayoutFingerprint

Gets the cache-layout fingerprint.

Property SamplerMode

Gets the sampler mode.

Property EndOfSequenceTokenId

Gets the optional end-of-sequence token identifier.

Property StopTokenIds

Gets configured stop-token identifiers.

Property GeneratedUtc

Gets the artifact generation time in UTC.

Property ClaimStatus

Gets the evidence claim status.

Property ContentSha256

Gets the SHA-256 of every serialized byte preceding the digest.

Property Snapshot

Gets the complete session snapshot.

LlamaSessionArtifactSerializer (UAIX.LmRuntime.Models.Llama, 5 members)

Serializes complete deterministic reference-session state in bounded little-endian form.

Field SchemaVersion

Gets the supported artifact schema version.

Method Serialize(UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)

Serializes complete session state and appends a SHA-256 digest.

snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.
options: The optional LlamaSessionPersistenceOptions controlling Serialize; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: A newly allocated byte[] containing the serialized session state followed by its SHA-256 digest. The caller owns the returned array, and later mutation of it cannot alter the source object.

Method Deserialize(System.ReadOnlySpan<byte>,int)

Deserializes the LLaMA session artifact from the validated persisted representation.

bytes: The byte sequence used by this operation; its required length, ordering, and element bounds are validated before access.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The deserialized LlamaSessionArtifact. It is published only after all documented validation and ownership transitions succeed.

Method Save(string,UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)

Writes a complete artifact to a local file.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.
options: The optional LlamaSessionPersistenceOptions controlling Save; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: The LlamaSessionArtifact produced by Save. It is published only after all documented validation and ownership transitions succeed.

Method Load(string,int)

Reads and verifies a complete artifact from a local file.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The verified artifact, with ownership and disposal obligations defined by the returned type and the Load contract.
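The digest-and-bound pattern described for this serializer can be sketched with standard .NET cryptography APIs. This is not the package's wire format: `AppendDigest` and `VerifyAndStrip` are hypothetical names, and the layout (payload followed by a 32-byte SHA-256 digest) is assumed only from the description that the digest covers every preceding serialized byte.

```csharp
using System;
using System.Security.Cryptography;

const int DigestLength = 32; // SHA-256 produces 32 bytes.

// Hypothetical: returns payload || SHA-256(payload) as one self-checking buffer.
byte[] AppendDigest(ReadOnlySpan<byte> payload)
{
    byte[] artifact = new byte[payload.Length + DigestLength];
    payload.CopyTo(artifact);
    SHA256.HashData(payload, artifact.AsSpan(payload.Length));
    return artifact;
}

// Hypothetical: checks the size bound first, then the trailing digest, then returns the payload.
byte[] VerifyAndStrip(ReadOnlySpan<byte> artifact, int maximumByteCount)
{
    if (maximumByteCount < 0)
        throw new ArgumentOutOfRangeException(nameof(maximumByteCount));
    if (artifact.Length > maximumByteCount)
        throw new InvalidOperationException("Artifact exceeds the configured byte limit.");
    if (artifact.Length < DigestLength)
        throw new InvalidOperationException("Artifact is too short to contain a digest.");

    ReadOnlySpan<byte> payload = artifact[..^DigestLength];
    ReadOnlySpan<byte> stored = artifact[^DigestLength..];

    Span<byte> computed = stackalloc byte[DigestLength];
    SHA256.HashData(payload, computed);

    if (!CryptographicOperations.FixedTimeEquals(stored, computed))
        throw new InvalidOperationException("Digest mismatch: artifact is corrupt or modified.");

    return payload.ToArray();
}

byte[] sealedBytes = AppendDigest(new byte[] { 1, 2, 3, 4 });
byte[] recovered = VerifyAndStrip(sealedBytes, maximumByteCount: 1024);
Console.WriteLine(recovered.Length);

sealedBytes[0] ^= 0xFF; // Simulate corruption of the stored artifact.
try { VerifyAndStrip(sealedBytes, 1024); }
catch (InvalidOperationException e) { Console.WriteLine(e.Message); }
```

An unkeyed digest detects accidental corruption, but anyone who can rewrite the file can also recompute the digest, so the loaded state still has to be treated as untrusted input and checked against the expected fingerprints.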

FixtureVerificationDiagnostic (UAIX.LmRuntime.Models.Llama, 2 members)

Represents one diagnostic emitted while verifying a checked-in GGUF fixture directory.

Property Code

Gets the stable diagnostic code.

Property Message

Gets the diagnostic message.

FixtureVerificationResult (UAIX.LmRuntime.Models.Llama, 5 members)

Represents the result of bounded, offline fixture directory verification.

Property FixtureDirectory

Gets the normalized fixture directory.

Property ArtifactPath

Gets the normalized GGUF artifact path.

Property ArtifactSha256

Gets the verified SHA-256 digest.

Property Diagnostics

Gets verification diagnostics.

Property IsValid

Gets whether verification completed without emitting any diagnostics.

FixtureDirectoryVerifier (UAIX.LmRuntime.Models.Llama, 1 member)

Verifies fixture manifests, artifact paths, digests, and basic loadability without network access.

Method Verify(string)

Verifies the supplied fixture directory and returns bounded evidence only after every required check succeeds.

fixtureDirectory: The directory containing a fixture manifest and GGUF artifact.

Returns: The FixtureVerificationResult for the supplied directory. It is published only after all documented validation and ownership transitions succeed.

LlamaWeightStorageMode (UAIX.LmRuntime.Models.Llama, 3 members)

Identifies how a bound tensor participates in reference execution.

Field Mapped

The tensor remains a borrowed view over the mapped GGUF file.

Field Alias

The tensor aliases another mapped tensor.

Field CopiedForReference

The tensor was explicitly copied into a bounded float32 reference buffer.

LlamaBoundTensor (UAIX.LmRuntime.Models.Llama, 5 members)

Represents one semantic LLaMA weight bound to mapped model storage.

Property Role

Gets the semantic tensor role.

Property BlockIndex

Gets the optional transformer block index.

Property Binding

Gets the validated binding manifest entry.

Property View

Gets the borrowed mapped tensor view.

Property StorageMode

Gets the storage mode represented by this binding.

LlamaBoundLayerWeightSet (UAIX.LmRuntime.Models.Llama, 10 members)

Represents the mapped tensors required by one LLaMA transformer block.

Property BlockIndex

Gets the zero-based transformer block index.

Property AttentionNorm

Gets the attention normalization tensor.

Property AttentionQuery

Gets the query projection tensor.

Property AttentionKey

Gets the key projection tensor.

Property AttentionValue

Gets the value projection tensor.

Property AttentionOutput

Gets the attention output projection tensor.

Property FeedForwardNorm

Gets the feed-forward normalization tensor.

Property FeedForwardGate

Gets the feed-forward gate projection tensor.

Property FeedForwardUp

Gets the feed-forward up projection tensor.

Property FeedForwardDown

Gets the feed-forward down projection tensor.

LlamaReferenceMaterializationRecord (UAIX.LmRuntime.Models.Llama, 5 members)

Records one explicit managed copy made for the bounded scalar reference runtime.

Property TensorName

Gets the source tensor name.

Property Role

Gets the semantic tensor role.

Property BlockIndex

Gets the optional transformer block index.

Property CopiedByteCount

Gets the copied byte count.

Property StorageMode

Gets the resulting storage mode.

LlamaReferenceWeightMaterialization (UAIX.LmRuntime.Models.Llama, 3 members)

Contains immutable float32 weights and copy evidence for the scalar reference runtime.

Property Weights

Gets the immutable reference weights.

Property Records

Gets every bounded copy made while materializing the fixture.

Property TotalCopiedByteCount

Gets the total number of copied bytes.

LlamaBoundWeightSet (UAIX.LmRuntime.Models.Llama, 11 members)

Resolves a complete LLaMA binding manifest into stable mapped tensor views.

This object does not own the operating-system mapping. Every view borrows storage from the supplied MappedGgufFile and becomes invalid when that mapping is disposed.

Method LlamaBoundWeightSet(UAIX.LmRuntime.Gguf.MappedGgufFile,UAIX.LmRuntime.Models.Llama.TensorBindingManifest,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Initializes a mapped LLaMA weight set from a complete binding manifest.

mapping: The mapped GGUF file that owns tensor storage.
manifest: The validated manifest that binds tensor requirements, model identity, and storage diagnostics used by the operation.
config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

Property Mapping

Gets the mapping that owns all borrowed tensor bytes.

Property Configuration

Gets the validated model configuration.

Property Manifest

Gets the complete tensor binding manifest.

Property Bindings

Gets all semantic mapped tensor bindings.

Property TokenEmbeddings

Gets the token embedding tensor.

Property OutputNorm

Gets the final output normalization tensor.

Property Output

Gets the output projection tensor or tied embedding alias.

Property Layers

Gets the block-local mapped weight sets.

Method Get(UAIX.LmRuntime.Models.Llama.LlamaTensorRole,System.Nullable<int>)

Retrieves the LLaMA bound tensor from the current LlamaBoundWeightSet state after validating the requested access.

role: The semantic LLaMA tensor role used to select the required bound tensor from the validated manifest.
blockIndex: The zero-based block index; it must identify an existing position within the relevant validated range.

Returns: The requested LlamaBoundTensor. It is published only after all documented validation and ownership transitions succeed.

Method MaterializeFloat32ReferenceWeights(int)

Materializes bounded float32 arrays for the scalar correctness runtime.

maximumCopiedBytes: The maximum total bytes that may be copied from mapped storage.

Returns: The immutable reference weights and explicit copy ledger.
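The combination of a copy budget and a copy ledger can be illustrated without the package. The sketch below is a hypothetical stand-in, not the library's implementation: `MaterializeBounded`, the tuple shapes, and the tensor name `example.weight` are assumptions, and the source bytes are assumed to be little-endian float32 values.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical: copies float32 tensors into managed arrays under a total byte budget
// and records every copy so the caller can audit what was materialized.
static (List<float[]> Weights, List<(string Name, long Bytes)> Ledger) MaterializeBounded(
    IReadOnlyList<(string Name, ReadOnlyMemory<byte> Data)> tensors,
    long maximumCopiedBytes)
{
    var weights = new List<float[]>();
    var ledger = new List<(string Name, long Bytes)>();
    long copied = 0;

    foreach ((string name, ReadOnlyMemory<byte> data) in tensors)
    {
        if (data.Length % sizeof(float) != 0)
            throw new ArgumentException($"Tensor '{name}' is not a whole number of float32 values.");

        // Enforce the budget before allocating the destination array.
        if (copied + data.Length > maximumCopiedBytes)
            throw new InvalidOperationException(
                $"Copying '{name}' would exceed the {maximumCopiedBytes}-byte budget.");

        ReadOnlySpan<byte> source = data.Span;
        float[] values = new float[data.Length / sizeof(float)];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = BinaryPrimitives.ReadSingleLittleEndian(
                source.Slice(i * sizeof(float), sizeof(float)));
        }

        weights.Add(values);
        ledger.Add((name, data.Length));
        copied += data.Length;
    }

    return (weights, ledger);
}

byte[] raw = new byte[8];
BinaryPrimitives.WriteSingleLittleEndian(raw.AsSpan(0, 4), 1.5f);
BinaryPrimitives.WriteSingleLittleEndian(raw.AsSpan(4, 4), -2.0f);

var inputs = new List<(string Name, ReadOnlyMemory<byte> Data)>
{
    ("example.weight", new ReadOnlyMemory<byte>(raw))
};

var result = MaterializeBounded(inputs, maximumCopiedBytes: 1024);
Console.WriteLine($"{result.Ledger[0].Name}: {result.Ledger[0].Bytes} bytes");
```

Checking the budget before each allocation means an oversized model fails early instead of after most of the memory has already been committed.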

LlamaRuntimeMode (UAIX.LmRuntime.Models.Llama, 1 member)

Identifies the deterministic execution contract used by a mapped model session.

Field DeterministicParity

Runs only deterministic parity behavior without adaptive governance.

LlamaOneTokenFinishReason (UAIX.LmRuntime.Models.Llama, 1 member)

Identifies why a bounded one-token generation operation ended.

Field OneTokenCompleted

Exactly one greedy token was selected as requested.

LlamaMappedModelLoadOptions (UAIX.LmRuntime.Models.Llama, 5 members)

Configures loading of a mapped LLaMA GGUF artifact.

Property ParseOptions

Gets GGUF parser safety limits.

Property BindingOptions

Gets semantic tensor binding validation options.

Property RuntimeMode

Gets the runtime mode.

Property MaximumReferenceMaterializationBytes

Gets the maximum number of bytes that scalar reference sessions may copy from mapped F32 weights.

Property ComputeModelSha256

Gets whether a SHA-256 digest of the complete artifact should be computed during load.

LlamaMappedModelLoadTimings (UAIX.LmRuntime.Models.Llama, 5 members)

Records measured stages of mapped model loading.

Property ParseDuration

Gets metadata and tensor catalog parse duration.

Property MapDuration

Gets operating-system memory-map creation duration.

Property CompositionDuration

Gets architecture, tokenizer, and binding composition duration.

Property HashDuration

Gets optional complete-file digest duration.

Property TotalDuration

Gets total load duration.

LlamaMappedModelManifest (UAIX.LmRuntime.Models.Llama, 13 members)

Describes the immutable evidence produced while loading a mapped LLaMA model.

Property ModelPath

Gets the normalized model path.

Property ModelByteCount

Gets the exact mapped GGUF file length observed during parsing.

Property ModelSha256

Gets the optional complete-file SHA-256 digest.

Property GgufVersion

Gets the GGUF version.

Property Architecture

Gets the architecture identifier.

Property ModelName

Gets the model display name.

Property Tokenizer

Gets the tokenizer implementation name.

Property BoundTensorCount

Gets the bound tensor count.

Property StorageSummary

Gets the physical tensor storage summary used by direct mapped execution.

Property ManagedModelWeightCopiedByteCount

Gets the managed model-weight byte count copied by the default execution path.

Property RuntimeMode

Gets the selected execution mode.

Property Timings

Gets load-stage timings.

Property Evidence

Gets load evidence messages.

LlamaOneTokenOptions (UAIX.LmRuntime.Models.Llama, 4 members)

Configures one deterministic mapped-model greedy-token operation.

Property ResetSession

Gets whether the session should reset before prompt evaluation.

Property ParseSpecialTokens

Gets whether raw special-token text should be recognized.

Property AddSpecialTokens

Gets whether model-defined BOS/EOS behavior should be applied.

Property EmitTokenizerTrace

Gets whether tokenizer trace events should be captured.

LlamaOneTokenTimings (UAIX.LmRuntime.Models.Llama, 4 members)

Records measured stages of exactly one mapped-model greedy decode operation.

Property TokenizationDuration

Gets prompt tokenization duration.

Property PrefillDuration

Gets prompt prefill duration.

Property SelectionDuration

Gets greedy selection and token decode duration.

Property TotalDuration

Gets total operation duration.

LlamaMappedGreedyTokenResult (UAIX.LmRuntime.Models.Llama, 20 members)

Represents an end-to-end prompt-to-one-token result from a mapped GGUF model.

Property ModelPath

Gets the normalized GGUF model path used for the operation.

Property ModelSha256

Gets the optional complete-file model digest computed during load.

Property ModelName

Gets the model display name declared by GGUF metadata.

Property Architecture

Gets the model architecture identifier.

Property Prompt

Gets the input prompt.

Property PromptTokenIds

Gets the exact prompt token identifiers.

Property TokenizerTrace

Gets tokenizer trace events when requested.

Property TokenId

Gets the selected token identifier.

Property TokenText

Gets the selected token text.

Property SelectedLogit

Gets the selected token logit.

Property Logits

Gets the complete next-token logits for parity diagnostics.

Property StorageSummary

Gets the mapped storage-type summary.

Property ManagedModelWeightCopiedByteCount

Gets the managed model-weight bytes copied by the session path.

Property ManagedAllocatedByteCount

Gets managed bytes allocated on the current thread during the measured operation.

Property Position

Gets the sequence position that produced the logits.

Property KeyValueCacheTokenCount

Gets the resulting key/value cache token count.

Property FinishReason

Gets the deterministic finish reason.

Property RuntimeMode

Gets the runtime mode.

Property Timings

Gets measured operation timings.

Property Evidence

Gets evidence statements for the deterministic one-token operation.

LlamaMappedModelLoader (UAIX.LmRuntime.Models.Llama, 1 member)

Loads a local GGUF artifact into a mapped, tokenizer-aware LLaMA model composition.

Method Load(string,UAIX.LmRuntime.Models.Llama.LlamaMappedModelLoadOptions)

Loads and validates one mapped local model.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
options: The optional LlamaMappedModelLoadOptions controlling Load; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: The owned mapped model, with ownership and disposal obligations defined by the returned type and the Load contract.

LlamaMappedModel (UAIX.LmRuntime.Models.Llama, 14 members)

Owns a mapped GGUF artifact and immutable LLaMA runtime composition.

Property Mapping

Gets the mapped model storage owner.

Property Configuration

Gets the validated LLaMA configuration.

Property TokenizerMetadata

Gets validated GGUF tokenizer metadata.

Property Tokenizer

Gets the exact metadata-driven tokenizer.

Property BindingManifest

Gets the tensor binding manifest.

Property Weights

Gets the mapped semantic weight set.

Property WeightSource

Gets the direct mapped execution weight source.

Property Options

Gets the load options retained for deterministic session creation.

Property Manifest

Gets the immutable load evidence manifest.

Property IsDisposed

Gets whether the model has been disposed.

Method CreateReferenceSession

Creates an independent scalar reference session with its own key/value state.

Returns: The new mapped reference session, with ownership and disposal obligations defined by the returned type and the CreateReferenceSession contract.

Method CreateMaterializedReferenceSession

Creates an independent compatibility session over explicitly materialized float32 arrays.

Returns: The materialized compatibility session, with ownership and disposal obligations defined by the returned type and the CreateMaterializedReferenceSession contract.

Method GetReferenceMaterialization

Gets the bounded reference materialization evidence, creating it on first use.

Returns: The LlamaReferenceWeightMaterialization evidence. It is published only after all documented validation and ownership transitions succeed.

Method Dispose

Releases resources owned by LlamaMappedModel and transitions it to the disposed state.

LlamaMappedReferenceSession (UAIX.LmRuntime.Models.Llama, 12 members)

Combines exact GGUF tokenization with an independent scalar reference session.

Property Position

Gets the current next-token sequence position.

Property KvCache

Gets the typed session-local key/value cache.

Property IsDisposed

Gets whether this session has released its state.

Method Reset

Resets this session's sequence and key/value state.

Method DecodeOneGreedy(string,UAIX.LmRuntime.Models.Llama.LlamaOneTokenOptions)

Tokenizes a prompt, executes prefill, selects argmax, and decodes exactly one token.

prompt: The prompt processed by the configured encoding or normalization rules; it must satisfy the declared nullability contract.
options: The optional LlamaOneTokenOptions controlling DecodeOneGreedy; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: The LlamaMappedGreedyTokenResult for the decoded token. It is published only after all documented validation and ownership transitions succeed.
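The "selects argmax" step is simple enough to show in isolation. The following is a minimal sketch and not the package's selection code: `ArgMax`, its lowest-index tie-break, and its NaN rejection are assumptions chosen to keep the result deterministic.

```csharp
using System;

// Hypothetical: returns the index of the largest logit; ties resolve to the lowest index.
static int ArgMax(ReadOnlySpan<float> logits)
{
    if (logits.IsEmpty)
        throw new ArgumentException("Logits must not be empty.", nameof(logits));

    int best = -1;
    float bestValue = float.NegativeInfinity;

    for (int i = 0; i < logits.Length; i++)
    {
        float value = logits[i];
        if (float.IsNaN(value))
            throw new ArgumentException("Logits must not contain NaN.", nameof(logits));

        // Strict '>' keeps the first maximum, so equal logits always pick the same token.
        if (best < 0 || value > bestValue)
        {
            best = i;
            bestValue = value;
        }
    }

    return best;
}

float[] sample = { 0.1f, 2.5f, -1.0f, 2.5f };
Console.WriteLine(ArgMax(sample)); // prints 1
```

A fixed tie-break rule matters for parity work: two runtimes that agree on every logit can still disagree on the selected token if they break ties differently.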

Method

GenerateGreedy(string,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,UAIX.LmRuntime.Tokenization.TokenizationOptions,System.Threading.CancellationToken)

Tokenizes a prompt and generates greedy token identifiers into caller-owned buffers.

prompt: The prompt processed by the configured encoding or normalization rules; it must satisfy the declared nullability contract.
generatedTokenIds: The ordered token identifiers to process; sequence order is preserved and each identifier is validated where required.
finalLogits: The final logits sequence used by this operation; its required length, ordering, and element bounds are validated before access.
generationOptions: The generation options that define validation limits and execution behavior; required values are checked before use.
tokenizationOptions: The tokenization options that define validation limits and execution behavior; required values are checked before use.
cancellationToken: The caller-provided token used to cancel the operation before additional work or results are published.

Returns: A LlamaGreedyGenerationResult summarizing the generation run. It is returned only after all documented validation and ownership transitions succeed.

Method GenerateGreedy(string,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,UAIX.LmRuntime.Tokenization.TokenizationOptions,System.Action<UAIX.LmRuntime.Models.Llama.LlamaGeneratedToken>,System.Threading.CancellationToken)

Tokenizes a prompt, generates greedy token identifiers, and reports each selected token synchronously.

prompt: The prompt processed by the configured encoding or normalization rules; it must satisfy the declared nullability contract.
generatedTokenIds: The caller-owned destination buffer that receives the generated token identifiers in generation order; its capacity is validated before any write occurs.
finalLogits: The caller-owned destination buffer that receives the final available logits; its required length is validated before access.
generationOptions: The generation options that define validation limits and execution behavior; required values are checked before use.
tokenizationOptions: The tokenization options that define validation limits and execution behavior; required values are checked before use.
tokenObserver: The optional observer invoked once for each selected token.
cancellationToken: A token observed before selection and before the next committed model step.

Returns: A LlamaGreedyGenerationResult summarizing the generation run. It is returned only after all documented validation and ownership transitions succeed.

Method ExportState(UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)

Exports complete deterministic state with model, configuration, tokenizer, and cache-layout identities.

options: Optional persistence metadata. Empty identity fields are resolved from the mapped model.

Returns: A newly allocated byte[] containing the exported session state. The caller owns the returned array, and mutating it later cannot alter the source session.

Method SaveState(string,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)

Saves complete deterministic state to a local artifact.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
options: The optional LlamaSessionPersistenceOptions controlling SaveState; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: A LlamaSessionArtifact describing the saved state. It is returned only after all documented validation and ownership transitions succeed.

Method RestoreState(System.ReadOnlySpan<byte>,int)

Restores verified complete state after enforcing mapped model and tokenizer identities.

bytes: The bytes sequence used by this operation; its required length, ordering, and element bounds are validated before access.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: A LlamaSessionArtifact describing the restored state. It is returned only after all documented validation and ownership transitions succeed.

Method LoadState(string,int)

Loads and restores complete deterministic state from a local artifact.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The verified artifact, with ownership and disposal obligations defined by the returned type and the LoadState contract.

Method Dispose

Releases resources owned by LlamaMappedReferenceSession and transitions it to the disposed state.

LlamaModelConfig UAIX.LmRuntime.Models.Llama 17 members

Represents LLaMA-family transformer configuration reconstructed from GGUF metadata.

Property Architecture

Gets the architecture name.

Property ModelName

Gets the optional model display name.

Property EmbeddingLength

Gets the embedding length.

Property BlockCount

Gets the transformer block count.

Property FeedForwardLength

Gets the feed-forward hidden length.

Property AttentionHeadCount

Gets the attention head count.

Property AttentionKeyValueHeadCount

Gets the attention key/value head count.

Property ContextLength

Gets the training context length.

Property VocabularySize

Gets the vocabulary size.

Property RopeDimensionCount

Gets the RoPE dimension count per attention head.

Property RopeFrequencyBase

Gets the RoPE frequency base.

Property RmsNormEpsilon

Gets the RMSNorm epsilon.

Property SupportsTiedOutputProjection

Gets whether the loader may use token embeddings as the output projection when output.weight is absent.

Property HeadDimension

Gets the dimension of one query attention head.

Property KeyValueDimension

Gets the flattened key/value projection dimension.
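
The two derived properties above can be illustrated with the arithmetic conventionally used by LLaMA-family models. This is a minimal sketch, assuming HeadDimension is the embedding length divided by the query head count and KeyValueDimension is that head dimension multiplied by the key/value head count; the helper DeriveHeadGeometry is hypothetical and is not part of the package, which may derive or validate these values differently.

```csharp
using System;

// Hypothetical helper (not a UAIX.LmRuntime API): derives head geometry using
// the conventional LLaMA-family arithmetic, stated here as an assumption.
static (int HeadDimension, int KeyValueDimension) DeriveHeadGeometry(
    int embeddingLength, int attentionHeadCount, int attentionKeyValueHeadCount)
{
    if (embeddingLength <= 0 || attentionHeadCount <= 0 || attentionKeyValueHeadCount <= 0)
        throw new ArgumentOutOfRangeException(nameof(embeddingLength), "All dimensions must be positive.");
    if (embeddingLength % attentionHeadCount != 0)
        throw new ArgumentException("Embedding length must divide evenly across query heads.");
    if (attentionHeadCount % attentionKeyValueHeadCount != 0)
        throw new ArgumentException("Query heads must divide evenly across key/value heads.");

    // One query head covers an equal slice of the embedding.
    int headDimension = embeddingLength / attentionHeadCount;
    // Key/value projections hold one slice of that size per key/value head.
    int keyValueDimension = headDimension * attentionKeyValueHeadCount;
    return (headDimension, keyValueDimension);
}

var geometry = DeriveHeadGeometry(embeddingLength: 4096, attentionHeadCount: 32, attentionKeyValueHeadCount: 8);
Console.WriteLine($"head={geometry.HeadDimension}, kv={geometry.KeyValueDimension}"); // head=128, kv=1024
```

With grouped-query attention the key/value dimension is smaller than the embedding length, which is why the two values are exposed separately.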

Method FromGguf(UAIX.LmRuntime.Gguf.GgufModel)

Creates a LLaMA-family configuration from GGUF metadata.

model: The parsed GGUF model whose validated metadata and tensor catalog are consumed by this operation.

Returns: A LlamaModelConfig populated from the model's GGUF metadata. It is returned only after all documented validation and ownership transitions succeed.

Method Validate

Validates architectural invariants required by the scalar LLaMA runtime.

LlamaReferenceForwardPass UAIX.LmRuntime.Models.Llama 2 members

Provides tiny reference building blocks for LLaMA-family correctness tests.

Method RmsNorm(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>,float)

Applies the LLaMA RMSNorm operation through the CPU reference kernel.

input: The source data consumed by the operation; caller-owned storage is not retained after the method returns.
weight: The weight sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
epsilon: The positive normalization epsilon added to the mean-square term to avoid division by zero while preserving deterministic numerical behavior.
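
The RMSNorm operation described above can be sketched independently of the package. The following is an illustrative implementation of the standard formula (each element divided by the root of the mean square plus epsilon, then multiplied by its weight), not the package's CPU reference kernel. The name RmsNormSketch and the choice to accumulate in double precision are assumptions made for this example.

```csharp
using System;

// Illustrative RMSNorm; RmsNormSketch is a hypothetical name, not a package API.
static void RmsNormSketch(ReadOnlySpan<float> input, ReadOnlySpan<float> weight, Span<float> output, float epsilon)
{
    if (input.IsEmpty) throw new ArgumentException("Input must not be empty.", nameof(input));
    if (weight.Length != input.Length) throw new ArgumentException("Weight length must match input length.", nameof(weight));
    if (output.Length < input.Length) throw new ArgumentException("Output is too small.", nameof(output));
    if (!(epsilon > 0f)) throw new ArgumentOutOfRangeException(nameof(epsilon), "Epsilon must be positive.");

    // Accumulate the sum of squares in double precision (an assumption of this sketch).
    double sumOfSquares = 0.0;
    for (int i = 0; i < input.Length; i++)
    {
        sumOfSquares += (double)input[i] * input[i];
    }

    // Epsilon is added to the mean-square term before the square root.
    float scale = (float)(1.0 / Math.Sqrt(sumOfSquares / input.Length + epsilon));
    for (int i = 0; i < input.Length; i++)
    {
        output[i] = input[i] * scale * weight[i];
    }
}

float[] normalized = new float[4];
RmsNormSketch(new float[] { 1f, -1f, 1f, -1f }, new float[] { 1f, 1f, 2f, 2f }, normalized, 1e-6f);
Console.WriteLine(string.Join(", ", normalized));
```

All validation happens before the first write, mirroring the documented rule that output capacity is checked before any write occurs.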

Method ApplyRope(System.Span<float>,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,int)

Applies LLaMA-style RoPE to a query or key vector in place.

vector: The query or key vector rotated in place; its required length, ordering, and element bounds are validated before access.
cos: The cosine values used by the rotation; their required length, ordering, and element bounds are validated before access.
sin: The sine values used by the rotation; their required length, ordering, and element bounds are validated before access.
ropeDimensions: The even number of leading head dimensions transformed by rotary positional encoding.
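
As a rough illustration of an in-place rotary encoding driven by precomputed cosine and sine values, the sketch below rotates adjacent element pairs within the leading ropeDimensions elements and leaves the rest untouched. Two points are assumptions of this example rather than documented behavior of ApplyRope: that pairs are adjacent (some LLaMA implementations pair element i with element i + ropeDimensions / 2 instead), and that cos and sin hold one entry per pair. ApplyRopeSketch is a hypothetical name.

```csharp
using System;

// Illustrative in-place RoPE over adjacent pairs; not the package implementation.
static void ApplyRopeSketch(Span<float> vector, ReadOnlySpan<float> cos, ReadOnlySpan<float> sin, int ropeDimensions)
{
    if (ropeDimensions < 0 || (ropeDimensions & 1) != 0 || ropeDimensions > vector.Length)
        throw new ArgumentOutOfRangeException(nameof(ropeDimensions), "Must be even and within the vector length.");

    int pairCount = ropeDimensions / 2;
    if (cos.Length < pairCount || sin.Length < pairCount)
        throw new ArgumentException("cos and sin need one entry per rotated pair (assumption of this sketch).");

    for (int pair = 0; pair < pairCount; pair++)
    {
        float x0 = vector[2 * pair];
        float x1 = vector[2 * pair + 1];
        // Standard 2D rotation of (x0, x1) by the angle encoded in cos/sin.
        vector[2 * pair] = x0 * cos[pair] - x1 * sin[pair];
        vector[2 * pair + 1] = x0 * sin[pair] + x1 * cos[pair];
    }
}

float angle = 0.5f;
float[] rotated = { 1f, 0f, 5f, 7f };
ApplyRopeSketch(rotated, new[] { MathF.Cos(angle) }, new[] { MathF.Sin(angle) }, ropeDimensions: 2);
Console.WriteLine(string.Join(", ", rotated)); // only the first pair changes
```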

LlamaReferenceLayerWeights UAIX.LmRuntime.Models.Llama 9 members

Stores immutable float32 weights for one scalar/reference LLaMA transformer block.

Property AttentionNorm

Gets the attention RMSNorm scale.

Property AttentionQuery

Gets the query projection matrix in row-major logical order.

Property AttentionKey

Gets the key projection matrix in row-major logical order.

Property AttentionValue

Gets the value projection matrix in row-major logical order.

Property AttentionOutput

Gets the attention output projection matrix in row-major logical order.

Property FeedForwardNorm

Gets the feed-forward RMSNorm scale.

Property FeedForwardGate

Gets the feed-forward gate projection matrix in row-major logical order.

Property FeedForwardUp

Gets the feed-forward up projection matrix in row-major logical order.

Property FeedForwardDown

Gets the feed-forward down projection matrix in row-major logical order.

LlamaReferenceModelWeights UAIX.LmRuntime.Models.Llama 5 members

Stores immutable float32 weights for the deterministic LLaMA reference runtime.

Property TokenEmbeddings

Gets the token embedding table in row-major logical order.

Property Layers

Gets transformer block weights in execution order.

Property OutputNorm

Gets the final RMSNorm scale.

Property OutputProjection

Gets the output projection matrix in row-major logical order. An empty value means tied embeddings.

Method Validate(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Validates all reference-weight shapes against a LLaMA configuration.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

LlamaGreedyTokenResult UAIX.LmRuntime.Models.Llama 5 members

Represents exactly one greedily selected token produced by the reference runtime.

Property TokenId

Gets the selected token identifier.

Property TokenText

Gets the selected token text when a tokenizer is attached.

Property PromptTokenCount

Gets the number of prompt tokens evaluated.

Property Position

Gets the zero-based position whose logits selected this token.

Property SelectedLogit

Gets the selected token logit.

LlamaReferenceSession UAIX.LmRuntime.Models.Llama 17 members

Executes a deterministic, scalar-first LLaMA forward path for tiny correctness fixtures.

This class is the numerical correctness anchor for later optimized kernels. It is intentionally limited to batch size one and F32, Q8_0, or Q4_0 mapped or array-backed weights. It performs no governance or adaptive policy operations and therefore belongs exclusively to deterministic parity mode.

Method LlamaReferenceSession(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights,UAIX.LmRuntime.Tokenization.IGgufTokenizer)

Initializes a reference session through the v1.8.0 array-backed compatibility path.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.
tokenizer: The optional tokenizer used only to decode the selected token text.

Method LlamaReferenceSession(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.ILlamaModelWeightSource,UAIX.LmRuntime.Tokenization.IGgufTokenizer)

Initializes a reference session over immutable array-backed or direct mapped weight sources.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.
tokenizer: The optional tokenizer used only to decode the selected token text.

Property Position

Gets the next sequence position to be evaluated.

Property KvCache

Gets the typed key/value cache owned by this session.

Property WeightSource

Gets the immutable model weight source used by this session.

Property VocabularySize

Gets the configured vocabulary size.

Property ContextCapacity

Gets the configured sequence capacity.

Method Reset

Clears sequence state and all key/value cache contents.

Method CaptureState

Captures complete deterministic session state without serializing live model pointers.

Returns: A LlamaReferenceSessionSnapshot containing the captured session state. It is returned only after all documented validation and ownership transitions succeed.

Method RestoreState(UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot)

Restores complete deterministic state after validating sequence, vocabulary, and cache identities.

snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.

Method RunStep(int,System.Span<float>)

Evaluates one input token and writes next-token logits.

tokenId: The token identifier to process; it must fall within the validated vocabulary and operation-specific range.
logits: The destination buffer with at least vocabulary-size elements.

Method DecodeOneGreedy(System.Collections.Generic.IReadOnlyList<int>,bool)

Evaluates a prompt and returns exactly one greedily selected next token.

promptTokenIds: The ordered token identifiers to process; sequence order is preserved and each identifier is validated where required.
resetSession: Whether existing key/value state should be cleared first.

Returns: A LlamaGreedyTokenResult describing the selected next token. It is returned only after all documented validation and ownership transitions succeed.

Method Prefill(System.Collections.Generic.IReadOnlyList<int>,bool)

Evaluates every prompt token and leaves the final logits available for deterministic selection.

promptTokenIds: The ordered token identifiers to process; sequence order is preserved and each identifier is validated where required.
resetSession: Whether existing sequence and cache state should be cleared first.

Method CopyLastLogitsTo(System.Span<float>)

Copies the most recently computed logits to a caller-provided destination.

destination: The destination with room for the configured vocabulary.

Method GenerateGreedy(System.Collections.Generic.IReadOnlyList<int>,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,System.Threading.CancellationToken)

Generates deterministic greedy token identifiers into caller-owned buffers.

promptTokenIds: The ordered token identifiers to process; sequence order is preserved and each identifier is validated where required.
generatedTokenIds: The caller-owned destination buffer that receives the generated token identifiers in generation order; its capacity is validated before any write occurs.
finalLogits: The caller-owned destination for the final available logits.
options: The optional LlamaGreedyGenerationOptions controlling GenerateGreedy; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
cancellationToken: The caller-provided token used to cancel the operation before additional work or results are published.

Returns: A LlamaGreedyGenerationResult summarizing the generation run. It is returned only after all documented validation and ownership transitions succeed.

Method GenerateGreedy(System.Collections.Generic.IReadOnlyList<int>,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,System.Action<UAIX.LmRuntime.Models.Llama.LlamaGeneratedToken>,System.Threading.CancellationToken)

Generates deterministic greedy token identifiers and reports each selection to a synchronous observer.

promptTokenIds: The ordered token identifiers to process; sequence order is preserved and each identifier is validated where required.
generatedTokenIds: The caller-owned destination buffer that receives the generated token identifiers in generation order; its capacity is validated before any write occurs.
finalLogits: The caller-owned destination for the final available logits.
options: The optional LlamaGreedyGenerationOptions controlling GenerateGreedy; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.
tokenObserver: The optional synchronous observer invoked once after each token is selected.
cancellationToken: A token observed before selection and before the next committed model step.

Returns: A LlamaGreedyGenerationResult summarizing the generation run. It is returned only after all documented validation and ownership transitions succeed.
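
The control flow of observer-driven greedy generation (prefill, argmax selection, synchronous observer callback, cancellation checks before selection and before the next model step) can be sketched without the package. Everything below is hypothetical: the runStep delegate stands in for a model step that returns next-token logits, the observer receives a bare token identifier rather than a LlamaGeneratedToken, ties resolve to the lowest index, and stop conditions other than a full buffer are omitted.

```csharp
using System;
using System.Threading;

// Hypothetical helper: index of the largest logit, lowest index winning ties.
static int ArgMax(ReadOnlySpan<float> logits)
{
    if (logits.IsEmpty) throw new ArgumentException("Logits must not be empty.", nameof(logits));
    int best = 0;
    for (int i = 1; i < logits.Length; i++)
    {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

// Hypothetical greedy loop writing into a caller-owned buffer; returns the count written.
static int GenerateGreedySketch(
    Func<int, float[]> runStep, int[] promptTokenIds, Span<int> generatedTokenIds,
    Action<int> tokenObserver, CancellationToken cancellationToken)
{
    if (promptTokenIds.Length == 0)
        throw new ArgumentException("Prompt must contain at least one token.", nameof(promptTokenIds));

    // Prefill: evaluate every prompt token; the last step yields the first logits.
    float[] logits = runStep(promptTokenIds[0]);
    for (int i = 1; i < promptTokenIds.Length; i++) logits = runStep(promptTokenIds[i]);

    int count = 0;
    while (count < generatedTokenIds.Length)
    {
        cancellationToken.ThrowIfCancellationRequested(); // observed before selection
        int selected = ArgMax(logits);
        generatedTokenIds[count++] = selected;
        tokenObserver?.Invoke(selected);                  // synchronous, once per token
        if (count == generatedTokenIds.Length) break;
        cancellationToken.ThrowIfCancellationRequested(); // observed before the next step
        logits = runStep(selected);
    }
    return count;
}

// Toy "model" over a 4-token vocabulary: always predicts (token + 1) % 4.
Func<int, float[]> toyStep = tokenId =>
{
    var next = new float[4];
    next[(tokenId + 1) % 4] = 1f;
    return next;
};

int[] generated = new int[3];
int produced = GenerateGreedySketch(toyStep, new[] { 0 }, generated,
    id => Console.WriteLine($"selected {id}"), CancellationToken.None);
Console.WriteLine($"{produced} tokens: {string.Join(", ", generated)}"); // 3 tokens: 1, 2, 3
```

Because the observer runs synchronously inside the loop, a slow observer delays the next model step; that is the trade-off of a callback invoked once per selected token.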

Method SelectGreedyToken(int)

Selects and decodes one greedy token from the current logits.

promptTokenCount: The number of prompt tokens associated with the current logits.

Returns: A LlamaGreedyTokenResult describing the token selected from the current logits. It is returned only after all documented validation and ownership transitions succeed.

LlamaReferenceFixture UAIX.LmRuntime.Models.Llama 5 members

Represents a deterministic tiny reference fixture with one transformer block.

Property Configuration

Gets the fixture model configuration.

Property Weights

Gets the fixture model weights.

Property Tokenizer

Gets the fixture tokenizer.

Property PromptTokenIds

Gets the canonical fixture prompt tokens.

Method CreateSession

Creates a reference session from the fixture's validated configuration, weights, and tokenizer.

Returns: A session with empty key/value cache state.

LlamaReferenceFixtureFactory UAIX.LmRuntime.Models.Llama 1 member

Creates deterministic tiny fixtures used by reference-runtime tests and examples.

Method CreateDeterministic

Creates a one-block, five-token deterministic LLaMA fixture.

Returns: The fixture configuration, weights, tokenizer, and prompt, with ownership and disposal obligations defined by the returned type and the CreateDeterministic contract.

ILlamaSession UAIX.LmRuntime.Models.Llama 1 member

Defines the lifecycle for a LLaMA-family inference session.

Method DecodeAsync(int,System.Threading.CancellationToken)

Decodes the next token for the active sequence.

tokenId: The token identifier to process; it must fall within the validated vocabulary and operation-specific range.
cancellationToken: The caller-provided token used to cancel the operation before additional work or results are published.

Returns: A ValueTask<int> that completes with the result of decoding the next token for the active sequence. Faults and cancellation are propagated without a successful partial result.

LlamaReferenceExecutor UAIX.LmRuntime.Models.Llama 1 member

Provides scalar/reference execution anchors for LLaMA-family graphs.

Method Forward(System.ReadOnlySpan<float>,UAIX.LmRuntime.Models.Llama.LlamaWeights,System.Span<float>)

Executes a minimal reference forward pass from a hidden state to logits.

hiddenState: The hidden state sequence used by this operation; its required length, ordering, and element bounds are validated before access.
weights: The LLaMA weights used by the reference path.
logits: The caller-owned destination buffer that receives the computed logits; its required length is validated before access.

LlamaWeights UAIX.LmRuntime.Models.Llama 2 members

Represents model-level LLaMA weights used by reference execution.

Property TokenEmbeddings

Gets token embedding weights.

Property OutputProjection

Gets the output projection matrix in row-major order.

LlamaLayerWeights UAIX.LmRuntime.Models.Llama 3 members

Represents one transformer block's reference weights.

Property AttentionQuery

Gets the attention query projection matrix.

Property AttentionKey

Gets the attention key projection matrix.

Property AttentionValue

Gets the attention value projection matrix.

LlamaReferenceRmsNorm UAIX.LmRuntime.Models.Llama 1 member

Provides reference RMSNorm behavior.

Method Apply(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>,float)

Applies reference RMSNorm to the input using the supplied weights and epsilon, writing the result to the output buffer while preserving the operation's numeric and shape invariants.

input: The source data consumed by the operation; caller-owned storage is not retained after the method returns.
weights: The weights sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
epsilon: The positive normalization epsilon added to the mean-square term to avoid division by zero while preserving deterministic numerical behavior.

LlamaReferenceRope UAIX.LmRuntime.Models.Llama 1 member

Provides reference RoPE behavior.

Method Apply(System.Span<float>,int,float)

Applies rotary position embedding to adjacent hidden-state pairs.

values: The hidden-state values rotated in place as adjacent pairs; their required length, ordering, and element bounds are validated before access.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
theta: The rotary angle in radians applied to the paired vector components at the addressed position.

LlamaReferenceAttention UAIX.LmRuntime.Models.Llama 1 member

Provides reference causal attention behavior.

Method ApplyCausal(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,int,System.Span<float>)

Applies a minimal causal attention score computation.

query: The query sequence used by this operation; its required length, ordering, and element bounds are validated before access.
keys: The keys sequence used by this operation; its required length, ordering, and element bounds are validated before access.
values: The values sequence used by this operation; its required length, ordering, and element bounds are validated before access.
headSize: The head size used to interpret the query, keys, and values; it must satisfy the member's documented range and geometry requirements.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
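
For orientation, single-query causal attention over one head can be sketched as scaled dot-product scores followed by a softmax-weighted sum of values. The layout below is an assumption of this example, not a documented contract of ApplyCausal: keys and values are flattened as consecutive rows of headSize elements, one row per visible position, and causality is expressed by passing only positions up to and including the current one. CausalAttentionSketch is a hypothetical name.

```csharp
using System;

// Illustrative single-head, single-query attention; not the package implementation.
static void CausalAttentionSketch(
    ReadOnlySpan<float> query, ReadOnlySpan<float> keys, ReadOnlySpan<float> values, int headSize, Span<float> output)
{
    if (headSize <= 0) throw new ArgumentOutOfRangeException(nameof(headSize));
    if (query.Length != headSize) throw new ArgumentException("Query length must equal headSize.", nameof(query));
    if (keys.IsEmpty || keys.Length % headSize != 0) throw new ArgumentException("Keys must hold whole rows.", nameof(keys));
    if (values.Length != keys.Length) throw new ArgumentException("Values must match keys in length.", nameof(values));
    if (output.Length < headSize) throw new ArgumentException("Output is too small.", nameof(output));

    int positionCount = keys.Length / headSize;
    float[] scores = new float[positionCount];
    float scale = 1f / MathF.Sqrt(headSize);
    float maxScore = float.NegativeInfinity;

    // Scaled dot product of the query with every visible key row.
    for (int t = 0; t < positionCount; t++)
    {
        float dot = 0f;
        for (int d = 0; d < headSize; d++) dot += query[d] * keys[t * headSize + d];
        scores[t] = dot * scale;
        maxScore = MathF.Max(maxScore, scores[t]);
    }

    // Numerically stable softmax: subtract the maximum before exponentiating.
    float sum = 0f;
    for (int t = 0; t < positionCount; t++)
    {
        scores[t] = MathF.Exp(scores[t] - maxScore);
        sum += scores[t];
    }

    // Weighted sum of value rows.
    output.Slice(0, headSize).Clear();
    for (int t = 0; t < positionCount; t++)
    {
        float weightForPosition = scores[t] / sum;
        for (int d = 0; d < headSize; d++) output[d] += weightForPosition * values[t * headSize + d];
    }
}

float[] attended = new float[2];
CausalAttentionSketch(new float[] { 1f, 0f }, new float[] { 1f, 0f, 0f, 1f }, new float[] { 10f, 0f, 0f, 10f }, 2, attended);
Console.WriteLine(string.Join(", ", attended));
```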

GroupedQueryAttentionMap UAIX.LmRuntime.Models.Llama 1 member

Maps query heads to grouped key/value heads.

Method MapHead(int,int,int)

Maps an attention query head to the corresponding KV head.

queryHead: The zero-based query-head index mapped deterministically to its corresponding key/value head.
queryHeadCount: The query head count used to bound this operation; it must be nonnegative and within the supported range.
keyValueHeadCount: The key/value head count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The index of the key/value head that corresponds to the supplied query head. Range and overflow checks are completed before the value is returned.
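
Grouped-query attention commonly assigns consecutive query heads to the same key/value head. The sketch below shows that conventional mapping, assuming the query head count is an exact multiple of the key/value head count; whether MapHead uses this exact grouping is not stated in this reference, and MapHeadSketch is a hypothetical name.

```csharp
using System;

// Illustrative contiguous grouping: query heads [0, g) share KV head 0, [g, 2g) share KV head 1, and so on.
static int MapHeadSketch(int queryHead, int queryHeadCount, int keyValueHeadCount)
{
    if (queryHeadCount <= 0) throw new ArgumentOutOfRangeException(nameof(queryHeadCount));
    if (keyValueHeadCount <= 0) throw new ArgumentOutOfRangeException(nameof(keyValueHeadCount));
    if (queryHeadCount % keyValueHeadCount != 0)
        throw new ArgumentException("Query heads must divide evenly across key/value heads.");
    if (queryHead < 0 || queryHead >= queryHeadCount) throw new ArgumentOutOfRangeException(nameof(queryHead));

    int groupSize = queryHeadCount / keyValueHeadCount;
    return queryHead / groupSize;
}

for (int head = 0; head < 8; head++)
{
    Console.WriteLine($"query head {head} -> kv head {MapHeadSketch(head, 8, 2)}");
}
```

With 8 query heads and 2 key/value heads, heads 0 to 3 share key/value head 0 and heads 4 to 7 share key/value head 1. When the two counts are equal the mapping is the identity, which is ordinary multi-head attention.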

LlamaSwiGluReference UAIX.LmRuntime.Models.Llama 1 member

Provides reference SwiGLU behavior.

Method Apply(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>)

Applies the SwiGLU activation to validated gate and up-projection vectors.

gate: The gate sequence used by this operation; its required length, ordering, and element bounds are validated before access.
up: The up sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
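
The SwiGLU activation is conventionally computed element by element as SiLU(gate) multiplied by up, where SiLU(x) = x / (1 + exp(-x)). The sketch below implements that standard formula under a hypothetical name, SwiGluSketch; it is an illustration, not the package's implementation.

```csharp
using System;

// Illustrative SwiGLU: output[i] = SiLU(gate[i]) * up[i].
static void SwiGluSketch(ReadOnlySpan<float> gate, ReadOnlySpan<float> up, Span<float> output)
{
    if (gate.Length != up.Length) throw new ArgumentException("Gate and up lengths must match.", nameof(up));
    if (output.Length < gate.Length) throw new ArgumentException("Output is too small.", nameof(output));

    for (int i = 0; i < gate.Length; i++)
    {
        float g = gate[i];
        // SiLU (swish): x * sigmoid(x), written as x / (1 + e^-x).
        float silu = g / (1f + MathF.Exp(-g));
        output[i] = silu * up[i];
    }
}

float[] activated = new float[2];
SwiGluSketch(new float[] { 0f, 2f }, new float[] { 3f, 3f }, activated);
Console.WriteLine(string.Join(", ", activated));
```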

LlamaLogitComputer UAIX.LmRuntime.Models.Llama 1 member

Computes reference logits from a hidden state and output projection.

Method ComputeLogits(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>)

Computes logits from a hidden vector and a row-major projection matrix.

hiddenState: The hidden state sequence used by this operation; its required length, ordering, and element bounds are validated before access.
projection: The projection sequence used by this operation; its required length, ordering, and element bounds are validated before access.
logits: The caller-owned destination buffer that receives the computed logits; its required length is validated before access.
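
Computing logits from a hidden vector and a row-major projection matrix amounts to one dot product per vocabulary row. The sketch below assumes the projection is flattened as vocabulary-size rows of hidden-length elements and infers the vocabulary size from the two lengths; ComputeLogitsSketch is a hypothetical name and the package may validate shapes differently.

```csharp
using System;

// Illustrative row-major matrix-vector product: logits[row] = dot(projection[row, :], hiddenState).
static void ComputeLogitsSketch(ReadOnlySpan<float> hiddenState, ReadOnlySpan<float> projection, Span<float> logits)
{
    if (hiddenState.IsEmpty) throw new ArgumentException("Hidden state must not be empty.", nameof(hiddenState));
    if (projection.Length % hiddenState.Length != 0)
        throw new ArgumentException("Projection must hold whole rows of hidden length.", nameof(projection));

    int vocabularySize = projection.Length / hiddenState.Length;
    if (logits.Length < vocabularySize) throw new ArgumentException("Logits buffer is too small.", nameof(logits));

    for (int row = 0; row < vocabularySize; row++)
    {
        // Each row is contiguous because the matrix is row-major.
        ReadOnlySpan<float> rowWeights = projection.Slice(row * hiddenState.Length, hiddenState.Length);
        float sum = 0f;
        for (int column = 0; column < hiddenState.Length; column++)
        {
            sum += rowWeights[column] * hiddenState[column];
        }
        logits[row] = sum;
    }
}

float[] scoresOut = new float[3];
ComputeLogitsSketch(new float[] { 1f, 2f }, new float[] { 1f, 0f, 0f, 1f, 1f, 1f }, scoresOut);
Console.WriteLine(string.Join(", ", scoresOut)); // 1, 2, 3
```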

LlamaParityTolerance UAIX.LmRuntime.Models.Llama 3 members

Configures exact token and explicit floating-point tolerance checks for cross-storage parity.

Property AbsoluteTolerance

Gets the absolute per-logit tolerance.

Property RelativeTolerance

Gets the relative per-logit tolerance.

Method Validate

Validates the absolute and relative parity tolerances used for numerical comparison.

LlamaLogitComparison UAIX.LmRuntime.Models.Llama 6 members

Summarizes a deterministic comparison of two next-token logit vectors.

Property IsWithinTolerance

Gets whether every compared logit satisfies the configured tolerance.

Property MaximumAbsoluteError

Gets the largest absolute logit difference.

Property MeanAbsoluteError

Gets the arithmetic mean absolute logit difference.

Property FirstFailingIndex

Gets the index of the first failing logit, if any logit failed.

Property FirstFailingReferenceValue

Gets the reference value at the first failing index.

Property FirstFailingCandidateValue

Gets the candidate value at the first failing index.

LlamaLogitComparator UAIX.LmRuntime.Models.Llama 1 member

Compares deterministic next-token vectors without widening caller-provided tolerances.

Method Compare(System.Collections.Generic.IReadOnlyList<float>,System.Collections.Generic.IReadOnlyList<float>,UAIX.LmRuntime.Models.Llama.LlamaParityTolerance)

Compares two logit vectors using absolute-or-relative error acceptance.

reference: The reference sequence used by this operation; its required length, ordering, and element bounds are validated before access.
candidate: The candidate sequence used by this operation; its required length, ordering, and element bounds are validated before access.
tolerance: The LlamaParityTolerance supplying the absolute and relative per-logit tolerances; it must satisfy the member's nullability and range rules before comparison begins.

Returns: A LlamaLogitComparison summarizing the comparison. It is returned only after all documented validation and ownership transitions succeed.
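
Absolute-or-relative acceptance means a logit passes when its error is within the absolute tolerance or within the relative tolerance scaled by a magnitude. The sketch below is one plausible reading, with several assumptions labeled: the relative bound is scaled by the reference value's magnitude, -1 stands in for "no failing index" (the value the package uses is not given in this reference), and the tuple fields merely mirror the LlamaLogitComparison property names. CompareLogitsSketch is a hypothetical name.

```csharp
using System;

// Illustrative comparison; not the package's LlamaLogitComparator.
static (bool IsWithinTolerance, double MaximumAbsoluteError, double MeanAbsoluteError, int FirstFailingIndex)
    CompareLogitsSketch(float[] reference, float[] candidate, double absoluteTolerance, double relativeTolerance)
{
    if (reference.Length == 0 || reference.Length != candidate.Length)
        throw new ArgumentException("Vectors must be non-empty and equal in length.");
    if (!(absoluteTolerance >= 0) || !(relativeTolerance >= 0))
        throw new ArgumentOutOfRangeException(nameof(absoluteTolerance), "Tolerances must be nonnegative.");

    double maximumError = 0.0;
    double errorSum = 0.0;
    int firstFailing = -1; // -1 means "none failed" in this sketch only

    for (int i = 0; i < reference.Length; i++)
    {
        double error = Math.Abs((double)reference[i] - candidate[i]);
        maximumError = Math.Max(maximumError, error);
        errorSum += error;

        // Pass if either bound holds; a NaN error fails both comparisons.
        bool passes = error <= absoluteTolerance
            || error <= relativeTolerance * Math.Abs((double)reference[i]);
        if (!passes && firstFailing < 0) firstFailing = i;
    }

    return (firstFailing < 0, maximumError, errorSum / reference.Length, firstFailing);
}

var comparison = CompareLogitsSketch(new[] { 1f, 100f }, new[] { 1.0005f, 100.5f }, 1e-3, 1e-2);
Console.WriteLine($"pass={comparison.IsWithinTolerance}, max={comparison.MaximumAbsoluteError}");
```

In the usage above the second logit differs by 0.5, which exceeds the absolute tolerance but stays within 1% of the reference magnitude, so the comparison passes on the relative bound.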

LlamaStorageParityCandidateResult UAIX.LmRuntime.Models.Llama 7 members

Represents one candidate model's parity result against a selected reference model.

Property ModelPath

Gets the candidate model path.

Property ModelSha256

Gets the candidate model SHA-256.

Property StorageSummary

Gets the candidate storage summary.

Property TokenMatches

Gets whether the selected token identifier exactly equals the reference identifier.

Property LogitComparison

Gets the detailed logit comparison.

Property OneTokenResult

Gets the complete candidate one-token result.

Property Passed

Gets whether both exact-token and floating-point contracts passed.

LlamaStorageParityResult UAIX.LmRuntime.Models.Llama 4 members

Represents a cross-storage one-token parity run.

Property Prompt

Gets the prompt used for every model.

Property ReferenceResult

Gets the reference one-token result.

Property Candidates

Gets candidate results in caller order.

Property Passed

Gets whether every candidate passed the explicit parity contract.

LlamaStorageParityRunner UAIX.LmRuntime.Models.Llama 1 member

Executes bounded offline one-token parity comparisons across local GGUF storage variants.

Method Run(string,System.Collections.Generic.IReadOnlyList<string>,string,UAIX.LmRuntime.Models.Llama.LlamaParityTolerance)

Runs one reference model and one or more candidate models with identical prompt settings.

referenceModelPath: The local file-system reference model path processed by this operation; it must satisfy the containing component's path and scope policy.
candidateModelPaths: The local file-system candidate model paths processed by this operation; each must satisfy the containing component's path and scope policy.
prompt: The prompt processed by the configured encoding or normalization rules; it must satisfy the declared nullability contract.
tolerance: The LlamaParityTolerance supplying the absolute and relative per-logit tolerances applied to every candidate; it must satisfy the member's nullability and range rules before any model is run.

Returns: A LlamaStorageParityResult containing the reference result and every candidate result. It is returned only after all documented validation and ownership transitions succeed.

LlamaTensorRole UAIX.LmRuntime.Models.Llama 12 members

Identifies semantic roles for LLaMA-family tensors.

Field TokenEmbedding

Token embedding table.

Field OutputNorm

Final output normalization scale.

Field Output

Output projection matrix.

Field AttentionNorm

Per-block attention normalization scale.

Field AttentionQuery

Per-block query projection.

Field AttentionKey

Per-block key projection.

Field AttentionValue

Per-block value projection.

Field AttentionOutput

Per-block attention output projection.

Field FeedForwardNorm

Per-block feed-forward normalization scale.

Field FeedForwardGate

Per-block feed-forward gate projection.

Field FeedForwardUp

Per-block feed-forward up projection.

Field FeedForwardDown

Per-block feed-forward down projection.

TensorBindingStorageKind UAIX.LmRuntime.Models.Llama 2 members

Identifies where a bound tensor payload is stored.

Field MemoryMappedFile

The tensor remains in the GGUF memory-mapped artifact.

Field Alias

The tensor is an alias of another bound tensor.

TensorBindingOwnership UAIX.LmRuntime.Models.Llama 2 members

Identifies ownership for a bound tensor payload.

Field BorrowedModelStorage

The binding borrows storage owned by the loaded model.

Field BorrowedAlias

The binding borrows storage through another tensor binding.

TensorBindingOptions UAIX.LmRuntime.Models.Llama 4 members

Configures semantic validation performed during LLaMA tensor binding.

Property AllowTiedOutputProjection

Gets whether a missing output.weight may alias token_embd.weight.

Property ValidateSemanticShapes

Gets whether dimensions derived from model metadata must match the GGUF storage shape.

Property ValidateByteLengths

Gets whether physical byte lengths must match the registered tensor type traits.

Property ValidateFileBounds

Gets whether tensor ranges must fit inside the parsed source file length when available.

LlamaTensorRequirement UAIX.LmRuntime.Models.Llama 7 members

Describes one required LLaMA tensor contract.

Property Name

Gets the required tensor name.

Property Role

Gets the tensor role.

Property ExpectedRank

Gets the expected rank.

Property ExpectedStorageDimensions

Gets dimensions in GGUF storage order, where dimension zero is the row width.

Property ExpectedLogicalDimensions

Gets dimensions in logical row-major order for diagnostics and manifests.
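
The relationship between the two dimension orders can be sketched as a simple reversal: GGUF lists the fastest-varying dimension (the row width) first, while logical row-major order lists rows before columns. The helper below is hypothetical and assumes a plain reversal is all that is needed; the package may normalize dimensions differently.

```csharp
using System;

// Illustrative conversion from GGUF storage order to logical row-major order.
static long[] ToLogicalDimensions(ReadOnlySpan<long> storageDimensions)
{
    var reversed = new long[storageDimensions.Length];
    for (int i = 0; i < reversed.Length; i++)
    {
        // Storage dimension zero (row width) becomes the last logical dimension.
        reversed[i] = storageDimensions[storageDimensions.Length - 1 - i];
    }
    return reversed;
}

// A token embedding stored as [rowWidth = 4096, rowCount = 32000]
// reads logically as 32000 rows of 4096 columns.
long[] logicalShape = ToLogicalDimensions(new long[] { 4096, 32000 });
Console.WriteLine(string.Join(" x ", logicalShape)); // 32000 x 4096
```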

Property BlockIndex

Gets the optional block index.

Property IsOptional

Gets whether the tensor may be satisfied by an explicit alias rule.

TensorBindingEntry UAIX.LmRuntime.Models.Llama 10 members

Represents one bound tensor entry.

Property Requirement

Gets the tensor requirement.

Property Descriptor

Gets the GGUF tensor descriptor supplying storage.

Property SourceTensorName

Gets the source tensor name when this binding is an alias.

Property LogicalDimensions

Gets the normalized logical dimensions.

Property ByteLength

Gets the physical storage byte length.

Property AbsoluteOffset

Gets the absolute source-file offset.

Property DataType

Gets the mapped runtime data type.

Property StorageKind

Gets the storage kind.

Property Ownership

Gets the ownership contract.

Property IsAlias

Gets whether this binding aliases another tensor.

TensorBindingDiagnostic UAIX.LmRuntime.Models.Llama 4 members

Represents a tensor binding diagnostic.

Property Code

Gets the diagnostic code.

Property TensorName

Gets the tensor name associated with the diagnostic.

Property BlockIndex

Gets the optional transformer block index.

Property Message

Gets the diagnostic message.

TensorBindingManifest UAIX.LmRuntime.Models.Llama 5 members

Represents the result of LLaMA tensor binding.

Property Bindings

Gets bound tensor entries.

Property Diagnostics

Gets binding diagnostics.

Property IsComplete

Gets a value indicating whether every required tensor was bound without diagnostics.

Method TryGetBinding(UAIX.LmRuntime.Models.Llama.LlamaTensorRole,System.Nullable<int>,UAIX.LmRuntime.Models.Llama.TensorBindingEntry&)

Attempts to find one bound tensor by semantic role and optional block index.

role: The semantic LLaMA tensor role used to select the required bound tensor from the validated manifest.
blockIndex: The zero-based block index; it must identify an existing position within the relevant validated range.
entry: When the method returns, contains the matching bound entry when successful; otherwise contains the type's default value.

Returns: true when a binding for the supplied role and block index is found; otherwise, false.

Method ThrowIfIncomplete

Throws when the manifest contains one or more diagnostics.
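
The lookup by role and optional block index follows the usual Try pattern. The sketch below models it with a plain dictionary rather than the library types: the bindings table, the TryFindBinding helper, and the role names are hypothetical stand-ins, and the tensor names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a manifest: tensor names keyed by role name and optional block index.
// The role names below are illustrative, not the library's LlamaTensorRole values.
var bindings = new Dictionary<(string Role, int? BlockIndex), string>
{
    [("TokenEmbedding", null)] = "token_embd.weight",
    [("FeedForwardDown", 0)] = "blk.0.ffn_down.weight",
};

// Returns false instead of throwing when no entry matches the role and block index.
bool TryFindBinding(string role, int? blockIndex, out string? tensorName) =>
    bindings.TryGetValue((role, blockIndex), out tensorName);

if (TryFindBinding("FeedForwardDown", 0, out string? name))
{
    Console.WriteLine(name);
}
Console.WriteLine(TryFindBinding("FeedForwardDown", 7, out _)); // False
```

Global tensors use a null block index in this sketch, so a per-block role queried without an index does not accidentally match.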

TensorBindingException UAIX.LmRuntime.Models.Llama 2 members

Represents a failed LLaMA tensor schema binding operation.

Method TensorBindingException(UAIX.LmRuntime.Models.Llama.TensorBindingManifest)

Initializes a binding exception from a failed manifest.

manifest: The failed binding manifest whose diagnostics describe why binding did not complete.

Property Manifest

Gets the failed binding manifest.

LlamaRequiredTensorRegistry UAIX.LmRuntime.Models.Llama 1 member

Builds the required LLaMA-family tensor registry from model configuration.

Method Build(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Creates the required tensor list for the configuration.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

Returns: An ordered, read-only IReadOnlyList<LlamaTensorRequirement> containing the required tensors for the configuration. Mutable internal collections are not exposed through the returned list.

LlamaTensorBinder UAIX.LmRuntime.Models.Llama 2 members

Binds and validates LLaMA-family GGUF tensors as a schema-validation phase.

Method Bind(UAIX.LmRuntime.Gguf.GgufModel,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Binds required tensors from a parsed GGUF artifact using default validation options.

model: The parsed GGUF model whose validated metadata and tensor catalog are consumed by this operation.
config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

Returns: The TensorBindingManifest describing the bound tensors and any diagnostics. It is published only after all documented validation and ownership transitions succeed.

Method Bind(UAIX.LmRuntime.Gguf.GgufModel,UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.TensorBindingOptions)

Binds required tensors from a parsed GGUF artifact.

model: The parsed GGUF model whose validated metadata and tensor catalog are consumed by this operation.
config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
options: The optional TensorBindingOptions controlling Bind; null selects the documented defaults, supplied limits are validated before allocation, and the instance is not mutated.

Returns: The TensorBindingManifest describing the bound tensors and any diagnostics. It is published only after all documented validation and ownership transitions succeed.

MappedFloat16VectorSource UAIX.LmRuntime.Models.Llama 6 members

Reads an IEEE float16 vector directly from a mapped GGUF tensor view.

Method MappedFloat16VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedFloat16VectorSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property Length

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyTo(System.Span<float>)

Copies the vector into caller-owned storage after validating the requested range and capacity.

destination: The destination buffer that receives the produced values.
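
Reading a float16 vector into float storage amounts to widening each 16-bit value. The sketch below works on a plain byte span instead of a MappedTensorView; CopyFloat16To is a hypothetical helper, and little-endian payload order is an assumption of this example.

```csharp
using System;
using System.Buffers.Binary;

// Decodes little-endian IEEE float16 values from raw bytes into caller-owned floats.
void CopyFloat16To(ReadOnlySpan<byte> payload, Span<float> destination)
{
    if (payload.Length % 2 != 0)
        throw new ArgumentException("Payload must hold whole 16-bit values.", nameof(payload));
    int count = payload.Length / 2;
    if (destination.Length < count)
        throw new ArgumentException("Destination is too small.", nameof(destination));

    for (int i = 0; i < count; i++)
    {
        ushort bits = BinaryPrimitives.ReadUInt16LittleEndian(payload.Slice(i * 2, 2));
        destination[i] = (float)BitConverter.UInt16BitsToHalf(bits); // widen Half to float
    }
}

byte[] raw = { 0x00, 0x3C, 0x00, 0xC0 }; // 1.0 and -2.0 encoded as float16
float[] values = new float[2];
CopyFloat16To(raw, values);
Console.WriteLine(string.Join(", ", values));
```

Both length checks run before the first write, so a failed call leaves the destination untouched.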

MappedBFloat16VectorSource UAIX.LmRuntime.Models.Llama 6 members

Reads a brain-float16 vector directly from a mapped GGUF tensor view.

Method MappedBFloat16VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedBFloat16VectorSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property Length

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyTo(System.Span<float>)

Copies the vector into caller-owned storage after validating the requested range and capacity.

destination: The destination buffer that receives the produced values.
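
A bfloat16 value is the upper 16 bits of an IEEE float32, so decoding is a shift rather than a table lookup. This is a minimal sketch over a plain byte span; BFloat16ToSingle and CopyBFloat16To are hypothetical helpers, and little-endian payload order is assumed.

```csharp
using System;
using System.Buffers.Binary;

// bfloat16 is the upper 16 bits of an IEEE float32, so widening is a 16-bit left shift.
float BFloat16ToSingle(ushort bits) => BitConverter.Int32BitsToSingle(bits << 16);

void CopyBFloat16To(ReadOnlySpan<byte> payload, Span<float> destination)
{
    if (payload.Length % 2 != 0)
        throw new ArgumentException("Payload must hold whole 16-bit values.", nameof(payload));
    int count = payload.Length / 2;
    if (destination.Length < count)
        throw new ArgumentException("Destination is too small.", nameof(destination));

    for (int i = 0; i < count; i++)
    {
        ushort bits = BinaryPrimitives.ReadUInt16LittleEndian(payload.Slice(i * 2, 2));
        destination[i] = BFloat16ToSingle(bits);
    }
}

byte[] raw = { 0x80, 0x3F, 0x40, 0xC0 }; // 1.0 and -3.0 encoded as bfloat16
float[] values = new float[2];
CopyBFloat16To(raw, values);
Console.WriteLine(string.Join(", ", values));
```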

MappedFloat16MatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies an IEEE float16 matrix directly from a mapped GGUF tensor view.

Method MappedFloat16MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedFloat16MatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies the row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies the mapped matrix by the supplied vector without changing logical row order.

vector: The vector sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
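
The matrix sources share the same Multiply shape: one dot product per logical row. The sketch below shows that operation on an already-decoded float array. It assumes the vector length equals the column count and that the output receives one value per row; the Multiply helper here is a hypothetical free function, not the library method.

```csharp
using System;

// Row-major matrix-vector product: output[r] = sum over c of matrix[r, c] * vector[c].
void Multiply(ReadOnlySpan<float> rowMajor, int rowCount, int columnCount,
              ReadOnlySpan<float> vector, Span<float> output)
{
    if (rowMajor.Length != (long)rowCount * columnCount)
        throw new ArgumentException("Matrix length does not match its shape.", nameof(rowMajor));
    if (vector.Length != columnCount)
        throw new ArgumentException("Vector length must equal the column count.", nameof(vector));
    if (output.Length < rowCount)
        throw new ArgumentException("Output is too small.", nameof(output));

    for (int r = 0; r < rowCount; r++)
    {
        ReadOnlySpan<float> row = rowMajor.Slice(r * columnCount, columnCount);
        float sum = 0f;
        for (int c = 0; c < columnCount; c++)
        {
            sum += row[c] * vector[c];
        }
        output[r] = sum; // rows are written in their logical order
    }
}

float[] matrix = { 1, 2, 3, 4, 5, 6 }; // 2 rows x 3 columns
float[] input = { 1, 0, -1 };
float[] result = new float[2];
Multiply(matrix, 2, 3, input, result);
Console.WriteLine(string.Join(", ", result));
```

A mapped source would decode each row from its storage format on the fly instead of holding a float copy, but the validation order and row order stay the same.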

MappedBFloat16MatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies a brain-float16 matrix directly from a mapped GGUF tensor view.

Method MappedBFloat16MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedBFloat16MatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies the row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies the mapped matrix by the supplied vector without changing logical row order.

vector: The vector sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

MappedQ4_KMatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies a Q4_K matrix directly from a mapped GGUF tensor view.

Method MappedQ4_KMatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedQ4_KMatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies the row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies the mapped matrix by the supplied vector without changing logical row order.

vector: The vector sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

MappedQ6_KMatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies a Q6_K matrix directly from a mapped GGUF tensor view.

Method MappedQ6_KMatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedQ6_KMatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies the row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies the mapped matrix by the supplied vector without changing logical row order.

vector: The vector sequence used by this operation; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

MappedVectorSourceFactory UAIX.LmRuntime.Models.Llama 1 member

Selects a mapped scalar vector implementation from GGML storage metadata.

Method Create(UAIX.LmRuntime.Gguf.MappedTensorView)

Creates a read-only vector source matching the storage type of the supplied mapped tensor view.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Returns: The storage-specific vector source, with ownership and disposal obligations defined by the returned type and the Create contract.

LlamaGenerationStopReason UAIX.LmRuntime.Models.Llama 5 members

Identifies why deterministic greedy generation stopped.

Field MaximumTokens

The requested maximum number of tokens was produced.

Field EndOfSequence

The configured end-of-sequence token was selected.

Field StopToken

A caller-configured stop token was selected.

Field ContextCapacity

The model context window could not accept another evaluated token.

Field Cancelled

Cooperative cancellation was observed between committed inference steps.

LlamaGreedyGenerationOptions UAIX.LmRuntime.Models.Llama 4 members

Defines allocation-bounded deterministic greedy generation controls.

Property MaximumTokens

Gets the maximum number of generated tokens.

Property ResetSession

Gets whether the session is reset before prompt prefill.

Property EndOfSequenceTokenId

Gets the optional end-of-sequence token identifier.

Property StopTokenIds

Gets additional token identifiers that terminate generation after being emitted.

LlamaGeneratedToken UAIX.LmRuntime.Models.Llama 4 members

Describes one token selected during deterministic greedy generation.

The value contains only a zero-based sequence number, token identifier, and selected logit. It does not contain prompt text, decoded output, model bytes, file paths, persistent state, or provider information.

Method LlamaGeneratedToken(int,int,float)

Initializes a new LlamaGeneratedToken instance with validated dependencies and operational bounds.

sequence: The zero-based selection sequence within the current generation operation.
tokenId: The token identifier to process; it must fall within the validated vocabulary and operation-specific range.
selectedLogit: The deterministic selected logit at this generation sequence position. NaN is rejected; infinities remain valid because they are compared using the same deterministic ordering as finite values.

Property Sequence

Gets the zero-based token-selection sequence.

Property TokenId

Gets the selected model vocabulary identifier.

Property SelectedLogit

Gets the selected token's deterministic argmax logit.
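
A deterministic argmax over logits can be sketched as follows. It rejects NaN and lets infinities compete like finite values, matching the selectedLogit contract. Breaking ties toward the lowest token identifier is an assumption of this sketch, and ArgMax is a hypothetical helper rather than a library call.

```csharp
using System;

// Deterministic argmax: rejects NaN, lets infinities compete, and keeps the
// lowest index on ties (the tie rule is an assumption of this sketch).
(int TokenId, float Logit) ArgMax(ReadOnlySpan<float> logits)
{
    if (logits.IsEmpty)
        throw new ArgumentException("Logits must not be empty.", nameof(logits));

    int best = 0;
    for (int i = 0; i < logits.Length; i++)
    {
        if (float.IsNaN(logits[i]))
            throw new ArgumentException("NaN logits are rejected.", nameof(logits));
        if (logits[i] > logits[best]) best = i; // strict '>' keeps the earliest maximum
    }
    return (best, logits[best]);
}

var pick = ArgMax(new float[] { 0.5f, 2.0f, float.NegativeInfinity, 2.0f });
Console.WriteLine($"{pick.TokenId} {pick.Logit}");
```

Because the comparison is strict, the same logits always yield the same token regardless of how many entries share the maximum.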

LlamaGreedyGenerationResult UAIX.LmRuntime.Models.Llama 5 members

Describes an allocation-bounded greedy generation operation.

Property PromptTokenCount

Gets the number of prompt tokens evaluated for this operation.

Property GeneratedTokenCount

Gets the number of generated token identifiers written to the caller buffer.

Property StopReason

Gets the deterministic stop reason.

Property Position

Gets the next sequence position maintained by the session.

Property FinalSelectedLogit

Gets the selected logit of the final generated token, or negative infinity when none was generated.
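
The options, stop reasons, and result fit together as a bounded loop. The sketch below is a toy model under stated assumptions: FakeLogits stands in for a real forward pass, stop reasons are plain strings rather than LlamaGenerationStopReason values, the end-of-sequence token is written to the buffer before stopping (the source states this only for stop tokens), and cancellation is omitted.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a forward pass: it always prefers token (lastToken + 1) % vocabulary.
float[] FakeLogits(int lastToken, int vocabulary)
{
    var logits = new float[vocabulary];
    logits[(lastToken + 1) % vocabulary] = 1f;
    return logits;
}

int ArgMaxIndex(float[] logits)
{
    int best = 0;
    for (int i = 1; i < logits.Length; i++)
    {
        if (logits[i] > logits[best]) best = i;
    }
    return best;
}

(int Count, string StopReason) GenerateGreedy(
    int lastPromptToken, int promptTokenCount, int contextLength, int vocabulary,
    Span<int> output, int maximumTokens, int? endOfSequenceTokenId, HashSet<int> stopTokenIds)
{
    int position = promptTokenCount; // next sequence position after prompt prefill
    int last = lastPromptToken;
    int count = 0;
    while (true)
    {
        if (count >= maximumTokens || count >= output.Length) return (count, "MaximumTokens");
        if (position >= contextLength) return (count, "ContextCapacity");

        int token = ArgMaxIndex(FakeLogits(last, vocabulary));
        output[count++] = token; // the token is emitted before any stop check
        position++;

        if (token == endOfSequenceTokenId) return (count, "EndOfSequence");
        if (stopTokenIds.Contains(token)) return (count, "StopToken");
        last = token;
    }
}

int[] generated = new int[8];
var result = GenerateGreedy(lastPromptToken: 0, promptTokenCount: 1, contextLength: 16, vocabulary: 8,
    output: generated, maximumTokens: 5, endOfSequenceTokenId: 3, stopTokenIds: new HashSet<int>());
Console.WriteLine($"{result.Count} {result.StopReason}"); // 3 EndOfSequence
```

Writing into a caller-supplied buffer keeps the loop allocation-bounded: the number of generated tokens can never exceed the smaller of the buffer length and the requested maximum.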

RealModelSmokeStage UAIX.LmRuntime.Models.Llama 4 members

Identifies the deepest stage requested from the local real-model smoke workflow.

Field ParseOnly

Parses and validates the GGUF container only.

Field Tokenizer

Also constructs and validates the metadata-driven tokenizer.

Field TensorBinding

Also reconstructs LLaMA geometry and validates required tensor bindings.

Field OneToken

Also executes one deterministic greedy token when every storage contract is supported.

RealModelSmokeOptions UAIX.LmRuntime.Models.Llama 15 members

Configures an explicitly local, opt-in GGUF smoke inspection.

Property ModelPath

Gets the local GGUF path.

Property AllowedRoot

Gets an optional root that the resolved model path must remain under.

Property MaximumFileByteCount

Gets an optional explicit maximum file length; zero disables this limit.

Property ComputeModelSha256

Gets whether the complete model SHA-256 should be computed.

Property Stage

Gets the deepest smoke stage to execute.

Property Prompt

Gets the prompt used by the one-token stage.

Property ExpectedTokenIdsPath

Gets an optional local JSON file containing expected prompt token identifiers.

Property ExpectedOneTokenPath

Gets an optional local JSON file containing the expected one-token result.

Property RequireEnvironmentGate

Gets whether the explicit environment gate is required.

Property PackageVersion

Gets the package version recorded in evidence.

Property CommitIdentity

Gets a commit or source identity supplied by the operator.

Property ProvenanceLabel

Gets an operator-supplied provenance label.

Property LicenseReviewStatus

Gets the operator-supplied license review status.

Property RedactModelPath

Gets whether the artifact model path is reduced to its file name.

Property EnvironmentGateName

Gets the environment variable that enables real-model execution.

RealModelSmokeStageEvidence UAIX.LmRuntime.Models.Llama 3 members

Records one real-model workflow stage duration and current-thread allocation delta.

Property Stage

Gets the stage name.

Property ElapsedStopwatchTicks

Gets elapsed stopwatch ticks.

Property ManagedAllocatedByteCount

Gets managed bytes allocated on the measuring thread.
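
Both measurements map onto standard .NET counters. The sketch below shows one way such evidence could be gathered, using Stopwatch.GetTimestamp and GC.GetAllocatedBytesForCurrentThread; the Measure helper is hypothetical and is not taken from the package.

```csharp
using System;
using System.Diagnostics;

// Measures one stage: elapsed Stopwatch ticks and managed bytes allocated on the calling thread.
(long ElapsedTicks, long AllocatedBytes) Measure(Action stage)
{
    long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
    long ticksBefore = Stopwatch.GetTimestamp();
    stage();
    long ticksAfter = Stopwatch.GetTimestamp();
    long bytesAfter = GC.GetAllocatedBytesForCurrentThread();
    return (ticksAfter - ticksBefore, bytesAfter - bytesBefore);
}

var evidence = Measure(() =>
{
    var buffer = new byte[64 * 1024];
    buffer[0] = 1;
});
Console.WriteLine($"ticks={evidence.ElapsedTicks} bytes={evidence.AllocatedBytes}");

// Stopwatch ticks are only meaningful relative to Stopwatch.Frequency (ticks per second).
Console.WriteLine($"seconds={(double)evidence.ElapsedTicks / Stopwatch.Frequency}");
```

The allocation counter is per thread, so work that a stage hands off to other threads does not appear in the delta.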

RealModelSmokeArtifact UAIX.LmRuntime.Models.Llama 29 members

Represents a versioned, machine-readable real-model smoke artifact.

Property Schema

Gets the artifact schema identifier.

Property PackageVersion

Gets the package version.

Property CommitIdentity

Gets the source/commit identity.

Property ProvenanceLabel

Gets the operator-supplied provenance label.

Property LicenseReviewStatus

Gets the operator-supplied license review status.

Property GeneratedUtc

Gets the generation time in UTC.

Property ClaimStatus

Gets the evidence claim status.

Property Succeeded

Gets whether the requested stage completed.

Property CompletedStage

Gets the deepest completed stage.

Property ModelPath

Gets the normalized local model path.

Property FileByteCount

Gets the model file length.

Property ModelSha256

Gets the optional complete-file SHA-256.

Property GgufVersion

Gets the parsed GGUF version.

Property Architecture

Gets the model architecture.

Property TokenizerFamily

Gets the tokenizer family.

Property StorageTypeCounts

Gets physical tensor counts by GGML storage name.

Property BindingDiagnostics

Gets binding diagnostic messages.

Property PromptTokenIds

Gets exact prompt token identifiers when tokenization completed.

Property SelectedTokenId

Gets the selected one-token identifier when execution completed.

Property SelectedTokenText

Gets the selected token text when execution completed.

Property ExpectedTokenIdsMatched

Gets whether the optional expected token-identifier evidence matched.

Property ExpectedOneTokenMatched

Gets whether the optional expected one-token evidence matched.

Property Alignment

Gets the effective GGUF tensor alignment.

Property PromptSha256

Gets the SHA-256 of the prompt text rather than requiring publication of the raw prompt.

Property StageEvidence

Gets stage timing and current-thread allocation measurements.

Property UnsupportedDiagnostics

Gets exact unsupported execution diagnostics.

Property CommandIdentity

Gets the non-secret command identity.

Property EnvironmentVariableNames

Gets environment-variable names used by the workflow without values.

Property Diagnostics

Gets bounded workflow diagnostics.

RealModelSmokeEnvironment UAIX.LmRuntime.Models.Llama 1 member

Creates explicit local smoke options from the documented environment-variable contract.

Method Load(UAIX.LmRuntime.Models.Llama.RealModelSmokeStage)

Reads the local real-model smoke configuration from environment variables.

stage: The explicitly selected deepest real-model smoke stage to request from the workflow.

Returns: The local smoke options, with ownership and disposal obligations defined by the returned type and the Load contract.

RealModelPathPolicy UAIX.LmRuntime.Models.Llama 1 member

Resolves local model paths under an optional root without any hidden network or download behavior.

Method Resolve(string,string,long)

Resolves and validates one local model path.

path: The local file-system path processed by this operation; it must satisfy the containing component's path and scope policy.
allowedRoot: The normalized caller-authorized directory boundary, or null when no containment root was configured for the operation.
maximumFileByteCount: The optional explicit file-size limit; zero disables it.

Returns: The resolved and validated local model path. The returned string is detached from mutable caller storage and is not persisted by the operation.
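
A root-containment and size check of this kind can be sketched with standard System.IO calls. ResolveUnderRoot is a hypothetical helper, not the library's Resolve; it normalizes paths but does not resolve symbolic links, and its ordinal comparison is deliberately strict, so it may reject differently-cased paths on case-insensitive file systems.

```csharp
using System;
using System.IO;

// Hypothetical containment check; it normalizes paths but does not resolve symbolic links.
string ResolveUnderRoot(string path, string? allowedRoot, long maximumFileByteCount)
{
    string fullPath = Path.GetFullPath(path);
    if (allowedRoot != null)
    {
        string root = Path.GetFullPath(allowedRoot);
        // The trailing separator stops "/models-evil" from matching the root "/models".
        if (!root.EndsWith(Path.DirectorySeparatorChar)) root += Path.DirectorySeparatorChar;
        if (!fullPath.StartsWith(root, StringComparison.Ordinal))
            throw new UnauthorizedAccessException("Model path escapes the allowed root.");
    }

    var info = new FileInfo(fullPath);
    if (!info.Exists) throw new FileNotFoundException("Model file not found.", fullPath);
    if (maximumFileByteCount > 0 && info.Length > maximumFileByteCount) // zero disables the limit
        throw new InvalidDataException("Model file exceeds the configured size limit.");
    return fullPath;
}

string rootDirectory = Path.Combine(Path.GetTempPath(), "smoke-models-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(rootDirectory);
string modelFile = Path.Combine(rootDirectory, "tiny.gguf");
File.WriteAllBytes(modelFile, new byte[16]);

Console.WriteLine(ResolveUnderRoot(modelFile, rootDirectory, maximumFileByteCount: 0));
Directory.Delete(rootDirectory, recursive: true);
```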

RealModelSmokeRunner UAIX.LmRuntime.Models.Llama 1 member

Executes staged, offline real-model validation and emits a bounded evidence artifact.

Method Run(UAIX.LmRuntime.Models.Llama.RealModelSmokeOptions)

Runs the requested local smoke stages in their required order.

options: The explicit local smoke options and evidence boundaries.

Returns: A bounded machine-readable artifact describing the deepest completed stage.

ReferenceKvWriteBehavior UAIX.LmRuntime.Models.Llama 1 member

Identifies the deterministic write semantics used by the scalar reference key/value cache.

Field AppendOrOverwrite

Writes append new positions and deterministically overwrite already written positions.

ReferenceKvCacheFingerprint UAIX.LmRuntime.Models.Llama 1 member

Computes stable fingerprints for model configurations that own reference key/value cache snapshots.

Method Create(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Creates a SHA-256 fingerprint from the configuration fields that determine cache geometry and semantics.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

Returns: The SHA-256 fingerprint text. The returned string is detached from mutable caller storage and is not persisted by the operation.
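
A stable fingerprint of this kind can be sketched by hashing a fixed-layout encoding of the relevant fields. The field set, the little-endian encoding, and the uppercase hex output below are all assumptions made for illustration; the real fingerprint may cover more configuration fields or use a different encoding.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical field set: four geometry values encoded as little-endian 32-bit integers.
string CreateFingerprint(int layerCount, int contextLength, int keyValueHeadCount, int headWidth)
{
    Span<byte> buffer = stackalloc byte[16];
    BinaryPrimitives.WriteInt32LittleEndian(buffer.Slice(0, 4), layerCount);
    BinaryPrimitives.WriteInt32LittleEndian(buffer.Slice(4, 4), contextLength);
    BinaryPrimitives.WriteInt32LittleEndian(buffer.Slice(8, 4), keyValueHeadCount);
    BinaryPrimitives.WriteInt32LittleEndian(buffer.Slice(12, 4), headWidth);
    return Convert.ToHexString(SHA256.HashData(buffer)); // 64 hexadecimal characters
}

Console.WriteLine(CreateFingerprint(2, 4, 2, 3));
```

Fixing the byte order and field order is what makes the fingerprint comparable across machines; hashing a culture- or platform-dependent string would not give that guarantee.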

IReferenceKvCache UAIX.LmRuntime.Models.Llama 13 members

Defines a typed, deterministic key/value cache contract for the scalar LLaMA reference runtime.

Property LayerCount

Gets the number of transformer layers.

Property ContextLength

Gets the maximum sequence capacity.

Property KeyValueHeadCount

Gets the number of key/value heads per layer.

Property HeadWidth

Gets the float width of one key/value head.

Property UsedTokenCount

Gets the highest contiguous token position written plus one.

Property ConfigurationFingerprint

Gets the configuration fingerprint required by compatible snapshots.

Property WriteBehavior

Gets the deterministic append-versus-overwrite behavior.

Method Write(int,int,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>)

Appends or replaces one layer's key and value vectors at a sequence position.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
key: The flattened key vector for all key/value heads.
value: The flattened value vector for all key/value heads.

Method GetKey(int,int,int)

Retrieves the key from the current cache state after validating the requested access.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
headIndex: The zero-based head index; it must identify an existing position within the relevant validated range.

Returns: A bounded ReadOnlySpan<float> view of the requested key. Its lifetime and ownership remain tied to the owner identified by the containing type; no out-of-range region is exposed.

Method GetValue(int,int,int)

Retrieves the value from the current cache state after validating the requested access.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
headIndex: The zero-based head index; it must identify an existing position within the relevant validated range.

Returns: A bounded ReadOnlySpan<float> view of the requested value. Its lifetime and ownership remain tied to the owner identified by the containing type; no out-of-range region is exposed.

Method Reset

Resets the cache to its validated initial state without publishing partial state.

Method CreateSnapshot

Creates a bounded snapshot for tiny-fixture testing and replay.

Returns: The immutable cache snapshot, with ownership and disposal obligations defined by the returned type and the CreateSnapshot contract.

Method Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot)

Restores cache state from the supplied snapshot after validating it.

snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.

ReferenceKvCacheSnapshot UAIX.LmRuntime.Models.Llama 9 members

Represents an immutable snapshot of a tiny reference key/value cache.

Property SchemaVersion

Gets the snapshot schema version.

Property ConfigurationFingerprint

Gets the model/configuration fingerprint.

Property LayerCount

Gets the number of layers in the snapshot.

Property ContextLength

Gets the context capacity in the snapshot.

Property KeyValueHeadCount

Gets the key/value head count.

Property HeadWidth

Gets the per-head width.

Property UsedTokenCount

Gets the used token count.

Property Keys

Gets a copy of all key values.

Property Values

Gets a copy of all cached value-vector elements.

ReferenceKvCacheDiagnosticSnapshot UAIX.LmRuntime.Models.Llama 3 members

Represents a bounded, non-mutable diagnostic view of reference cache state.

Property ConfigurationFingerprint

Gets the configuration fingerprint.

Property UsedTokenCount

Gets the used token count.

Property ContentSha256

Gets the SHA-256 of the used key/value prefix.

ReferenceKvCache UAIX.LmRuntime.Models.Llama 16 members

Stores reference key/value state in two contiguous arrays without per-token dictionaries.

Method ReferenceKvCache(int,int,int,int)

Initializes a reference key/value cache with a geometry-derived compatibility fingerprint.

layerCount: The layer count used to bound this operation; it must be nonnegative and within the supported range.
contextLength: The maximum sequence capacity of the cache; it is validated before dependent work begins.
keyValueHeadCount: The number of key/value heads per layer.
headWidth: The positive number of scalar values stored for each attention head in one cache position.

Method ReferenceKvCache(int,int,int,int,string)

Initializes a reference key/value cache with an explicit model/configuration fingerprint.

layerCount: The layer count used to bound this operation; it must be nonnegative and within the supported range.
contextLength: The maximum sequence capacity of the cache; it is validated before dependent work begins.
keyValueHeadCount: The number of key/value heads per layer.
headWidth: The positive number of scalar values stored for each attention head in one cache position.
configurationFingerprint: The explicit model/configuration fingerprint text assigned to the cache; null, emptiness, length, and encoding rules are enforced as documented, and the value is not persisted by this operation.

Property LayerCount

Property ContextLength

Property KeyValueHeadCount

Property HeadWidth

Property UsedTokenCount

Property ConfigurationFingerprint

Property WriteBehavior

Method Write(int,int,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>)

Appends or replaces one layer's key and value vectors at a sequence position.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
key: The flattened key vector for all key/value heads; its required length is validated before access.
value: The flattened value vector for all key/value heads; its required length is validated before access.

Method GetKey(int,int,int)

Retrieves the key from the current cache state after validating the requested access.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
headIndex: The zero-based head index; it must identify an existing position within the relevant validated range.

Returns: A bounded ReadOnlySpan<float> view of the requested key. Its lifetime and ownership remain tied to the owner identified by the containing type; no out-of-range region is exposed.

Method GetValue(int,int,int)

Retrieves the value from the current cache state after validating the requested access.

layerIndex: The zero-based layer index; it must identify an existing position within the relevant validated range.
position: The zero-based sequence or cache position addressed by the operation; it must lie within the allocated context and readable or writable range.
headIndex: The zero-based head index; it must identify an existing position within the relevant validated range.

Returns: A bounded ReadOnlySpan<float> view of the requested value. Its lifetime and ownership remain tied to the owner identified by the containing type; no out-of-range region is exposed.

Method Reset

Resets the reference KV cache contents and logical sequence position to their initial state.

Method CreateSnapshot

Creates a bounded snapshot of the current cache state.

Returns: The ReferenceKvCacheSnapshot capturing the current cache state. It is published only after all documented validation and ownership transitions succeed.

Method CreateDiagnosticSnapshot

Creates a small diagnostic snapshot without exposing mutable key/value arrays.

Returns: The bounded diagnostic snapshot, with ownership and disposal obligations defined by the returned type and the CreateDiagnosticSnapshot contract.

Method Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot)

Restores cache state from the supplied snapshot after validating it.

snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.
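
The contiguous-array storage described for ReferenceKvCache can be illustrated with a flat offset computation. The sketch below assumes a layer-position-head ordering for the in-memory arrays (the source states that order only for serialization), uses hypothetical local helpers instead of the library type, and simplifies UsedTokenCount to the highest written position plus one without proving contiguity.

```csharp
using System;

int layerCount = 2, contextLength = 4, keyValueHeadCount = 2, headWidth = 3;
int positionWidth = keyValueHeadCount * headWidth; // floats per layer per position
float[] keys = new float[layerCount * contextLength * positionWidth];
float[] values = new float[keys.Length];
int usedTokenCount = 0;

// Assumed layer-position-head ordering, flattened into one array offset.
int Offset(int layer, int position, int head) =>
    ((layer * contextLength + position) * keyValueHeadCount + head) * headWidth;

// Append-or-overwrite: writing an already written position simply replaces its contents.
void Write(int layer, int position, ReadOnlySpan<float> key, ReadOnlySpan<float> value)
{
    if ((uint)layer >= (uint)layerCount) throw new ArgumentOutOfRangeException(nameof(layer));
    if ((uint)position >= (uint)contextLength) throw new ArgumentOutOfRangeException(nameof(position));
    if (key.Length != positionWidth || value.Length != positionWidth)
        throw new ArgumentException("Key and value must cover all key/value heads.");

    key.CopyTo(keys.AsSpan(Offset(layer, position, 0), positionWidth));
    value.CopyTo(values.AsSpan(Offset(layer, position, 0), positionWidth));
    usedTokenCount = Math.Max(usedTokenCount, position + 1); // simplification, see lead-in
}

// Returns a view over one head's slice; no copy is made.
ReadOnlySpan<float> GetKey(int layer, int position, int head) =>
    keys.AsSpan(Offset(layer, position, head), headWidth);

Write(0, 0, new float[] { 1, 2, 3, 4, 5, 6 }, new float[] { 6, 5, 4, 3, 2, 1 });
Console.WriteLine(string.Join(", ", GetKey(0, 0, 1).ToArray())); // 4, 5, 6
Console.WriteLine(usedTokenCount);                                 // 1
```

Two flat arrays keep every head's data for a position adjacent in memory and avoid any per-token allocation after construction.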

ReferenceKvPortableSnapshot UAIX.LmRuntime.Models.Llama 6 members

Carries a deterministic portable key/value-cache snapshot and its compatibility identities.

Property SchemaVersion

Gets the portable schema version.

Property ConfigurationFingerprint

Gets the model-configuration fingerprint.

Property ModelArtifactFingerprint

Gets the optional model-artifact fingerprint.

Property CacheLayoutFingerprint

Gets the cache-layout fingerprint.

Property ContentSha256

Gets the SHA-256 of the serialized bytes preceding the digest field.

Property Snapshot

Gets the restored capacity-shaped snapshot.

ReferenceKvCacheSerializer UAIX.LmRuntime.Models.Llama 5 members

Serializes only logically used key/value positions in stable layer-position-head order.

Schema version two is additive and does not change the in-memory version-one snapshot contract retained for source compatibility. Unused capacity is reconstructed as zero during deserialization.
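The "used positions only, zero-filled on restore" behavior can be illustrated with a minimal sketch. It assumes a hypothetical capacity-shaped float buffer indexed as (layer, position, head values); the helper names PackUsed and UnpackUsed and the geometry are illustrative and are not part of the package API or its actual wire format.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical geometry: 2 layers, capacity 4 positions, 6 floats per position
// (for example 2 heads x 3 values), of which only 2 positions are logically used.
int layers = 2, capacity = 4, rowFloats = 6, used = 2;

var cache = new float[layers * capacity * rowFloats];
for (int i = 0; i < cache.Length; i++) cache[i] = i + 1;   // includes stale values past `used`

byte[] packed = PackUsed(cache, layers, capacity, rowFloats, used);
float[] restored = UnpackUsed(packed, layers, capacity, rowFloats, used);

Console.WriteLine(packed.Length);            // 96 = 2 layers * 2 positions * 6 floats * 4 bytes
Console.WriteLine(restored[1 * rowFloats]);  // 7: a used position survives the round trip
Console.WriteLine(restored[2 * rowFloats]);  // 0: unused capacity is rebuilt as zero

// Writes only the logically used positions, layer by layer, as little-endian float32.
static byte[] PackUsed(float[] source, int layerCount, int positionCapacity, int floatsPerPosition, int usedPositions)
{
    var bytes = new byte[layerCount * usedPositions * floatsPerPosition * sizeof(float)];
    int offset = 0;
    for (int layer = 0; layer < layerCount; layer++)
        for (int position = 0; position < usedPositions; position++)
            for (int i = 0; i < floatsPerPosition; i++)
            {
                float value = source[(layer * positionCapacity + position) * floatsPerPosition + i];
                BinaryPrimitives.WriteSingleLittleEndian(bytes.AsSpan(offset, sizeof(float)), value);
                offset += sizeof(float);
            }
    return bytes;
}

// Rebuilds a capacity-shaped buffer; positions beyond `usedPositions` stay zero.
static float[] UnpackUsed(ReadOnlySpan<byte> bytes, int layerCount, int positionCapacity, int floatsPerPosition, int usedPositions)
{
    if (usedPositions < 0 || usedPositions > positionCapacity)
        throw new ArgumentOutOfRangeException(nameof(usedPositions));
    if (bytes.Length != layerCount * usedPositions * floatsPerPosition * sizeof(float))
        throw new ArgumentException("Unexpected payload length.");

    var target = new float[layerCount * positionCapacity * floatsPerPosition]; // zero-initialized
    int offset = 0;
    for (int layer = 0; layer < layerCount; layer++)
        for (int position = 0; position < usedPositions; position++)
            for (int i = 0; i < floatsPerPosition; i++)
            {
                target[(layer * positionCapacity + position) * floatsPerPosition + i] =
                    BinaryPrimitives.ReadSingleLittleEndian(bytes.Slice(offset, sizeof(float)));
                offset += sizeof(float);
            }
    return target;
}
```

Because only used positions are written, the serialized size tracks the logical sequence length rather than the allocated capacity.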

Field SchemaVersion

Gets the portable snapshot schema version.

Field DefaultMaximumByteCount

Gets the default maximum serialized snapshot size.

Method Serialize(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot,string,string,int)

Serializes a bounded cache snapshot in deterministic little-endian form.

snapshot: The immutable state snapshot being validated, serialized, restored, or analyzed without retaining caller-owned mutable aliases.
modelArtifactFingerprint: The stable model-artifact fingerprint that binds the serialized state to the exact reviewed model identity.
cacheLayoutFingerprint: The stable cache-layout fingerprint used to reject state created for incompatible tensor geometry or storage layout.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The serialized snapshot bytes including a trailing SHA-256.
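As a rough illustration of a trailing-digest layout, the sketch below appends a SHA-256 of the preceding bytes and verifies it under a size bound. AppendDigest and VerifyDigest are hypothetical helpers, and the payload is arbitrary; the package's real header fields and layout are not shown here.

```csharp
using System;
using System.Security.Cryptography;

byte[] snapshotPayload = { 1, 2, 3, 4 };
byte[] sealedBytes = AppendDigest(snapshotPayload);

Console.WriteLine(sealedBytes.Length);               // 36 = 4 payload bytes + 32 digest bytes
Console.WriteLine(VerifyDigest(sealedBytes, 1024));  // True
sealedBytes[0] ^= 0xFF;                              // corrupt one payload byte
Console.WriteLine(VerifyDigest(sealedBytes, 1024));  // False

// Returns payload followed by SHA-256(payload).
static byte[] AppendDigest(ReadOnlySpan<byte> payload)
{
    var result = new byte[payload.Length + 32];
    payload.CopyTo(result);
    SHA256.HashData(payload, result.AsSpan(payload.Length));
    return result;
}

// Rejects oversized or truncated input before hashing, then compares digests.
static bool VerifyDigest(ReadOnlySpan<byte> bytes, int maximumByteCount)
{
    if (bytes.Length < 32 || bytes.Length > maximumByteCount) return false;
    ReadOnlySpan<byte> content = bytes[..^32];
    Span<byte> expected = stackalloc byte[32];
    SHA256.HashData(content, expected);
    return CryptographicOperations.FixedTimeEquals(expected, bytes[^32..]);
}
```

Checking the length bound first keeps the cost of rejecting an oversized input independent of its content.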

Method Deserialize(System.ReadOnlySpan<byte>,int)

Deserializes and verifies a portable key/value-cache snapshot.

bytes: The serialized snapshot bytes; their required length, ordering, and element bounds are validated before access.
maximumByteCount: The maximum byte count used to bound this operation; it must be nonnegative and within the supported range.

Returns: The verified ReferenceKvPortableSnapshot, published only after all documented validation and ownership transitions succeed.

Method Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCache,System.ReadOnlySpan<byte>,string,string)

Restores verified portable bytes into a cache after validating model and layout identities.

cache: The validated ReferenceKvCache dependency consumed by Restore; ownership and lifetime remain with the caller unless this member explicitly documents a transfer.
bytes: The serialized snapshot bytes; their required length, ordering, and element bounds are validated before access.
expectedModelArtifactFingerprint: The expected artifact identity, or an empty string to accept an empty serialized identity.
expectedCacheLayoutFingerprint: The required cache-layout fingerprint against which the serialized state is compared before restoration.

WeightSourceStorageDiagnostics UAIX.LmRuntime.Models.Llama 7 members

Describes immutable storage used by one deterministic reference weight source.

Property TensorName

Gets the semantic tensor name.

Property StorageType

Gets the GGML physical storage type.

Property DataType

Gets the logical runtime data type.

Property ByteLength

Gets the physical byte length.

Property ManagedCopiedByteCount

Gets the number of bytes copied into persistent managed model-weight storage.

Property IsMemoryMapped

Gets a value indicating whether the source borrows memory-mapped storage.

Property IsAlias

Gets a value indicating whether this source aliases another semantic binding.

IReadOnlyVectorSource UAIX.LmRuntime.Models.Llama 5 members

Exposes an immutable logical vector without requiring a particular storage representation.

Property Length

Gets the logical vector length.

Property DataType

Gets the logical runtime data type.

Property StorageType

Gets the physical GGML storage type.

Property StorageDiagnostics

Gets immutable storage diagnostics.

Method CopyTo(System.Span<float>)

Copies every vector value into a caller-owned float32 destination.

destination: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

IReadOnlyMatrixSource UAIX.LmRuntime.Models.Llama 7 members

Exposes an immutable logical row-major matrix without requiring a particular storage representation.

Property RowCount

Gets the logical row count.

Property ColumnCount

Gets the logical column count.

Property DataType

Gets the logical runtime data type.

Property StorageType

Gets the physical GGML storage type.

Property StorageDiagnostics

Gets immutable storage diagnostics.

Method CopyRowTo(int,System.Span<float>)

Copies and, when required, dequantizes one logical row into a caller-owned float32 destination.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies this matrix by a float32 vector without materializing a complete float32 matrix.

vector: The input vector with at least ColumnCount values.
output: The output buffer with at least RowCount values.
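The Multiply contract (a vector of at least ColumnCount values in, at least RowCount values out) can be sketched for plain row-major float32 storage as follows. MultiplyRowMajor is a hypothetical stand-alone helper written for illustration; it is not the package's implementation.

```csharp
using System;

// 2 x 3 row-major matrix: rows are [1 2 3] and [4 5 6].
float[] matrixValues = { 1, 2, 3, 4, 5, 6 };
float[] input = { 1, 0, -1 };
var product = new float[2];

MultiplyRowMajor(matrixValues, 2, 3, input, product);
Console.WriteLine(string.Join(", ", product)); // -2, -2

// Computes output[r] = dot(row r, vector), reading each row in place.
static void MultiplyRowMajor(ReadOnlySpan<float> values, int rowCount, int columnCount,
                             ReadOnlySpan<float> vector, Span<float> output)
{
    if (rowCount < 0 || columnCount < 0 || values.Length != (long)rowCount * columnCount)
        throw new ArgumentException("Values do not match the declared shape.");
    if (vector.Length < columnCount) throw new ArgumentException("Vector is too short.");
    if (output.Length < rowCount) throw new ArgumentException("Output is too short.");

    for (int r = 0; r < rowCount; r++)
    {
        ReadOnlySpan<float> rowSpan = values.Slice(r * columnCount, columnCount);
        float sum = 0f;
        for (int c = 0; c < columnCount; c++) sum += rowSpan[c] * vector[c];
        output[r] = sum;
    }
}
```

All validation happens before the first write, so a rejected call leaves the caller's output buffer untouched.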

ILlamaLayerWeightSource UAIX.LmRuntime.Models.Llama 9 members

Exposes immutable weights required by one LLaMA transformer block.

Property AttentionNorm

Gets the attention normalization vector.

Property AttentionQuery

Gets the query projection matrix.

Property AttentionKey

Gets the key projection matrix.

Property AttentionValue

Gets the value projection matrix.

Property AttentionOutput

Gets the attention output projection matrix.

Property FeedForwardNorm

Gets the feed-forward normalization vector.

Property FeedForwardGate

Gets the feed-forward gate projection matrix.

Property FeedForwardUp

Gets the feed-forward up projection matrix.

Property FeedForwardDown

Gets the feed-forward down projection matrix.

ILlamaModelWeightSource UAIX.LmRuntime.Models.Llama 8 members

Exposes immutable model weights required by the deterministic LLaMA reference session.

Property TokenEmbeddings

Gets the token embedding table.

Property Layers

Gets transformer-block weights in execution order.

Property OutputNorm

Gets the final output normalization vector.

Property OutputProjection

Gets the output projection matrix.

Property UsesTiedOutputProjection

Gets a value indicating whether output projection aliases token embeddings.

Property StorageDiagnostics

Gets storage diagnostics for every distinct semantic source.

Property StorageSummary

Gets a stable summary of physical storage types used by the model.

Property ManagedCopiedByteCount

Gets persistent managed model-weight bytes represented by this source.

ArrayVectorSource UAIX.LmRuntime.Models.Llama 7 members

Provides an immutable array-backed vector adapter for compatibility and deterministic fixtures.

Method ArrayVectorSource(string,float[])

Initializes a new ArrayVectorSource instance with validated dependencies and operational bounds.

tensorName: The exact ordinal GGUF tensor catalog name used for lookup and diagnostics.
values: The float32 values backing the vector; their required length, ordering, and element bounds are validated before access.

Property TensorName

Gets the semantic tensor name.

Property Length

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyTo(System.Span<float>)

Copies every vector value into caller-owned storage after validating the requested range and capacity.

destination: The destination buffer that receives the produced values.

ArrayMatrixSource UAIX.LmRuntime.Models.Llama 9 members

Provides an immutable row-major array-backed matrix adapter for compatibility and deterministic fixtures.

Method ArrayMatrixSource(string,float[],int,int)

Initializes a new ArrayMatrixSource instance with validated dependencies and operational bounds.

tensorName: The exact ordinal GGUF tensor catalog name used for lookup and diagnostics.
values: The row-major float32 values backing the matrix; their required length, ordering, and element bounds are validated before access.
rowCount: The row count used to bound this operation; it must be nonnegative and within the supported range.
columnCount: The column count used to bound this operation; it must be nonnegative and within the supported range.

Property TensorName

Gets the semantic tensor name.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies one logical row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies this matrix by the supplied vector without changing logical row order.

vector: The float32 input vector; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

ArrayLlamaLayerWeightSource UAIX.LmRuntime.Models.Llama 10 members

Provides one array-backed LLaMA layer weight source.

Method ArrayLlamaLayerWeightSource(int,UAIX.LmRuntime.Models.Llama.LlamaReferenceLayerWeights,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)

Initializes a new ArrayLlamaLayerWeightSource instance with validated dependencies and operational bounds.

blockIndex: The zero-based block index; it must identify an existing position within the relevant validated range.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.
config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.

Property AttentionNorm

Property AttentionQuery

Property AttentionKey

Property AttentionValue

Property AttentionOutput

Property FeedForwardNorm

Property FeedForwardGate

Property FeedForwardUp

Property FeedForwardDown

ArrayLlamaModelWeightSource UAIX.LmRuntime.Models.Llama 10 members

Adapts the v1.8.0 float-array model to the storage-neutral v1.9.0 execution contracts.

Method Create(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights)

Creates an array-backed source after validating its complete model contract.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.

Returns: The array-backed source, with ownership and disposal obligations defined by the returned type and the Create contract.

Method ArrayLlamaModelWeightSource(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights)

Initializes a new ArrayLlamaModelWeightSource instance with validated dependencies and operational bounds.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.

Property TokenEmbeddings

Property Layers

Property OutputNorm

Property OutputProjection

Property UsesTiedOutputProjection

Property StorageDiagnostics

Property ManagedCopiedByteCount

Property StorageSummary

MappedFloat32VectorSource UAIX.LmRuntime.Models.Llama 6 members

Reads a float32 vector directly from a mapped GGUF tensor view.

Method MappedFloat32VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedFloat32VectorSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property Length

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyTo(System.Span<float>)

Copies every vector value into caller-owned storage after validating the requested range and capacity.

destination: The destination buffer that receives the produced values.

MappedFloat32MatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies an F32 matrix directly from a mapped GGUF tensor view.

Method MappedFloat32MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedFloat32MatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies one logical row into caller-owned storage after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies this matrix by the supplied vector without changing logical row order.

vector: The float32 input vector; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.

MappedQ8_0MatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies a Q8_0 matrix directly from a mapped GGUF tensor view.

Method MappedQ8_0MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedQ8_0MatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies one logical row into caller-owned storage, dequantizing it as required, after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies this matrix by the supplied vector without changing logical row order.

vector: The float32 input vector; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
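For orientation, the sketch below computes a dot product directly over Q8_0 bytes without first expanding them to float32. It follows the GGML Q8_0 block layout (a little-endian float16 scale followed by 32 signed 8-bit values, 34 bytes per block). DotQ8_0Row is a hypothetical helper, not the package's kernel, and it assumes .NET 6 or later for the Half helpers.

```csharp
using System;
using System.Buffers.Binary;

// Build one Q8_0 block by hand: scale 0.5, quantized values 0..31.
var block = new byte[34];
BinaryPrimitives.WriteHalfLittleEndian(block, (Half)0.5f);
for (int i = 0; i < 32; i++) block[2 + i] = (byte)i;

var ones = new float[32];
Array.Fill(ones, 1f);
Console.WriteLine(DotQ8_0Row(block, ones)); // 248 = 0.5 * (0 + 1 + ... + 31)

// Dot product of one Q8_0-encoded row with a float32 vector.
static float DotQ8_0Row(ReadOnlySpan<byte> rowBytes, ReadOnlySpan<float> vector)
{
    const int BlockSize = 32, BlockByteCount = 34;
    if (rowBytes.Length % BlockByteCount != 0)
        throw new ArgumentException("Row is not a whole number of Q8_0 blocks.");
    int blockCount = rowBytes.Length / BlockByteCount;
    if (vector.Length < blockCount * BlockSize)
        throw new ArgumentException("Vector is too short.");

    float total = 0f;
    for (int b = 0; b < blockCount; b++)
    {
        ReadOnlySpan<byte> current = rowBytes.Slice(b * BlockByteCount, BlockByteCount);
        float scale = (float)BinaryPrimitives.ReadHalfLittleEndian(current);
        float partial = 0f;
        for (int i = 0; i < BlockSize; i++)
            partial += (sbyte)current[2 + i] * vector[b * BlockSize + i]; // reinterpret byte as int8
        total += scale * partial;                                          // apply the block scale once
    }
    return total;
}
```

Applying the scale once per block, rather than once per element, is a common way to keep the inner loop cheap.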

MappedQ4_0MatrixSource UAIX.LmRuntime.Models.Llama 8 members

Reads and multiplies a Q4_0 matrix directly from a mapped GGUF tensor view.

Method MappedQ4_0MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)

Initializes a new MappedQ4_0MatrixSource instance with validated dependencies and operational bounds.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Property RowCount

Property ColumnCount

Property DataType

Property StorageType

Property StorageDiagnostics

Method CopyRowTo(int,System.Span<float>)

Copies one logical row into caller-owned storage, dequantizing it as required, after validating the requested range and capacity.

rowIndex: The zero-based row index; it must identify an existing position within the relevant validated range.
destination: The destination buffer that receives the produced values.

Method Multiply(System.ReadOnlySpan<float>,System.Span<float>)

Multiplies this matrix by the supplied vector without changing logical row order.

vector: The float32 input vector; its required length, ordering, and element bounds are validated before access.
output: The caller-owned destination buffer that receives the result; required capacity is validated before any write occurs.
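A row copy over Q4_0 storage has to dequantize as it goes. The sketch below follows the GGML Q4_0 block layout (a little-endian float16 scale followed by 16 bytes holding 32 four-bit values, 18 bytes per block; low nibbles map to elements 0..15 and high nibbles to elements 16..31, each offset by 8). DequantizeQ4_0Row is a hypothetical helper for illustration, not the package's code, and it assumes .NET 6 or later.

```csharp
using System;
using System.Buffers.Binary;

// One hand-built block: scale 2, first packed byte 0x9F, all other nibbles zero.
var q4Block = new byte[18];
BinaryPrimitives.WriteHalfLittleEndian(q4Block, (Half)2f);
q4Block[2] = 0x9F; // low nibble 15 -> element 0, high nibble 9 -> element 16

var rowValues = new float[32];
DequantizeQ4_0Row(q4Block, rowValues);
Console.WriteLine(rowValues[0]);  // 14  = (15 - 8) * 2
Console.WriteLine(rowValues[16]); // 2   = (9 - 8) * 2
Console.WriteLine(rowValues[1]);  // -16 = (0 - 8) * 2

// Expands one Q4_0-encoded row into a caller-owned float32 destination.
static void DequantizeQ4_0Row(ReadOnlySpan<byte> rowBytes, Span<float> destination)
{
    const int BlockSize = 32, BlockByteCount = 18;
    if (rowBytes.Length % BlockByteCount != 0)
        throw new ArgumentException("Row is not a whole number of Q4_0 blocks.");
    int blockCount = rowBytes.Length / BlockByteCount;
    if (destination.Length < blockCount * BlockSize)
        throw new ArgumentException("Destination is too short.");

    for (int b = 0; b < blockCount; b++)
    {
        ReadOnlySpan<byte> current = rowBytes.Slice(b * BlockByteCount, BlockByteCount);
        float scale = (float)BinaryPrimitives.ReadHalfLittleEndian(current);
        for (int j = 0; j < 16; j++)
        {
            byte packedPair = current[2 + j];
            destination[b * BlockSize + j] = ((packedPair & 0x0F) - 8) * scale;       // low nibble
            destination[b * BlockSize + j + 16] = ((packedPair >> 4) - 8) * scale;    // high nibble
        }
    }
}
```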

MappedMatrixSourceFactory UAIX.LmRuntime.Models.Llama 1 member

Creates supported matrix sources over mapped tensor views.

Method Create(UAIX.LmRuntime.Gguf.MappedTensorView)

Creates a direct mapped source for supported scalar and quantized storage.

view: The bounded tensor view whose descriptor, shape, byte order, and mapped payload are read without transferring ownership of the underlying mapping.

Returns: The storage-specific matrix source, with ownership and disposal obligations defined by the returned type and the Create contract.

MappedLlamaLayerWeightSource UAIX.LmRuntime.Models.Llama 10 members

Exposes one mapped LLaMA transformer block through storage-neutral execution contracts.

Method MappedLlamaLayerWeightSource(UAIX.LmRuntime.Models.Llama.LlamaBoundLayerWeightSet)

Initializes a new MappedLlamaLayerWeightSource instance with validated dependencies and operational bounds.

weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.

Property AttentionNorm

Property AttentionQuery

Property AttentionKey

Property AttentionValue

Property AttentionOutput

Property FeedForwardNorm

Property FeedForwardGate

Property FeedForwardUp

Property FeedForwardDown

MappedLlamaModelWeightSource UAIX.LmRuntime.Models.Llama 10 members

Exposes a complete mapped LLaMA model through storage-neutral execution contracts.

Method Create(UAIX.LmRuntime.Models.Llama.LlamaBoundWeightSet)

Creates and validates a complete mapped model weight source.

weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.

Returns: The validated mapped model weight source, with ownership and disposal obligations defined by the returned type and the Create contract.

Method MappedLlamaModelWeightSource(UAIX.LmRuntime.Models.Llama.LlamaBoundWeightSet)

Initializes a new MappedLlamaModelWeightSource instance with validated dependencies and operational bounds.

weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.

Property TokenEmbeddings

Property Layers

Property OutputNorm

Property OutputProjection

Property UsesTiedOutputProjection

Property StorageDiagnostics

Property ManagedCopiedByteCount

Gets the total number of persistent managed model-weight bytes copied by this source.

Property StorageSummary

LlamaWeightSourceValidator UAIX.LmRuntime.Models.Llama 1 member

Validates storage-neutral LLaMA weight sources before deterministic execution begins.

Method Validate(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.ILlamaModelWeightSource)

Validates every global and block-local source against the configured model geometry.

config: The validated LLaMA model configuration defining context capacity, vocabulary size, tensor geometry, and attention dimensions for the operation.
weights: The validated model-weight source or bound weight set consumed read-only by the deterministic reference operation.
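To make "validated against the configured model geometry" concrete, here is a minimal sketch of block-local shape checks. It assumes the conventional LLaMA projection shapes with rows as the output dimension and columns as the input dimension; FindShapeProblems, the dictionary keys, and the geometry values are hypothetical and do not reflect the validator's actual rules or messages.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical geometry: 4 query heads, 2 KV heads, head size 8, embedding 32, feed-forward 64.
int embedding = 32, headCount = 4, kvHeadCount = 2, headSize = 8, feedForward = 64;

var layerShapes = new Dictionary<string, (int Rows, int Columns)>
{
    ["AttentionQuery"] = (32, 32),
    ["AttentionKey"] = (16, 32),
    ["AttentionValue"] = (16, 32),
    ["AttentionOutput"] = (32, 32),
    ["FeedForwardGate"] = (64, 32),
    ["FeedForwardUp"] = (64, 32),
    ["FeedForwardDown"] = (32, 64),
};

List<string> problems = FindShapeProblems(layerShapes, embedding, headCount, kvHeadCount, headSize, feedForward);
Console.WriteLine(problems.Count); // 0

// Collects every mismatch instead of stopping at the first one.
static List<string> FindShapeProblems(
    IReadOnlyDictionary<string, (int Rows, int Columns)> shapes,
    int embeddingLength, int queryHeads, int kvHeads, int headDimension, int feedForwardLength)
{
    var found = new List<string>();
    if (kvHeads <= 0 || queryHeads % kvHeads != 0)
        found.Add("Query head count must be a positive multiple of the KV head count.");

    int queryWidth = queryHeads * headDimension;
    int kvWidth = kvHeads * headDimension;
    var expected = new (string Name, int Rows, int Columns)[]
    {
        ("AttentionQuery", queryWidth, embeddingLength),
        ("AttentionKey", kvWidth, embeddingLength),
        ("AttentionValue", kvWidth, embeddingLength),
        ("AttentionOutput", embeddingLength, queryWidth),
        ("FeedForwardGate", feedForwardLength, embeddingLength),
        ("FeedForwardUp", feedForwardLength, embeddingLength),
        ("FeedForwardDown", embeddingLength, feedForwardLength),
    };

    foreach (var (name, rows, columns) in expected)
    {
        if (!shapes.TryGetValue(name, out var actual))
            found.Add($"{name}: missing");
        else if (actual.Rows != rows || actual.Columns != columns)
            found.Add($"{name}: expected {rows}x{columns}, found {actual.Rows}x{actual.Columns}");
    }
    return found;
}
```

Running checks like these before execution means a geometry mismatch surfaces as a named diagnostic rather than as an out-of-range access mid-forward-pass.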

Frequently asked questions

What is the difference between mapped and materialized reference sessions?

A mapped session reads supported weight storage through mapped sources. A materialized session copies compatible weights into managed reference structures. The manifest and materialization records expose the chosen ownership and copied-byte behavior.

Does LlamaModelConfig.FromGguf prove the model will execute?

No. Configuration validation is one gate. Required tensors, storage support, tokenizer compatibility, binding, context limits, and a real execution stage must also pass.

Can session artifacts be restored with a different model?

They should not be. Use the model hash and configuration, tokenizer, and cache-layout fingerprints as strict compatibility checks, plus a bounded size and trusted path policy.
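A strict compatibility gate of this kind might look like the following sketch. The tuple fields, the DescribeMismatch helper, and the sample identity strings are hypothetical; how fingerprints are computed and stored is defined by the package, and trusted-path policy is outside the scope of this example.

```csharp
using System;

// Hypothetical identities: one recorded with the artifact, one computed for the loaded model.
var stored = (ModelHash: "m1", Configuration: "c1", Tokenizer: "t1", CacheLayout: "k1");
var current = (ModelHash: "m1", Configuration: "c1", Tokenizer: "t2", CacheLayout: "k1");

Console.WriteLine(DescribeMismatch(stored, stored, 4096, 1 << 20));  // compatible
Console.WriteLine(DescribeMismatch(stored, current, 4096, 1 << 20)); // tokenizer

// Returns "compatible" or the name of the first failed check; comparisons are exact and ordinal.
static string DescribeMismatch(
    (string ModelHash, string Configuration, string Tokenizer, string CacheLayout) recorded,
    (string ModelHash, string Configuration, string Tokenizer, string CacheLayout) loaded,
    long artifactByteCount, long maximumByteCount)
{
    if (artifactByteCount < 0 || artifactByteCount > maximumByteCount) return "size";
    if (!string.Equals(recorded.ModelHash, loaded.ModelHash, StringComparison.Ordinal)) return "model hash";
    if (!string.Equals(recorded.Configuration, loaded.Configuration, StringComparison.Ordinal)) return "configuration";
    if (!string.Equals(recorded.Tokenizer, loaded.Tokenizer, StringComparison.Ordinal)) return "tokenizer";
    if (!string.Equals(recorded.CacheLayout, loaded.CacheLayout, StringComparison.Ordinal)) return "cache layout";
    return "compatible";
}
```

Any single mismatch is treated as a hard rejection; there is no partial or best-effort restore in this sketch.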

Is the reference path intended as a benchmark leader?

No. It is a legible correctness and parity anchor. Performance claims require retained measurements for the exact model, hardware, settings, and code path.