Architecture validation
LlamaModelConfig derives graph dimensions from GGUF metadata and validates head counts, dimensions, context, vocabulary, RoPE, and normalization requirements before execution.
LLaMA configuration, tensor binding, mapped weights, reference sessions, KV cache, generation, and persistence.
Required for: LLaMA graph/session internals
UAIX.LmRuntime.Models.Llama
LLaMA-family configuration, tensor binding, mapped weight sources, reference forward execution, sessions, KV cache, generation, persistence, and parity evidence.
LLaMA-family graph configuration and reference forward-pass primitives for pure C# local LLM runtime inference.
dotnet add package UAIX.LmRuntime.Models.Llama
<PackageReference Include="UAIX.LmRuntime.Models.Llama" />
Version policy: The documentation deliberately omits UAIX.LmRuntime package version numbers. Resolve and pin versions through your normal dependency-management and lock-file process.
Required tensor roles, storage kinds, ownership, diagnostics, and manifests make missing, duplicate, incompatible, or unexpectedly materialized weights observable.
Reference sessions own position, logits, and KV-cache state. Callers choose reset behavior and can capture, serialize, fingerprint, restore, or discard state under bounded policies.
These are the main public entry points. The generated reference below includes the documented public package surface.
LlamaModelConfig, LlamaMappedModelLoader, LlamaMappedModel, LlamaMappedReferenceSession, LlamaReferenceSession, LlamaTensorBinder, TensorBindingManifest, ReferenceKvCache, ReferenceKvCacheSerializer, LlamaSessionArtifactSerializer, LlamaStorageParityRunner, RealModelSmokeRunner
Examples use the documented public package surface. Paths, identities, runtime identifiers, device evidence, and application policy remain host inputs.
Separate container parsing from architecture-specific configuration checks.
using UAIX.LmRuntime.Gguf;
using UAIX.LmRuntime.Models.Llama;

// Step 1: parse the GGUF container.
GgufModel gguf = GgufReader.Read(
    "models/model.gguf",
    new GgufParseOptions());

// Step 2: derive the LLaMA configuration and run the architecture checks.
LlamaModelConfig configuration =
    LlamaModelConfig.FromGguf(gguf);
configuration.Validate();

Console.WriteLine($"Model: {configuration.ModelName}");
Console.WriteLine($"Layers: {configuration.BlockCount}");
Console.WriteLine($"Embedding: {configuration.EmbeddingLength}");
Console.WriteLine($"Heads: {configuration.AttentionHeadCount}");
Console.WriteLine($"KV heads: {configuration.AttentionKeyValueHeadCount}");
Console.WriteLine($"Context: {configuration.ContextLength}");
Use direct mapped execution for diagnostics, model validation, and deterministic one-token evidence.
using UAIX.LmRuntime.Models.Llama;

var loader = new LlamaMappedModelLoader();
using LlamaMappedModel model = loader.Load(
    "models/model.gguf",
    new LlamaMappedModelLoadOptions
    {
        RuntimeMode = LlamaRuntimeMode.DeterministicParity,
        ComputeModelSha256 = true
    });

using LlamaMappedReferenceSession session =
    model.CreateReferenceSession();

LlamaMappedGreedyTokenResult result =
    session.DecodeOneGreedy(
        "Hello",
        new LlamaOneTokenOptions
        {
            ResetSession = true,
            ParseSpecialTokens = false,
            AddSpecialTokens = true,
            EmitTokenizerTrace = false
        });

Console.WriteLine($"{result.TokenId}: {result.TokenText}");
Console.WriteLine($"Selected logit: {result.SelectedLogit}");
Console.WriteLine($"Position: {result.Position}");
Bound output allocation and observe each committed token.
using UAIX.LmRuntime.Models.Llama;
using UAIX.LmRuntime.Tokenization;

public static class MappedGenerationExample
{
    /// <summary>
    /// Generates greedy tokens into caller-owned buffers and observes each committed selection.
    /// </summary>
    /// <param name="model">The loaded mapped model that defines vocabulary capacity.</param>
    /// <param name="session">The isolated mapped reference session.</param>
    /// <param name="prompt">The prompt to tokenize and prefill.</param>
    /// <param name="maximumTokens">The maximum number of output tokens.</param>
    /// <param name="cancellationToken">A token observed between committed model steps.</param>
    /// <returns>The bounded greedy-generation result.</returns>
    public static LlamaGreedyGenerationResult Generate(
        LlamaMappedModel model,
        LlamaMappedReferenceSession session,
        string prompt,
        int maximumTokens,
        CancellationToken cancellationToken)
    {
        ArgumentNullException.ThrowIfNull(model);
        ArgumentNullException.ThrowIfNull(session);
        ArgumentException.ThrowIfNullOrWhiteSpace(prompt);
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(maximumTokens);

        int[] generatedTokenIds = new int[maximumTokens];
        float[] finalLogits = new float[model.Configuration.VocabularySize];

        return session.GenerateGreedy(
            prompt,
            generatedTokenIds,
            finalLogits,
            new LlamaGreedyGenerationOptions
            {
                MaximumTokens = maximumTokens,
                ResetSession = true,
                EndOfSequenceTokenId = null,
                StopTokenIds = Array.Empty<int>()
            },
            new TokenizationOptions
            {
                AddSpecialTokens = true,
                ParseSpecialTokens = false
            },
            token => Console.WriteLine(
                $"{token.Sequence}: {token.TokenId} ({token.SelectedLogit})"),
            cancellationToken);
    }
}
Exercise reference execution without depending on an external model artifact.
using UAIX.LmRuntime.Models.Llama;

LlamaReferenceFixture fixture =
    LlamaReferenceFixtureFactory.CreateDeterministic();
LlamaReferenceSession session = fixture.CreateSession();

LlamaGreedyTokenResult result = session.DecodeOneGreedy(
    fixture.PromptTokenIds,
    resetSession: true);

Console.WriteLine($"Token: {result.TokenId}");
Console.WriteLine($"Position: {result.Position}");
Bind persisted state to model, configuration, tokenizer, and cache-layout fingerprints, and enforce a maximum artifact size.
using UAIX.LmRuntime.Models.Llama;

public static class SessionPersistenceExample
{
    /// <summary>
    /// Saves a mapped reference session and immediately reloads the authenticated artifact.
    /// </summary>
    /// <param name="model">The mapped model that supplies model identity evidence.</param>
    /// <param name="session">The session whose deterministic state will be persisted.</param>
    /// <param name="statePath">The destination path for the session artifact.</param>
    /// <param name="configurationFingerprint">The host-computed configuration fingerprint.</param>
    /// <param name="tokenizerFingerprint">The host-computed tokenizer fingerprint.</param>
    /// <param name="cacheLayoutFingerprint">The host-computed cache-layout fingerprint.</param>
    /// <returns>The authenticated artifact loaded from disk.</returns>
    public static LlamaSessionArtifact SaveAndReload(
        LlamaMappedModel model,
        LlamaMappedReferenceSession session,
        string statePath,
        string configurationFingerprint,
        string tokenizerFingerprint,
        string cacheLayoutFingerprint)
    {
        ArgumentNullException.ThrowIfNull(model);
        ArgumentNullException.ThrowIfNull(session);
        ArgumentException.ThrowIfNullOrWhiteSpace(statePath);

        string? directory = Path.GetDirectoryName(
            Path.GetFullPath(statePath));
        if (!string.IsNullOrEmpty(directory))
        {
            Directory.CreateDirectory(directory);
        }

        var persistence = new LlamaSessionPersistenceOptions
        {
            ModelSha256 = model.Manifest.ModelSha256,
            ConfigurationFingerprint = configurationFingerprint,
            TokenizerFingerprint = tokenizerFingerprint,
            CacheLayoutFingerprint = cacheLayoutFingerprint,
            SamplerMode = "greedy",
            GeneratedUtc = DateTimeOffset.UtcNow,
            ClaimStatus = "local-evidence",
            MaximumByteCount = 64 * 1024 * 1024
        };

        session.SaveState(statePath, persistence);
        return session.LoadState(
            statePath,
            maximumByteCount: persistence.MaximumByteCount);
    }
}
Boundary: The caller supplies and validates compatibility fingerprints; persisted state should be treated as model-bound untrusted input.
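Because the fingerprints are host inputs, the host needs some way to compute and compare them. The following is a minimal sketch using only the .NET base class library. The class name FingerprintSketch, the choice of lowercase hexadecimal SHA-256 over UTF-8 text, and the canonical-text input are all assumptions made for illustration; this page does not specify a fingerprint format.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical host-side helper; not part of UAIX.LmRuntime.
public static class FingerprintSketch
{
    /// <summary>Hashes a host-defined canonical description into lowercase hex SHA-256.</summary>
    public static string Compute(string canonicalText)
    {
        ArgumentNullException.ThrowIfNull(canonicalText);
        byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(canonicalText));
        return Convert.ToHexString(digest).ToLowerInvariant();
    }

    /// <summary>Compares two fingerprints without an early exit on the first differing byte.</summary>
    public static bool Matches(string expected, string actual)
    {
        ArgumentNullException.ThrowIfNull(expected);
        ArgumentNullException.ThrowIfNull(actual);
        byte[] left = Encoding.UTF8.GetBytes(expected);
        byte[] right = Encoding.UTF8.GetBytes(actual);
        // FixedTimeEquals returns false when the lengths differ.
        return CryptographicOperations.FixedTimeEquals(left, right);
    }
}
```

A host could compute one fingerprint per identity (configuration, tokenizer, cache layout) when saving, then call Matches against the values carried by a loaded artifact before trusting the restored state.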
Each type below lists its documented public fields, properties, constructors, methods, parameter descriptions, and return descriptions.
LlamaReferenceSessionSnapshot (UAIX.LmRuntime.Models.Llama)
5 members
Captures complete deterministic reference-session state without retaining live model pointers.
SchemaVersion
Gets the in-memory snapshot schema version.
Position
Gets the next sequence position.
TokenHistory
Gets committed input token identifiers in sequence order.
LastLogits
Gets the most recently computed logits.
KeyValueCache
Gets complete capacity-shaped key/value state.
LlamaSessionPersistenceOptions (UAIX.LmRuntime.Models.Llama)
13 members
Configures digest-bound complete session serialization.
PackageVersion
Gets the package version that emitted the artifact.
MinimumCompatiblePackageVersion
Gets the oldest supported package version.
MaximumCompatiblePackageVersion
Gets the newest supported package version.
ModelSha256
Gets the complete model artifact SHA-256.
ConfigurationFingerprint
Gets the LLaMA configuration fingerprint.
TokenizerFingerprint
Gets the GGUF tokenizer fingerprint.
CacheLayoutFingerprint
Gets the persistent cache-layout identity.
SamplerMode
Gets the deterministic sampler mode.
EndOfSequenceTokenId
Gets the optional end-of-sequence token identifier.
StopTokenIds
Gets configured stop-token identifiers.
GeneratedUtc
Gets the UTC generation time.
ClaimStatus
Gets the evidence claim status.
MaximumByteCount
Gets the maximum accepted artifact byte count.
LlamaSessionArtifact (UAIX.LmRuntime.Models.Llama)
15 members
Carries verified complete deterministic session state and compatibility identities.
SchemaVersion
Gets the portable schema version.
PackageVersion
Gets the package version that emitted the artifact.
MinimumCompatiblePackageVersion
Gets the oldest supported package version.
MaximumCompatiblePackageVersion
Gets the newest supported package version.
ModelSha256
Gets the complete model artifact SHA-256.
ConfigurationFingerprint
Gets the model configuration fingerprint.
TokenizerFingerprint
Gets the tokenizer fingerprint.
CacheLayoutFingerprint
Gets the cache-layout fingerprint.
SamplerMode
Gets the sampler mode.
EndOfSequenceTokenId
Gets the optional end-of-sequence token identifier.
StopTokenIds
Gets configured stop-token identifiers.
GeneratedUtc
Gets the artifact generation time in UTC.
ClaimStatus
Gets the evidence claim status.
ContentSha256
Gets the SHA-256 of every serialized byte preceding the digest.
Snapshot
Gets the complete session snapshot.
LlamaSessionArtifactSerializer (UAIX.LmRuntime.Models.Llama)
5 members
Serializes complete deterministic reference-session state in bounded little-endian form.
SchemaVersion
Gets the supported artifact schema version.
Serialize(UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)
Serializes complete session state and appends a SHA-256 digest.
Parameters: snapshot, options
Returns: A newly allocated byte[] containing the serialized artifact. The caller owns the returned array, and later mutation cannot alter the source object.
Deserialize(System.ReadOnlySpan<byte>,int)
Deserializes the llama session artifact from the validated persisted representation.
Parameters: bytes, maximumByteCount
Returns: The LlamaSessionArtifact result, published only after all documented validation and ownership transitions succeed.
Save(string,UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)
Writes a complete artifact to a local file.
Parameters: path, snapshot, options
Returns: The LlamaSessionArtifact result, published only after all documented validation and ownership transitions succeed.
Load(string,int)
Reads and verifies a complete artifact from a local file.
Parameters: path, maximumByteCount
Returns: The verified artifact, with ownership and disposal obligations defined by the returned type and the Load contract.
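The serializer is described as appending a SHA-256 digest and enforcing a maximum byte count on load. The sketch below illustrates that general pattern with base class library calls only. It is not the package's artifact layout: the class name DigestTrailerSketch, the 32-byte trailer position, and the exception type are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical illustration of a digest-trailer format; not the UAIX.LmRuntime layout.
public static class DigestTrailerSketch
{
    private const int DigestLength = 32; // SHA-256 output size in bytes.

    /// <summary>Returns payload bytes followed by the SHA-256 of those bytes.</summary>
    public static byte[] Seal(ReadOnlySpan<byte> payload)
    {
        byte[] artifact = new byte[payload.Length + DigestLength];
        payload.CopyTo(artifact);
        SHA256.HashData(payload, artifact.AsSpan(payload.Length));
        return artifact;
    }

    /// <summary>Checks the size bound, then the digest, and only then exposes the payload.</summary>
    public static ReadOnlySpan<byte> Open(ReadOnlySpan<byte> artifact, int maximumByteCount)
    {
        // Reject oversized input before doing any hashing work.
        if (artifact.Length > maximumByteCount)
        {
            throw new InvalidDataException("Artifact exceeds the configured byte limit.");
        }

        if (artifact.Length < DigestLength)
        {
            throw new InvalidDataException("Artifact is too short to contain a digest.");
        }

        ReadOnlySpan<byte> payload = artifact[..^DigestLength];
        ReadOnlySpan<byte> stored = artifact[^DigestLength..];

        Span<byte> computed = stackalloc byte[DigestLength];
        SHA256.HashData(payload, computed);

        if (!CryptographicOperations.FixedTimeEquals(stored, computed))
        {
            throw new InvalidDataException("Artifact digest does not match its contents.");
        }

        return payload;
    }
}
```

Checking the size bound first keeps a hostile or corrupted file from forcing unbounded work, which is the same motivation behind passing maximumByteCount to Load and Deserialize.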
FixtureVerificationDiagnostic (UAIX.LmRuntime.Models.Llama)
2 members
Represents one diagnostic emitted while verifying a checked-in GGUF fixture directory.
Code
Gets the stable diagnostic code.
Message
Gets the diagnostic message.
FixtureVerificationResult (UAIX.LmRuntime.Models.Llama)
5 members
Represents the result of bounded, offline fixture directory verification.
FixtureDirectory
Gets the normalized fixture directory.
ArtifactPath
Gets the normalized GGUF artifact path.
ArtifactSha256
Gets the verified SHA-256 digest.
Diagnostics
Gets verification diagnostics.
IsValid
Gets whether no verification diagnostics were emitted.
FixtureDirectoryVerifier (UAIX.LmRuntime.Models.Llama)
1 member
Verifies fixture manifests, artifact paths, digests, and basic loadability without network access.
Verify(string)
Verifies the supplied fixture directory and returns bounded evidence only after every required check succeeds.
Parameters: fixtureDirectory
Returns: The FixtureVerificationResult result, published only after all documented validation and ownership transitions succeed.
LlamaWeightStorageMode (UAIX.LmRuntime.Models.Llama)
3 members
Identifies how a bound tensor participates in reference execution.
Mapped
The tensor remains a borrowed view over the mapped GGUF file.
Alias
The tensor aliases another mapped tensor.
CopiedForReference
The tensor was explicitly copied into a bounded float32 reference buffer.
LlamaBoundTensor (UAIX.LmRuntime.Models.Llama)
5 members
Represents one semantic LLaMA weight bound to mapped model storage.
Role
Gets the semantic tensor role.
BlockIndex
Gets the optional transformer block index.
Binding
Gets the validated binding manifest entry.
View
Gets the borrowed mapped tensor view.
StorageMode
Gets the storage mode represented by this binding.
LlamaBoundLayerWeightSet (UAIX.LmRuntime.Models.Llama)
10 members
Represents the mapped tensors required by one LLaMA transformer block.
BlockIndex
Gets the zero-based transformer block index.
AttentionNorm
Gets the attention normalization tensor.
AttentionQuery
Gets the query projection tensor.
AttentionKey
Gets the key projection tensor.
AttentionValue
Gets the value projection tensor.
AttentionOutput
Gets the attention output projection tensor.
FeedForwardNorm
Gets the feed-forward normalization tensor.
FeedForwardGate
Gets the feed-forward gate projection tensor.
FeedForwardUp
Gets the feed-forward up projection tensor.
FeedForwardDown
Gets the feed-forward down projection tensor.
LlamaReferenceMaterializationRecord (UAIX.LmRuntime.Models.Llama)
5 members
Records one explicit managed copy made for the bounded scalar reference runtime.
TensorName
Gets the source tensor name.
Role
Gets the semantic tensor role.
BlockIndex
Gets the optional transformer block index.
CopiedByteCount
Gets the copied byte count.
StorageMode
Gets the resulting storage mode.
LlamaReferenceWeightMaterialization (UAIX.LmRuntime.Models.Llama)
3 members
Contains immutable float32 weights and copy evidence for the scalar reference runtime.
Weights
Gets the immutable reference weights.
Records
Gets every bounded copy made while materializing the fixture.
TotalCopiedByteCount
Gets the total number of copied bytes.
LlamaBoundWeightSet (UAIX.LmRuntime.Models.Llama)
11 members
Resolves a complete LLaMA binding manifest into stable mapped tensor views.
This object does not own the operating-system mapping. Every view borrows storage from the supplied mapping and becomes invalid when that mapping is disposed.
LlamaBoundWeightSet(UAIX.LmRuntime.Gguf.MappedGgufFile,UAIX.LmRuntime.Models.Llama.TensorBindingManifest,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Initializes a mapped LLaMA weight set from a complete binding manifest.
Parameters: mapping, manifest, config
Mapping
Gets the mapping that owns all borrowed tensor bytes.
Configuration
Gets the validated model configuration.
Manifest
Gets the complete tensor binding manifest.
Bindings
Gets all semantic mapped tensor bindings.
TokenEmbeddings
Gets the token embedding tensor.
OutputNorm
Gets the final output normalization tensor.
Output
Gets the output projection tensor or tied embedding alias.
Layers
Gets the block-local mapped weight sets.
Get(UAIX.LmRuntime.Models.Llama.LlamaTensorRole,System.Nullable<int>)
Retrieves the llama bound tensor from the current LlamaBoundWeightSet state after validating the requested access.
Parameters: role, blockIndex
Returns: The LlamaBoundTensor result, published only after all documented validation and ownership transitions succeed.
MaterializeFloat32ReferenceWeights(int)
Materializes bounded float32 arrays for the scalar correctness runtime.
Parameters: maximumCopiedBytes
Returns: The immutable reference weights and explicit copy ledger.
LlamaRuntimeMode (UAIX.LmRuntime.Models.Llama)
1 member
Identifies the deterministic execution contract used by a mapped model session.
DeterministicParity
Runs only deterministic parity behavior without adaptive governance.
LlamaOneTokenFinishReason (UAIX.LmRuntime.Models.Llama)
1 member
Identifies why a bounded one-token generation operation ended.
OneTokenCompleted
Exactly one greedy token was selected as requested.
LlamaMappedModelLoadOptions (UAIX.LmRuntime.Models.Llama)
5 members
Configures loading of a mapped LLaMA GGUF artifact.
ParseOptions
Gets GGUF parser safety limits.
BindingOptions
Gets semantic tensor binding validation options.
RuntimeMode
Gets the runtime mode.
MaximumReferenceMaterializationBytes
Gets the maximum bytes that scalar reference sessions may copy from mapped F32 weights.
ComputeModelSha256
Gets whether a SHA-256 digest of the complete artifact should be computed during load.
LlamaMappedModelLoadTimings (UAIX.LmRuntime.Models.Llama)
5 members
Records measured stages of mapped model loading.
ParseDuration
Gets metadata and tensor catalog parse duration.
MapDuration
Gets operating-system memory-map creation duration.
CompositionDuration
Gets architecture, tokenizer, and binding composition duration.
HashDuration
Gets optional complete-file digest duration.
TotalDuration
Gets total load duration.
LlamaMappedModelManifest (UAIX.LmRuntime.Models.Llama)
13 members
Describes the immutable evidence produced while loading a mapped LLaMA model.
ModelPath
Gets the normalized model path.
ModelByteCount
Gets the exact mapped GGUF file length observed during parsing.
ModelSha256
Gets the optional complete-file SHA-256 digest.
GgufVersion
Gets the GGUF version.
Architecture
Gets the architecture identifier.
ModelName
Gets the model display name.
Tokenizer
Gets the tokenizer implementation name.
BoundTensorCount
Gets the bound tensor count.
StorageSummary
Gets the physical tensor storage summary used by direct mapped execution.
ManagedModelWeightCopiedByteCount
Gets the managed model-weight byte count copied by the default execution path.
RuntimeMode
Gets the selected execution mode.
Timings
Gets load-stage timings.
Evidence
Gets load evidence messages.
LlamaOneTokenOptions (UAIX.LmRuntime.Models.Llama)
4 members
Configures one deterministic mapped-model greedy-token operation.
ResetSession
Gets whether the session should reset before prompt evaluation.
ParseSpecialTokens
Gets whether raw special-token text should be recognized.
AddSpecialTokens
Gets whether model-defined BOS/EOS behavior should be applied.
EmitTokenizerTrace
Gets whether tokenizer trace events should be captured.
LlamaOneTokenTimings (UAIX.LmRuntime.Models.Llama)
4 members
Records measured stages of exactly one mapped-model greedy decode operation.
TokenizationDuration
Gets prompt tokenization duration.
PrefillDuration
Gets prompt prefill duration.
SelectionDuration
Gets greedy selection and token decode duration.
TotalDuration
Gets total operation duration.
LlamaMappedGreedyTokenResult (UAIX.LmRuntime.Models.Llama)
20 members
Represents an end-to-end prompt-to-one-token result from a mapped GGUF model.
ModelPath
Gets the normalized GGUF model path used for the operation.
ModelSha256
Gets the optional complete-file model digest computed during load.
ModelName
Gets the model display name declared by GGUF metadata.
Architecture
Gets the model architecture identifier.
Prompt
Gets the input prompt.
PromptTokenIds
Gets the exact prompt token identifiers.
TokenizerTrace
Gets tokenizer trace events when requested.
TokenId
Gets the selected token identifier.
TokenText
Gets the selected token text.
SelectedLogit
Gets the selected token logit.
Logits
Gets the complete next-token logits for parity diagnostics.
StorageSummary
Gets the mapped storage-type summary.
ManagedModelWeightCopiedByteCount
Gets the managed model-weight bytes copied by the session path.
ManagedAllocatedByteCount
Gets managed bytes allocated on the current thread during the measured operation.
Position
Gets the sequence position that produced the logits.
KeyValueCacheTokenCount
Gets the resulting key/value cache token count.
FinishReason
Gets the deterministic finish reason.
RuntimeMode
Gets the runtime mode.
Timings
Gets measured operation timings.
Evidence
Gets evidence statements for the deterministic one-token operation.
LlamaMappedModelLoader (UAIX.LmRuntime.Models.Llama)
1 member
Loads a local GGUF artifact into a mapped, tokenizer-aware LLaMA model composition.
Load(string,UAIX.LmRuntime.Models.Llama.LlamaMappedModelLoadOptions)
Loads and validates one mapped local model.
Parameters: path, options
Returns: The owned mapped model, with ownership and disposal obligations defined by the returned type and the Load contract.
LlamaMappedModel (UAIX.LmRuntime.Models.Llama)
14 members
Owns a mapped GGUF artifact and immutable LLaMA runtime composition.
Mapping
Gets the mapped model storage owner.
Configuration
Gets the validated LLaMA configuration.
TokenizerMetadata
Gets validated GGUF tokenizer metadata.
Tokenizer
Gets the exact metadata-driven tokenizer.
BindingManifest
Gets the tensor binding manifest.
Weights
Gets the mapped semantic weight set.
WeightSource
Gets the direct mapped execution weight source.
Options
Gets the load options retained for deterministic session creation.
Manifest
Gets the immutable load evidence manifest.
IsDisposed
Gets whether the model has been disposed.
CreateReferenceSession
Creates an independent scalar reference session with its own key/value state.
Returns: The new mapped reference session, with ownership and disposal obligations defined by the returned type and the CreateReferenceSession contract.
CreateMaterializedReferenceSession
Creates an independent compatibility session over explicitly materialized float32 arrays.
Returns: The materialized compatibility session, with ownership and disposal obligations defined by the returned type and the CreateMaterializedReferenceSession contract.
GetReferenceMaterialization
Gets the bounded reference materialization evidence, creating it on first use.
Returns: The LlamaReferenceWeightMaterialization result, published only after all documented validation and ownership transitions succeed.
Dispose
Releases resources owned by LlamaMappedModel and transitions it to the disposed state.
LlamaMappedReferenceSession (UAIX.LmRuntime.Models.Llama)
12 members
Combines exact GGUF tokenization with an independent scalar reference session.
Position
Gets the current next-token sequence position.
KvCache
Gets the typed session-local key/value cache.
IsDisposed
Gets whether this session has released its state.
Reset
Resets this session's sequence and key/value state.
DecodeOneGreedy(string,UAIX.LmRuntime.Models.Llama.LlamaOneTokenOptions)
Tokenizes a prompt, executes prefill, selects argmax, and decodes exactly one token.
Parameters: prompt, options
Returns: The LlamaMappedGreedyTokenResult result, published only after all documented validation and ownership transitions succeed.
GenerateGreedy(string,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,UAIX.LmRuntime.Tokenization.TokenizationOptions,System.Threading.CancellationToken)
Tokenizes a prompt and generates greedy token identifiers into caller-owned buffers.
Parameters: prompt, generatedTokenIds, finalLogits, generationOptions, tokenizationOptions, cancellationToken
Returns: The LlamaGreedyGenerationResult result, published only after all documented validation and ownership transitions succeed.
GenerateGreedy(string,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,UAIX.LmRuntime.Tokenization.TokenizationOptions,System.Action<UAIX.LmRuntime.Models.Llama.LlamaGeneratedToken>,System.Threading.CancellationToken)
Tokenizes a prompt, generates greedy token identifiers, and reports each selected token synchronously.
Parameters: prompt, generatedTokenIds, finalLogits, generationOptions, tokenizationOptions, tokenObserver, cancellationToken
Returns: The LlamaGreedyGenerationResult result, published only after all documented validation and ownership transitions succeed.
ExportState(UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)
Exports complete deterministic state with model, configuration, tokenizer, and cache-layout identities.
Parameters: options
Returns: A newly allocated byte[] containing the exported state. The caller owns the returned array, and later mutation cannot alter the source object.
SaveState(string,UAIX.LmRuntime.Models.Llama.LlamaSessionPersistenceOptions)
Saves complete deterministic state to a local artifact.
Parameters: path, options
Returns: The LlamaSessionArtifact result, published only after all documented validation and ownership transitions succeed.
RestoreState(System.ReadOnlySpan<byte>,int)
Restores verified complete state after enforcing mapped model and tokenizer identities.
Parameters: bytes, maximumByteCount
Returns: The LlamaSessionArtifact result, published only after all documented validation and ownership transitions succeed.
LoadState(string,int)
Loads and restores complete deterministic state from a local artifact.
Parameters: path, maximumByteCount
Returns: The verified artifact, with ownership and disposal obligations defined by the returned type and the LoadState contract.
Dispose
Releases resources owned by LlamaMappedReferenceSession and transitions it to the disposed state.
LlamaModelConfig (UAIX.LmRuntime.Models.Llama)
17 members
Represents LLaMA-family transformer configuration reconstructed from GGUF metadata.
Architecture
Gets the architecture name.
ModelName
Gets the optional model display name.
EmbeddingLength
Gets the embedding length.
BlockCount
Gets the transformer block count.
FeedForwardLength
Gets the feed-forward hidden length.
AttentionHeadCount
Gets the attention head count.
AttentionKeyValueHeadCount
Gets the attention key/value head count.
ContextLength
Gets the training context length.
VocabularySize
Gets the vocabulary size.
RopeDimensionCount
Gets the RoPE dimension count per attention head.
RopeFrequencyBase
Gets the RoPE frequency base.
RmsNormEpsilon
Gets the RMSNorm epsilon.
SupportsTiedOutputProjection
Gets whether the loader may use token embeddings as the output projection when output.weight is absent.
HeadDimension
Gets the dimension of one query attention head.
KeyValueDimension
Gets the flattened key/value projection dimension.
FromGguf(UAIX.LmRuntime.Gguf.GgufModel)
Creates a LLaMA-family configuration from GGUF metadata.
Parameters: model
Returns: The LlamaModelConfig result, published only after all documented validation and ownership transitions succeed.
Validate
Validates architectural invariants required by the scalar LLaMA runtime.
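To make the relationship between the head counts and the derived HeadDimension and KeyValueDimension values concrete, here is a small sketch of the arithmetic commonly used for LLaMA-family grouped-query attention. It assumes the usual convention that the head dimension is the embedding length divided by the attention head count; the class name LlamaDimensionSketch and the specific checks are illustrative and are not taken from the package's Validate implementation.

```csharp
using System;

// Hypothetical host-side sketch; not the UAIX.LmRuntime validation code.
public static class LlamaDimensionSketch
{
    /// <summary>
    /// Derives the per-head dimension and the flattened key/value dimension,
    /// throwing when the supplied counts cannot describe a consistent layout.
    /// </summary>
    public static (int HeadDimension, int KeyValueDimension) Derive(
        int embeddingLength,
        int attentionHeadCount,
        int attentionKeyValueHeadCount)
    {
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(embeddingLength);
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(attentionHeadCount);
        ArgumentOutOfRangeException.ThrowIfNegativeOrZero(attentionKeyValueHeadCount);

        // Each query head must receive an equal slice of the embedding.
        if (embeddingLength % attentionHeadCount != 0)
        {
            throw new ArgumentException(
                "Embedding length must be divisible by the attention head count.");
        }

        // Grouped-query attention shares each key/value head across a whole group of query heads.
        if (attentionHeadCount % attentionKeyValueHeadCount != 0)
        {
            throw new ArgumentException(
                "Attention head count must be divisible by the key/value head count.");
        }

        int headDimension = embeddingLength / attentionHeadCount;
        int keyValueDimension = headDimension * attentionKeyValueHeadCount;
        return (headDimension, keyValueDimension);
    }
}
```

For example, Derive(4096, 32, 8) yields a head dimension of 128 and a key/value dimension of 1024, while Derive(4096, 32, 5) throws because 32 query heads cannot be split evenly across 5 key/value heads.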
LlamaReferenceForwardPass (UAIX.LmRuntime.Models.Llama)
2 members
Provides tiny reference building blocks for LLaMA-family correctness tests.
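As a rough picture of what building blocks of this kind compute, the sketch below gives scalar versions of RMSNorm and rotary position embedding with parameter shapes similar to the members listed for this type. It is an independent illustration, not the package's kernel. In particular, the rotation of adjacent element pairs (2i, 2i+1) and the expectation that cos and sin hold one value per pair are assumptions; some runtimes pair element i with element i + ropeDimensions / 2 instead.

```csharp
using System;

// Hypothetical scalar sketches; not the UAIX.LmRuntime reference kernels.
public static class ReferenceKernelSketch
{
    /// <summary>output[i] = input[i] / sqrt(mean(input^2) + epsilon) * weight[i].</summary>
    public static void RmsNorm(
        ReadOnlySpan<float> input,
        ReadOnlySpan<float> weight,
        Span<float> output,
        float epsilon)
    {
        if (input.IsEmpty || weight.Length != input.Length || output.Length != input.Length)
        {
            throw new ArgumentException("Input, weight, and output must share one non-zero length.");
        }

        // Accumulate in double so the sum of squares loses less precision.
        double sumOfSquares = 0.0;
        foreach (float value in input)
        {
            sumOfSquares += (double)value * value;
        }

        float scale = (float)(1.0 / Math.Sqrt(sumOfSquares / input.Length + epsilon));
        for (int i = 0; i < input.Length; i++)
        {
            output[i] = input[i] * scale * weight[i];
        }
    }

    /// <summary>Rotates the first ropeDimensions elements in place, one adjacent pair at a time.</summary>
    public static void ApplyRope(
        Span<float> vector,
        ReadOnlySpan<float> cos,
        ReadOnlySpan<float> sin,
        int ropeDimensions)
    {
        if (ropeDimensions < 0 || ropeDimensions % 2 != 0 || ropeDimensions > vector.Length)
        {
            throw new ArgumentOutOfRangeException(nameof(ropeDimensions));
        }

        int pairCount = ropeDimensions / 2;
        if (cos.Length < pairCount || sin.Length < pairCount)
        {
            throw new ArgumentException("cos and sin must supply one value per rotated pair.");
        }

        for (int pair = 0; pair < pairCount; pair++)
        {
            float x0 = vector[2 * pair];
            float x1 = vector[2 * pair + 1];
            // Standard 2-D rotation by the position-dependent angle for this pair.
            vector[2 * pair] = x0 * cos[pair] - x1 * sin[pair];
            vector[2 * pair + 1] = x0 * sin[pair] + x1 * cos[pair];
        }
    }
}
```

Elements beyond ropeDimensions are left untouched, which matches the common case where only part of each head is rotated.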
RmsNorm(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>,float)
Applies the LLaMA RMSNorm operation through the CPU reference kernel.
Parameters: input, weight, output, epsilon
ApplyRope(System.Span<float>,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,int)
Applies LLaMA-style RoPE to a query or key vector in place.
Parameters: vector, cos, sin, ropeDimensions
LlamaReferenceLayerWeights (UAIX.LmRuntime.Models.Llama)
9 members
Stores immutable float32 weights for one scalar/reference LLaMA transformer block.
AttentionNorm
Gets the attention RMSNorm scale.
AttentionQuery
Gets the query projection matrix in row-major logical order.
AttentionKey
Gets the key projection matrix in row-major logical order.
AttentionValue
Gets the value projection matrix in row-major logical order.
AttentionOutput
Gets the attention output projection matrix in row-major logical order.
FeedForwardNorm
Gets the feed-forward RMSNorm scale.
FeedForwardGate
Gets the feed-forward gate projection matrix in row-major logical order.
FeedForwardUp
Gets the feed-forward up projection matrix in row-major logical order.
FeedForwardDown
Gets the feed-forward down projection matrix in row-major logical order.
LlamaReferenceModelWeights (UAIX.LmRuntime.Models.Llama)
5 members
Stores immutable float32 weights for the deterministic LLaMA reference runtime.
TokenEmbeddings
Gets the token embedding table in row-major logical order.
Layers
Gets transformer block weights in execution order.
OutputNorm
Gets the final RMSNorm scale.
OutputProjection
Gets the output projection matrix in row-major logical order. An empty value means tied embeddings.
Validate(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Validates all reference-weight shapes against a LLaMA configuration.
Parameters: config
LlamaGreedyTokenResult (UAIX.LmRuntime.Models.Llama)
5 members
Represents exactly one greedily selected token produced by the reference runtime.
TokenId
Gets the selected token identifier.
TokenText
Gets the selected token text when a tokenizer is attached.
PromptTokenCount
Gets the number of prompt tokens evaluated.
Position
Gets the zero-based position whose logits selected this token.
SelectedLogit
Gets the selected token logit.
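Greedy selection amounts to taking the argmax of the logits. The sketch below shows one deterministic way to do that. The class name GreedySelectionSketch, the lowest-index tie-break, and the decision to reject NaN values are assumptions for illustration; this page does not state how the reference runtime resolves ties or non-finite logits.

```csharp
using System;

// Hypothetical illustration of greedy (argmax) selection; not the UAIX.LmRuntime implementation.
public static class GreedySelectionSketch
{
    /// <summary>Returns the index and value of the largest logit, preferring the lowest index on ties.</summary>
    public static (int TokenId, float SelectedLogit) Select(ReadOnlySpan<float> logits)
    {
        if (logits.IsEmpty)
        {
            throw new ArgumentException("At least one logit is required.", nameof(logits));
        }

        int bestIndex = -1;
        float bestValue = float.NegativeInfinity;

        for (int i = 0; i < logits.Length; i++)
        {
            float value = logits[i];
            if (float.IsNaN(value))
            {
                // Comparisons with NaN are always false, so fail loudly instead of skipping it silently.
                throw new InvalidOperationException($"Logit {i} is NaN.");
            }

            // Strict comparison keeps the earliest index when values are equal.
            if (bestIndex < 0 || value > bestValue)
            {
                bestIndex = i;
                bestValue = value;
            }
        }

        return (bestIndex, bestValue);
    }
}
```

A fixed tie-break rule matters for parity work: two runs that produce identical logits should also produce identical token identifiers.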
LlamaReferenceSession (UAIX.LmRuntime.Models.Llama)
17 members
Executes a deterministic, scalar-first LLaMA forward path for tiny correctness fixtures.
This class is the numerical correctness anchor for later optimized kernels. It is intentionally limited to batch size one and F32, Q8_0, or Q4_0 mapped or array-backed weights. It performs no governance or adaptive policy operations and therefore belongs exclusively to deterministic parity mode.
LlamaReferenceSession(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights,UAIX.LmRuntime.Tokenization.IGgufTokenizer)
Initializes a reference session through the v1.8.0 array-backed compatibility path.
Parameters: config, weights, tokenizer
LlamaReferenceSession(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.ILlamaModelWeightSource,UAIX.LmRuntime.Tokenization.IGgufTokenizer)
Initializes a reference session over immutable array-backed or direct mapped weight sources.
configweightstokenizerPosition
Gets the next sequence position to be evaluated.
KvCache
Gets the typed key/value cache owned by this session.
WeightSource
Gets the immutable model weight source used by this session.
VocabularySize
Gets the configured vocabulary size.
ContextCapacity
Gets the configured sequence capacity.
Reset
Clears sequence state and all key/value cache contents.
CaptureState
Captures complete deterministic session state without serializing live model pointers.
Returns: A LlamaReferenceSessionSnapshot holding the captured session state. It is published only after all documented validation and ownership transitions succeed.
RestoreState(UAIX.LmRuntime.Models.Llama.LlamaReferenceSessionSnapshot)
Restores complete deterministic state after validating sequence, vocabulary, and cache identities.
Parameters: snapshot
RunStep(int,System.Span<float>)
Evaluates one input token and writes next-token logits.
Parameters: tokenId, logits
DecodeOneGreedy(System.Collections.Generic.IReadOnlyList<int>,bool)
Evaluates a prompt and returns exactly one greedily selected next token.
Parameters: promptTokenIds, resetSession
Returns: The LlamaGreedyTokenResult for the selected token. It is published only after all documented validation and ownership transitions succeed.
Prefill(System.Collections.Generic.IReadOnlyList<int>,bool)
Evaluates every prompt token and leaves the final logits available for deterministic selection.
Parameters: promptTokenIds, resetSession
CopyLastLogitsTo(System.Span<float>)
Copies the most recently computed logits to a caller-provided destination.
Parameters: destination
GenerateGreedy(System.Collections.Generic.IReadOnlyList<int>,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,System.Threading.CancellationToken)
Generates deterministic greedy token identifiers into caller-owned buffers.
Parameters: promptTokenIds, generatedTokenIds, finalLogits, options, cancellationToken
Returns: The LlamaGreedyGenerationResult describing the completed generation. It is published only after all documented validation and ownership transitions succeed.
GenerateGreedy(System.Collections.Generic.IReadOnlyList<int>,System.Span<int>,System.Span<float>,UAIX.LmRuntime.Models.Llama.LlamaGreedyGenerationOptions,System.Action<UAIX.LmRuntime.Models.Llama.LlamaGeneratedToken>,System.Threading.CancellationToken)
Generates deterministic greedy token identifiers and reports each selection to a synchronous observer.
Parameters: promptTokenIds, generatedTokenIds, finalLogits, options, tokenObserver, cancellationToken
Returns: The LlamaGreedyGenerationResult describing the completed generation. It is published only after all documented validation and ownership transitions succeed.
SelectGreedyToken(int)
Selects and decodes one greedy token from the current logits.
Parameters: promptTokenCount
Returns: The LlamaGreedyTokenResult for the selected token. It is published only after all documented validation and ownership transitions succeed.
LlamaReferenceFixture UAIX.LmRuntime.Models.Llama
5 members
Represents a deterministic tiny reference fixture with one transformer block.
Configuration
Gets the fixture model configuration.
Weights
Gets the fixture model weights.
Tokenizer
Gets the fixture tokenizer.
PromptTokenIds
Gets the canonical fixture prompt tokens.
CreateSession
Creates a reference session from the fixture's validated configuration, weights, and tokenizer.
Returns: A session with empty key/value cache state.
LlamaReferenceFixtureFactory UAIX.LmRuntime.Models.Llama
1 member
Creates deterministic tiny fixtures used by reference-runtime tests and examples.
CreateDeterministic
Creates a one-block, five-token deterministic LLaMA fixture.
Returns: A fixture holding the configuration, weights, tokenizer, and prompt. Ownership and disposal obligations are defined by the returned type.
ILlamaSession UAIX.LmRuntime.Models.Llama
1 member
Defines the lifecycle for a LLaMA-family inference session.
DecodeAsync(int,System.Threading.CancellationToken)
Decodes the next token for the active sequence.
Parameters: tokenId, cancellationToken
Returns: A ValueTask<int> that completes with the next decoded token. Fault and cancellation states are propagated without a successful partial result.
LlamaReferenceExecutor UAIX.LmRuntime.Models.Llama
1 member
Provides scalar/reference execution anchors for LLaMA-family graphs.
Forward(System.ReadOnlySpan<float>,UAIX.LmRuntime.Models.Llama.LlamaWeights,System.Span<float>)
Executes a minimal reference forward pass that maps a hidden state to logits.
Parameters: hiddenState, weights, logits
LlamaWeights UAIX.LmRuntime.Models.Llama
2 members
Represents model-level LLaMA weights used by reference execution.
TokenEmbeddings
Gets token embedding weights.
OutputProjection
Gets the output projection matrix in row-major order.
LlamaLayerWeights UAIX.LmRuntime.Models.Llama
3 members
Represents one transformer block's reference weights.
AttentionQuery
Gets the attention query projection matrix.
AttentionKey
Gets the attention key projection matrix.
AttentionValue
Gets the attention value projection matrix.
LlamaReferenceRmsNorm UAIX.LmRuntime.Models.Llama
1 member
Provides reference RMSNorm behavior.
Apply(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>,float)
Applies RMSNorm to the input using the supplied scale weights and epsilon, writing the result to the output while preserving the operation's numeric and shape invariants.
Parameters: input, weights, output, epsilon
LlamaReferenceRope UAIX.LmRuntime.Models.Llama
1 member
Provides reference RoPE behavior.
Apply(System.Span<float>,int,float)
Applies rotary position embedding to adjacent hidden-state pairs.
Parameters: values, position, theta
LlamaReferenceAttention UAIX.LmRuntime.Models.Llama
1 member
Provides reference causal attention behavior.
ApplyCausal(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,int,System.Span<float>)
Applies a minimal causal attention score computation.
Parameters: query, keys, values, headSize, output
GroupedQueryAttentionMap UAIX.LmRuntime.Models.Llama
1 member
Maps query heads to grouped key/value heads.
MapHead(int,int,int)
Maps an attention query head to the corresponding KV head.
Parameters: queryHead, queryHeadCount, keyValueHeadCount
Returns: The index of the KV head that serves the query head. Range, finite-value, and overflow checks are completed before the value is returned.
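The RMSNorm, RoPE, and grouped-query mapping primitives listed above follow well-known formulas. Here is a minimal sketch of the three, assuming the conventional definitions: RMSNorm scales by the reciprocal root mean square, RoPE rotates adjacent pairs by `position * theta^(-i/n)` where `i` is the even index of the pair, and query heads are split into equal contiguous groups per KV head. `ReferencePrimitivesSketch` and its method names are hypothetical, and the package's exact numeric conventions may differ.

```csharp
using System;

public static class ReferencePrimitivesSketch
{
    // RMSNorm: output[i] = input[i] / sqrt(mean(input^2) + epsilon) * weights[i].
    public static void RmsNorm(ReadOnlySpan<float> input, ReadOnlySpan<float> weights,
                               Span<float> output, float epsilon)
    {
        if (input.IsEmpty || weights.Length != input.Length || output.Length != input.Length)
            throw new ArgumentException("All spans must share one non-zero length.");

        double sumOfSquares = 0.0;
        foreach (float x in input) sumOfSquares += (double)x * x;
        float scale = (float)(1.0 / Math.Sqrt(sumOfSquares / input.Length + epsilon));
        for (int i = 0; i < input.Length; i++)
            output[i] = input[i] * scale * weights[i];
    }

    // RoPE over adjacent pairs (values[i], values[i + 1]) for even i.
    // Treating theta as the frequency base is an assumption of this sketch.
    public static void Rope(Span<float> values, int position, float theta)
    {
        if (values.Length % 2 != 0)
            throw new ArgumentException("RoPE needs an even number of values.");

        for (int i = 0; i < values.Length; i += 2)
        {
            double angle = position * Math.Pow(theta, -(double)i / values.Length);
            double cos = Math.Cos(angle), sin = Math.Sin(angle);
            float x0 = values[i], x1 = values[i + 1];
            values[i] = (float)(x0 * cos - x1 * sin);
            values[i + 1] = (float)(x0 * sin + x1 * cos);
        }
    }

    // Grouped-query attention: consecutive query heads share one KV head.
    public static int MapHead(int queryHead, int queryHeadCount, int keyValueHeadCount)
    {
        if (queryHeadCount <= 0 || keyValueHeadCount <= 0 || queryHeadCount % keyValueHeadCount != 0)
            throw new ArgumentException("Query head count must be a positive multiple of the KV head count.");
        if ((uint)queryHead >= (uint)queryHeadCount)
            throw new ArgumentOutOfRangeException(nameof(queryHead));

        return queryHead / (queryHeadCount / keyValueHeadCount);
    }
}
```

With 8 query heads and 2 KV heads, for example, query heads 0 through 3 map to KV head 0 and query heads 4 through 7 map to KV head 1.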
LlamaSwiGluReference UAIX.LmRuntime.Models.Llama
1 member
Provides reference SwiGLU behavior.
Apply(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>)
Applies the SwiGLU activation to validated gate and up-projection vectors.
Parameters: gate, up, output
LlamaLogitComputer UAIX.LmRuntime.Models.Llama
1 member
Computes reference logits from a hidden state and output projection.
ComputeLogits(System.ReadOnlySpan<float>,System.ReadOnlySpan<float>,System.Span<float>)
Computes logits from a hidden vector and a row-major projection matrix.
Parameters: hiddenState, projection, logits
LlamaParityTolerance UAIX.LmRuntime.Models.Llama
3 members
Configures exact token and explicit floating-point tolerance checks for cross-storage parity.
AbsoluteTolerance
Gets the absolute per-logit tolerance.
RelativeTolerance
Gets the relative per-logit tolerance.
Validate
Validates the absolute and relative parity tolerances used for numerical comparison.
LlamaLogitComparison UAIX.LmRuntime.Models.Llama
6 members
Summarizes a deterministic comparison of two next-token logit vectors.
IsWithinTolerance
Gets whether every compared logit satisfies the configured tolerance.
MaximumAbsoluteError
Gets the largest absolute logit difference.
MeanAbsoluteError
Gets the arithmetic mean absolute logit difference.
FirstFailingIndex
Gets the first failing logit index, or a no-failure marker when none failed.
FirstFailingReferenceValue
Gets the reference value at the first failing index.
FirstFailingCandidateValue
Gets the candidate value at the first failing index.
LlamaLogitComparator UAIX.LmRuntime.Models.Llama
1 member
Compares deterministic next-token vectors without widening caller-provided tolerances.
Compare(System.Collections.Generic.IReadOnlyList<float>,System.Collections.Generic.IReadOnlyList<float>,UAIX.LmRuntime.Models.Llama.LlamaParityTolerance)
Compares two logit vectors using absolute-or-relative error acceptance.
Parameters: reference, candidate, tolerance
Returns: The LlamaLogitComparison summarizing the comparison. It is published only after all documented validation and ownership transitions succeed.
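Absolute-or-relative acceptance means a logit passes when its error is within either bound. The sketch below illustrates that rule under stated assumptions: the relative bound is taken against the magnitude of the reference value, and the "no failure" case is modelled as a null index. `LogitComparisonSketch` and `LogitComparatorSketch` are hypothetical names and are not the package's types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result shape; a null index means no logit failed.
public readonly record struct LogitComparisonSketch(
    bool IsWithinTolerance,
    float MaximumAbsoluteError,
    float MeanAbsoluteError,
    int? FirstFailingIndex);

public static class LogitComparatorSketch
{
    public static LogitComparisonSketch Compare(
        IReadOnlyList<float> reference,
        IReadOnlyList<float> candidate,
        float absoluteTolerance,
        float relativeTolerance)
    {
        if (reference.Count == 0 || reference.Count != candidate.Count)
            throw new ArgumentException("Vectors must be non-empty and equal in length.");

        float maximumError = 0f;
        double errorSum = 0.0;
        int? firstFailing = null;

        for (int i = 0; i < reference.Count; i++)
        {
            float error = Math.Abs(reference[i] - candidate[i]);
            maximumError = Math.Max(maximumError, error);
            errorSum += error;

            // A logit passes if EITHER bound holds. The tolerances are used
            // exactly as supplied and are never widened.
            bool passes = error <= absoluteTolerance
                       || error <= relativeTolerance * Math.Abs(reference[i]);
            if (!passes && firstFailing is null)
                firstFailing = i;
        }

        return new LogitComparisonSketch(
            firstFailing is null,
            maximumError,
            (float)(errorSum / reference.Count),
            firstFailing);
    }
}
```

The relative bound lets large-magnitude logits absorb proportionally larger errors, while the absolute bound keeps values near zero from failing on negligible differences.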
LlamaStorageParityCandidateResult UAIX.LmRuntime.Models.Llama
7 members
Represents one candidate model's parity result against a selected reference model.
ModelPath
Gets the candidate model path.
ModelSha256
Gets the candidate model SHA-256.
StorageSummary
Gets the candidate storage summary.
TokenMatches
Gets whether the selected token identifier exactly equals the reference identifier.
LogitComparison
Gets the detailed logit comparison.
OneTokenResult
Gets the complete candidate one-token result.
Passed
Gets whether both exact-token and floating-point contracts passed.
LlamaStorageParityResult UAIX.LmRuntime.Models.Llama
4 members
Represents a cross-storage one-token parity run.
Prompt
Gets the prompt used for every model.
ReferenceResult
Gets the reference one-token result.
Candidates
Gets candidate results in caller order.
Passed
Gets whether every candidate passed the explicit parity contract.
LlamaStorageParityRunner UAIX.LmRuntime.Models.Llama
1 member
Executes bounded offline one-token parity comparisons across local GGUF storage variants.
Run(string,System.Collections.Generic.IReadOnlyList<string>,string,UAIX.LmRuntime.Models.Llama.LlamaParityTolerance)
Runs one reference model and one or more candidate models with identical prompt settings.
Parameters: referenceModelPath, candidateModelPaths, prompt, tolerance
Returns: The LlamaStorageParityResult for the run. It is published only after all documented validation and ownership transitions succeed.
LlamaTensorRole UAIX.LmRuntime.Models.Llama
12 members
Identifies semantic roles for LLaMA-family tensors.
TokenEmbedding
Token embedding table.
OutputNorm
Final output normalization scale.
Output
Output projection matrix.
AttentionNorm
Per-block attention normalization scale.
AttentionQuery
Per-block query projection.
AttentionKey
Per-block key projection.
AttentionValue
Per-block value projection.
AttentionOutput
Per-block attention output projection.
FeedForwardNorm
Per-block feed-forward normalization scale.
FeedForwardGate
Per-block feed-forward gate projection.
FeedForwardUp
Per-block feed-forward up projection.
FeedForwardDown
Per-block feed-forward down projection.
TensorBindingStorageKind UAIX.LmRuntime.Models.Llama
2 members
Identifies where a bound tensor payload is stored.
MemoryMappedFile
The tensor remains in the GGUF memory-mapped artifact.
Alias
The tensor is an alias of another bound tensor.
TensorBindingOwnership UAIX.LmRuntime.Models.Llama
2 members
Identifies ownership for a bound tensor payload.
BorrowedModelStorage
The binding borrows storage owned by the loaded model.
BorrowedAlias
The binding borrows storage through another tensor binding.
TensorBindingOptions UAIX.LmRuntime.Models.Llama
4 members
Configures semantic validation performed by LlamaTensorBinder.
AllowTiedOutputProjection
Gets whether a missing output.weight may alias token_embd.weight.
ValidateSemanticShapes
Gets whether dimensions derived from model metadata must match the GGUF storage shape.
ValidateByteLengths
Gets whether physical byte lengths must match the registered tensor type traits.
ValidateFileBounds
Gets whether tensor ranges must fit inside the parsed source file length when available.
LlamaTensorRequirement UAIX.LmRuntime.Models.Llama
7 members
Describes one required LLaMA tensor contract.
Name
Gets the required tensor name.
Role
Gets the tensor role.
ExpectedRank
Gets the expected rank.
ExpectedStorageDimensions
Gets dimensions in GGUF storage order, where dimension zero is the row width.
ExpectedLogicalDimensions
Gets dimensions in logical row-major order for diagnostics and manifests.
BlockIndex
Gets the optional block index.
IsOptional
Gets whether the tensor may be satisfied by an explicit alias rule.
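The two dimension orders above describe the same tensor from opposite ends. As a minimal sketch, assuming the logical row-major order is simply the storage order reversed (so a rank-2 tensor stored as [row width, row count] is reported logically as [row count, row width]), the conversion could look like this. `TensorDimensionSketch` is a hypothetical helper and not part of the package.

```csharp
using System;
using System.Collections.Generic;

public static class TensorDimensionSketch
{
    // Reverses storage-order dimensions (dimension zero = row width)
    // into logical row-major order. The reversal rule is an assumption.
    public static long[] ToLogical(IReadOnlyList<long> storageDimensions)
    {
        var logical = new long[storageDimensions.Count];
        for (int i = 0; i < logical.Length; i++)
            logical[i] = storageDimensions[logical.Length - 1 - i];
        return logical;
    }

    // Multiplies all dimensions with overflow checking, so an oversized
    // shape fails loudly instead of wrapping around.
    public static long ElementCount(IReadOnlyList<long> dimensions)
    {
        long count = 1;
        foreach (long dimension in dimensions)
        {
            if (dimension <= 0)
                throw new ArgumentException("Dimensions must be positive.");
            count = checked(count * dimension);
        }
        return count;
    }
}
```

The element count is the same in either order, which makes it a convenient cross-check between a requirement's storage and logical dimensions.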
TensorBindingEntry UAIX.LmRuntime.Models.Llama
10 members
Represents one bound tensor entry.
Requirement
Gets the tensor requirement.
Descriptor
Gets the GGUF tensor descriptor supplying storage.
SourceTensorName
Gets the source tensor name when this binding is an alias.
LogicalDimensions
Gets the normalized logical dimensions.
ByteLength
Gets the physical storage byte length.
AbsoluteOffset
Gets the absolute source-file offset.
DataType
Gets the mapped runtime data type.
StorageKind
Gets the storage kind.
Ownership
Gets the ownership contract.
IsAlias
Gets whether this binding aliases another tensor.
TensorBindingDiagnostic UAIX.LmRuntime.Models.Llama
4 members
Represents a tensor binding diagnostic.
Code
Gets the diagnostic code.
TensorName
Gets the tensor name associated with the diagnostic.
BlockIndex
Gets the optional transformer block index.
Message
Gets the diagnostic message.
TensorBindingManifest UAIX.LmRuntime.Models.Llama
5 members
Represents the result of LLaMA tensor binding.
Bindings
Gets bound tensor entries.
Diagnostics
Gets binding diagnostics.
IsComplete
Gets a value indicating whether every required tensor was bound without diagnostics.
TryGetBinding(UAIX.LmRuntime.Models.Llama.LlamaTensorRole,System.Nullable<int>,UAIX.LmRuntime.Models.Llama.TensorBindingEntry&)
Attempts to find one bound tensor by semantic role and optional block index.
Parameters: role, blockIndex, entry
Returns: true when a binding matching the role and block index is found; otherwise, false.
ThrowIfIncomplete
Throws when the manifest contains one or more diagnostics.
TensorBindingException UAIX.LmRuntime.Models.Llama
2 members
Represents a failed LLaMA tensor schema binding operation.
TensorBindingException(UAIX.LmRuntime.Models.Llama.TensorBindingManifest)
Initializes a binding exception from a failed manifest.
Parameters: manifest
Manifest
Gets the failed binding manifest.
LlamaRequiredTensorRegistry UAIX.LmRuntime.Models.Llama
1 member
Builds the required LLaMA-family tensor registry from model configuration.
Build(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Creates the required tensor list for the configuration.
Parameters: config
Returns: An ordered IReadOnlyList<LlamaTensorRequirement> of the required tensors. Mutable internal collection aliases are not exposed through the returned contract.
LlamaTensorBinder UAIX.LmRuntime.Models.Llama
2 members
Binds and validates LLaMA-family GGUF tensors as a schema-validation phase.
Bind(UAIX.LmRuntime.Gguf.GgufModel,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Binds required tensors from a parsed GGUF artifact using default validation options.
Parameters: model, config
Returns: The TensorBindingManifest for the binding attempt. It is published only after all documented validation and ownership transitions succeed.
Bind(UAIX.LmRuntime.Gguf.GgufModel,UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.TensorBindingOptions)
Binds required tensors from a parsed GGUF artifact.
Parameters: model, config, options
Returns: The TensorBindingManifest for the binding attempt. It is published only after all documented validation and ownership transitions succeed.
MappedFloat16VectorSource UAIX.LmRuntime.Models.Llama
6 members
Reads an IEEE float16 vector directly from a mapped GGUF tensor view.
MappedFloat16VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedFloat16VectorSource instance with validated dependencies and operational bounds.
Parameters: view
Length
DataType
StorageType
StorageDiagnostics
CopyTo(System.Span<float>)
Copies the vector into caller-owned storage after validating the requested range and capacity.
Parameters: destination
MappedBFloat16VectorSource UAIX.LmRuntime.Models.Llama
6 members
Reads a brain-float16 vector directly from a mapped GGUF tensor view.
MappedBFloat16VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedBFloat16VectorSource instance with validated dependencies and operational bounds.
Parameters: view
Length
DataType
StorageType
StorageDiagnostics
CopyTo(System.Span<float>)
Copies the vector into caller-owned storage after validating the requested range and capacity.
Parameters: destination
MappedFloat16MatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies an IEEE float16 matrix directly from a mapped GGUF tensor view.
MappedFloat16MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedFloat16MatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one row into caller-owned storage after validating the requested range and capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies the matrix by the supplied vector and writes the product to the output, without changing logical row order.
Parameters: vector, output
MappedBFloat16MatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies a brain-float16 matrix directly from a mapped GGUF tensor view.
MappedBFloat16MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedBFloat16MatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one row into caller-owned storage after validating the requested range and capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies the matrix by the supplied vector and writes the product to the output, without changing logical row order.
Parameters: vector, output
MappedQ4_KMatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies a Q4_K matrix directly from a mapped GGUF tensor view.
MappedQ4_KMatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedQ4_KMatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one row into caller-owned storage after validating the requested range and capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies the matrix by the supplied vector and writes the product to the output, without changing logical row order.
Parameters: vector, output
MappedQ6_KMatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies a Q6_K matrix directly from a mapped GGUF tensor view.
MappedQ6_KMatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedQ6_KMatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one row into caller-owned storage after validating the requested range and capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies the matrix by the supplied vector and writes the product to the output, without changing logical row order.
Parameters: vector, output
MappedVectorSourceFactory UAIX.LmRuntime.Models.Llama
1 member
Selects a mapped scalar vector implementation from GGML storage metadata.
Create(UAIX.LmRuntime.Gguf.MappedTensorView)
Creates a read-only vector source for the supplied mapped tensor view.
Parameters: view
Returns: The storage-specific vector source, with ownership and disposal obligations defined by the returned type and the Create contract.
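The mapped sources above decode stored elements on the fly instead of materializing float32 copies. A minimal sketch of that idea for brain-float16 follows: a bfloat16 value is the upper 16 bits of an IEEE float32, so decoding is a 16-bit left shift, and a row-major matrix-vector product can decode each element as it is read. `MappedBFloat16Sketch` is a hypothetical name, and the `ReadOnlySpan<ushort>` input merely stands in for a mapped tensor view.

```csharp
using System;

public static class MappedBFloat16Sketch
{
    // bfloat16 holds the top 16 bits of a float32, so widening is a shift.
    public static float ToSingle(ushort bits) =>
        BitConverter.Int32BitsToSingle(bits << 16);

    // output[r] = sum over c of matrix[r, c] * vector[c], with the matrix
    // stored row-major as raw bfloat16 bit patterns.
    public static void Multiply(ReadOnlySpan<ushort> matrix, int rowCount, int columnCount,
                                ReadOnlySpan<float> vector, Span<float> output)
    {
        if (rowCount <= 0 || columnCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(rowCount), "Shape must be positive.");
        if (matrix.Length != (long)rowCount * columnCount)
            throw new ArgumentException("Matrix length does not match its shape.");
        if (vector.Length != columnCount || output.Length != rowCount)
            throw new ArgumentException("Vector or output length does not match the shape.");

        for (int row = 0; row < rowCount; row++)
        {
            // Each row is a contiguous slice, so logical row order is preserved.
            ReadOnlySpan<ushort> rowBits = matrix.Slice(row * columnCount, columnCount);
            float sum = 0f;
            for (int column = 0; column < columnCount; column++)
                sum += ToSingle(rowBits[column]) * vector[column];
            output[row] = sum;
        }
    }
}
```

IEEE float16 and the quantized Q4_K and Q6_K layouts need their own decoders, but the row-by-row accumulation has the same shape.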
LlamaGenerationStopReason UAIX.LmRuntime.Models.Llama
5 members
Identifies why deterministic greedy generation stopped.
MaximumTokens
The requested maximum number of tokens was produced.
EndOfSequence
The configured end-of-sequence token was selected.
StopToken
A caller-configured stop token was selected.
ContextCapacity
The model context window could not accept another evaluated token.
Cancelled
Cooperative cancellation was observed between committed inference steps.
LlamaGreedyGenerationOptions UAIX.LmRuntime.Models.Llama
4 members
Defines allocation-bounded deterministic greedy generation controls.
MaximumTokens
Gets the maximum number of generated tokens.
ResetSession
Gets whether the session is reset before prompt prefill.
EndOfSequenceTokenId
Gets the optional end-of-sequence token identifier.
StopTokenIds
Gets additional token identifiers that terminate generation after being emitted.
LlamaGeneratedToken UAIX.LmRuntime.Models.Llama
4 members
Describes one token selected during deterministic greedy generation.
The value contains only a zero-based sequence number, token identifier, and selected logit. It does not contain prompt text, decoded output, model bytes, file paths, persistent state, or provider information.
LlamaGeneratedToken(int,int,float)
Initializes a new LlamaGeneratedToken instance with validated dependencies and operational bounds.
Parameters: sequence, tokenId, selectedLogit
Sequence
Gets the zero-based token-selection sequence.
TokenId
Gets the selected model vocabulary identifier.
SelectedLogit
Gets the selected token's deterministic argmax logit.
LlamaGreedyGenerationResult UAIX.LmRuntime.Models.Llama
5 members
Describes an allocation-bounded greedy generation operation.
PromptTokenCount
Gets the number of prompt tokens evaluated for this operation.
GeneratedTokenCount
Gets the number of generated token identifiers written to the caller buffer.
StopReason
Gets the deterministic stop reason.
Position
Gets the next sequence position maintained by the session.
FinalSelectedLogit
Gets the selected logit of the final generated token, or negative infinity when none was generated.
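The stop reasons and options above imply a bounded loop that writes into a caller-owned buffer and checks each termination condition between steps. The sketch below shows one possible ordering of those checks. It is an illustration only: `GreedyLoopSketch`, `StopReasonSketch`, and the `evaluateAndSelect` delegate (which stands in for one forward step plus greedy selection) are hypothetical, and writing the end-of-sequence token into the buffer before stopping is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public enum StopReasonSketch { MaximumTokens, EndOfSequence, StopToken, ContextCapacity, Cancelled }

public static class GreedyLoopSketch
{
    // firstSelectedToken: token selected from the prompt's final logits.
    // position: next sequence position to evaluate after prefill.
    public static (int Count, StopReasonSketch Reason) Generate(
        int firstSelectedToken,
        Func<int, int> evaluateAndSelect,
        int position,
        int contextCapacity,
        Span<int> generated,
        int maximumTokens,
        int? endOfSequenceTokenId,
        ISet<int> stopTokenIds,
        CancellationToken cancellationToken)
    {
        // Never write past the caller's buffer, whatever the option says.
        int limit = Math.Min(maximumTokens, generated.Length);
        if (limit <= 0)
            return (0, StopReasonSketch.MaximumTokens);

        int token = firstSelectedToken;
        int count = 0;
        while (true)
        {
            generated[count++] = token; // emit the selected token

            if (token == endOfSequenceTokenId)
                return (count, StopReasonSketch.EndOfSequence);
            if (stopTokenIds.Contains(token))
                return (count, StopReasonSketch.StopToken);
            if (count >= limit)
                return (count, StopReasonSketch.MaximumTokens);
            if (position >= contextCapacity)
                return (count, StopReasonSketch.ContextCapacity);
            // Cancellation is only observed between committed steps.
            if (cancellationToken.IsCancellationRequested)
                return (count, StopReasonSketch.Cancelled);

            token = evaluateAndSelect(token); // one forward step + argmax
            position++;
        }
    }
}
```

Because every exit path reports a count and a reason, a caller can tell a full buffer apart from an end-of-sequence stop without inspecting the generated tokens.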
RealModelSmokeStage UAIX.LmRuntime.Models.Llama
4 members
Identifies the deepest stage requested from the local real-model smoke workflow.
ParseOnly
Parses and validates the GGUF container only.
Tokenizer
Also constructs and validates the metadata-driven tokenizer.
TensorBinding
Also reconstructs LLaMA geometry and validates required tensor bindings.
OneToken
Also executes one deterministic greedy token when every storage contract is supported.
RealModelSmokeOptions UAIX.LmRuntime.Models.Llama
15 members
Configures an explicitly local, opt-in GGUF smoke inspection.
ModelPath
Gets the local GGUF path.
AllowedRoot
Gets an optional root that the resolved model path must remain under.
MaximumFileByteCount
Gets an optional explicit maximum file length; zero disables this limit.
ComputeModelSha256
Gets whether the complete model SHA-256 should be computed.
Stage
Gets the deepest smoke stage to execute.
Prompt
Gets the prompt used by the one-token stage.
ExpectedTokenIdsPath
Gets an optional local JSON file containing expected prompt token identifiers.
ExpectedOneTokenPath
Gets an optional local JSON file containing the expected one-token result.
RequireEnvironmentGate
Gets whether the explicit environment gate is required.
PackageVersion
Gets the package version recorded in evidence.
CommitIdentity
Gets a commit or source identity supplied by the operator.
ProvenanceLabel
Gets an operator-supplied provenance label.
LicenseReviewStatus
Gets the operator-supplied license review status.
RedactModelPath
Gets whether the artifact model path is reduced to its file name.
EnvironmentGateName
Gets the environment variable that enables real-model execution.
RealModelSmokeStageEvidence UAIX.LmRuntime.Models.Llama
3 members
Records one real-model workflow stage duration and current-thread allocation delta.
Stage
Gets the stage name.
ElapsedStopwatchTicks
Gets elapsed stopwatch ticks.
ManagedAllocatedByteCount
Gets managed bytes allocated on the measuring thread.
RealModelSmokeArtifact UAIX.LmRuntime.Models.Llama
29 members
Represents a versioned, machine-readable real-model smoke artifact.
Schema
Gets the artifact schema identifier.
PackageVersion
Gets the package version.
CommitIdentity
Gets the source/commit identity.
ProvenanceLabel
Gets the operator-supplied provenance label.
LicenseReviewStatus
Gets the operator-supplied license review status.
GeneratedUtc
Gets the generation time in UTC.
ClaimStatus
Gets the evidence claim status.
Succeeded
Gets whether the requested stage completed.
CompletedStage
Gets the deepest completed stage.
ModelPath
Gets the normalized local model path.
FileByteCount
Gets the model file length.
ModelSha256
Gets the optional complete-file SHA-256.
GgufVersion
Gets the parsed GGUF version.
Architecture
Gets the model architecture.
TokenizerFamily
Gets the tokenizer family.
StorageTypeCounts
Gets physical tensor counts by GGML storage name.
BindingDiagnostics
Gets binding diagnostic messages.
PromptTokenIds
Gets exact prompt token identifiers when tokenization completed.
SelectedTokenId
Gets the selected one-token identifier when execution completed.
SelectedTokenText
Gets the selected token text when execution completed.
ExpectedTokenIdsMatched
Gets whether the optional expected token-identifier evidence matched.
ExpectedOneTokenMatched
Gets whether the optional expected one-token evidence matched.
Alignment
Gets the effective GGUF tensor alignment.
PromptSha256
Gets the SHA-256 of the prompt text rather than requiring publication of the raw prompt.
StageEvidence
Gets stage timing and current-thread allocation measurements.
UnsupportedDiagnostics
Gets exact unsupported execution diagnostics.
CommandIdentity
Gets the non-secret command identity.
EnvironmentVariableNames
Gets environment-variable names used by the workflow without values.
Diagnostics
Gets bounded workflow diagnostics.
RealModelSmokeEnvironment UAIX.LmRuntime.Models.Llama
1 member
Creates explicit local smoke options from the documented environment-variable contract.
Load(UAIX.LmRuntime.Models.Llama.RealModelSmokeStage)
Reads the local real-model smoke configuration from environment variables.
Parameters: stage
Returns: The local smoke options, with ownership and disposal obligations defined by the returned type and the Load contract.
RealModelPathPolicy UAIX.LmRuntime.Models.Llama
1 member
Resolves local model paths under an optional root without any hidden network access or download behavior.
Resolve(string,string,long)
Resolves and validates one local model path.
Parameters: path, allowedRoot, maximumFileByteCount
Returns: The resolved and validated local model path. The returned string is detached from mutable caller storage and is not persisted by the operation.
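A root-confined path check usually normalizes the path first and only then compares it against the root, so that `..` segments cannot escape. The following is a minimal sketch of that pattern using only `System.IO`; it is not the package's policy. `ModelPathSketch` is a hypothetical name, an empty root is treated as "no root restriction", a zero byte limit disables the size check, the comparison is ordinal (case-sensitive), and symbolic links are not resolved.

```csharp
using System;
using System.IO;

public static class ModelPathSketch
{
    public static string Resolve(string path, string allowedRoot, long maximumFileByteCount)
    {
        if (string.IsNullOrWhiteSpace(path))
            throw new ArgumentException("A model path is required.", nameof(path));
        if (maximumFileByteCount < 0)
            throw new ArgumentOutOfRangeException(nameof(maximumFileByteCount));

        // Normalize first so "." and ".." segments are collapsed.
        string fullPath = Path.GetFullPath(path);

        if (!string.IsNullOrEmpty(allowedRoot))
        {
            string root = Path.GetFullPath(allowedRoot);
            // A trailing separator prevents "/models-evil" matching "/models".
            if (!root.EndsWith(Path.DirectorySeparatorChar))
                root += Path.DirectorySeparatorChar;
            if (!fullPath.StartsWith(root, StringComparison.Ordinal))
                throw new UnauthorizedAccessException("The model path escapes the allowed root.");
        }

        var file = new FileInfo(fullPath);
        if (!file.Exists)
            throw new FileNotFoundException("The model file does not exist.", fullPath);
        if (maximumFileByteCount > 0 && file.Length > maximumFileByteCount)
            throw new InvalidOperationException("The model file exceeds the configured limit.");

        return fullPath;
    }
}
```

The sketch touches only the local file system, which matches the local-only intent described above; a production policy would also need to decide how to treat symbolic links and case-insensitive file systems.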
RealModelSmokeRunner UAIX.LmRuntime.Models.Llama
1 member
Executes staged, offline real-model validation and emits a bounded evidence artifact.
Run(UAIX.LmRuntime.Models.Llama.RealModelSmokeOptions)
Runs the requested local smoke stages in their required order.
Parameters: options
Returns: A bounded machine-readable artifact describing the deepest completed stage.
ReferenceKvWriteBehavior UAIX.LmRuntime.Models.Llama
1 member
Identifies the deterministic write semantics used by the scalar reference key/value cache.
AppendOrOverwrite
Writes append new positions and deterministically overwrite already written positions.
ReferenceKvCacheFingerprint UAIX.LmRuntime.Models.Llama
1 member
Computes stable fingerprints for model configurations that own reference key/value cache snapshots.
Create(UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Creates a SHA-256 fingerprint from the configuration fields that determine cache geometry and semantics.
Parameters: config
Returns: The SHA-256 fingerprint as text. The returned string is detached from mutable caller storage and is not persisted by the operation.
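A configuration fingerprint of this kind can be built by hashing a canonical, culture-independent encoding of the fields that define cache geometry. The sketch below assumes four geometry fields and a made-up canonical format with a version prefix; the package's actual field set, encoding, and output casing are not documented here, so `CacheFingerprintSketch` is purely illustrative.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

public static class CacheFingerprintSketch
{
    public static string Create(int layerCount, int contextLength, int keyValueHeadCount, int headWidth)
    {
        // Invariant formatting keeps the text identical on every machine.
        // The "kv-sketch-v1" prefix and '|' separator are hypothetical.
        string canonical = string.Join("|",
            "kv-sketch-v1",
            layerCount.ToString(CultureInfo.InvariantCulture),
            contextLength.ToString(CultureInfo.InvariantCulture),
            keyValueHeadCount.ToString(CultureInfo.InvariantCulture),
            headWidth.ToString(CultureInfo.InvariantCulture));

        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash); // 64 uppercase hexadecimal characters
    }
}
```

Two caches with the same geometry then share a fingerprint, and any change to a field yields a different one, which is what lets a snapshot be rejected when restored into an incompatible cache.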
IReferenceKvCache UAIX.LmRuntime.Models.Llama
13 members
Defines a typed, deterministic key/value cache contract for the scalar LLaMA reference runtime.
LayerCount
Gets the number of transformer layers.
ContextLength
Gets the maximum sequence capacity.
KeyValueHeadCount
Gets the number of key/value heads per layer.
HeadWidth
Gets the float width of one key/value head.
UsedTokenCount
Gets the highest contiguous token position written plus one.
ConfigurationFingerprint
Gets the configuration fingerprint required by compatible snapshots.
WriteBehavior
Gets the deterministic append-versus-overwrite behavior.
Write(int,int,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>)
Appends or replaces one layer's key and value vectors at a sequence position.
Parameters: layerIndex, position, key, value
GetKey(int,int,int)
Retrieves the key from the current cache state after validating the requested access.
Parameters: layerIndex, position, headIndex
Returns: A bounded ReadOnlySpan<float> view of the requested key. Its lifetime and ownership remain tied to the cache; no out-of-range region is exposed.
GetValue(int,int,int)
Retrieves the value from the current cache state after validating the requested access.
Parameters: layerIndex, position, headIndex
Returns: A bounded ReadOnlySpan<float> view of the requested value. Its lifetime and ownership remain tied to the cache; no out-of-range region is exposed.
Reset
Resets the cache contents and used token count to their initial state without publishing partial state.
CreateSnapshot
Creates a bounded snapshot for tiny-fixture testing and replay.
Returns: The immutable cache snapshot, with ownership and disposal obligations defined by the returned type and the CreateSnapshot contract.
Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot)
Restores cache state from the supplied snapshot after validating it.
Parameters: snapshot
ReferenceKvCacheSnapshot UAIX.LmRuntime.Models.Llama
9 members
Represents an immutable snapshot of a tiny reference key/value cache.
SchemaVersion
Gets the snapshot schema version.
ConfigurationFingerprint
Gets the model/configuration fingerprint.
LayerCount
Gets the number of layers in the snapshot.
ContextLength
Gets the context capacity in the snapshot.
KeyValueHeadCount
Gets the key/value head count.
HeadWidth
Gets the per-head width.
UsedTokenCount
Gets the used token count.
Keys
Gets a copy of all cached key data.
Values
Gets a copy of all cached value data.
ReferenceKvCacheDiagnosticSnapshot UAIX.LmRuntime.Models.Llama
3 members
Represents a bounded, non-mutable diagnostic view of reference cache state.
ConfigurationFingerprint
Gets the configuration fingerprint.
UsedTokenCount
Gets the used token count.
ContentSha256
Gets the SHA-256 of the used key/value prefix.
ReferenceKvCache UAIX.LmRuntime.Models.Llama
16 members
Stores reference key/value state in two contiguous arrays without per-token dictionaries.
ReferenceKvCache(int,int,int,int)
Initializes a reference key/value cache with a geometry-derived compatibility fingerprint.
Parameters: layerCount, contextLength, keyValueHeadCount, headWidth
ReferenceKvCache(int,int,int,int,string)
Initializes a reference key/value cache with an explicit model/configuration fingerprint.
layerCountcontextLengthkeyValueHeadCountheadWidthconfigurationFingerprintLayerCount
ContextLength
KeyValueHeadCount
HeadWidth
UsedTokenCount
ConfigurationFingerprint
WriteBehavior
Write(int,int,System.ReadOnlySpan<float>,System.ReadOnlySpan<float>)
Writes the supplied key and value for the given layer and position into the current cache state using the cache's canonical representation.
Parameters: layerIndex, position, key, value
GetKey(int,int,int)
Retrieves the key from the current cache state after validating the requested access.
Parameters: layerIndex, position, headIndex
Returns: A bounded ReadOnlySpan<float> view of the requested key. Its lifetime and ownership remain tied to the owning cache; no out-of-range region is exposed.
GetValue(int,int,int)
Retrieves the value from the current cache state after validating the requested access.
Parameters: layerIndex, position, headIndex
Returns: A bounded ReadOnlySpan<float> view of the requested value. Its lifetime and ownership remain tied to the owning cache; no out-of-range region is exposed.
Reset
Resets the reference KV cache contents and logical sequence position to their initial state.
CreateSnapshot
Creates a snapshot of the current cache state.
Returns: The ReferenceKvCacheSnapshot, published only after all documented validation and ownership transitions succeed.
CreateDiagnosticSnapshot
Creates a small diagnostic snapshot without exposing mutable key/value arrays.
Returns: The bounded diagnostic snapshot, with ownership and disposal obligations defined by the returned type and the CreateDiagnosticSnapshot contract.
Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot)
Restores the supplied snapshot from a validated persisted representation.
Parameters: snapshot
ReferenceKvPortableSnapshot UAIX.LmRuntime.Models.Llama
6 members
Carries a deterministic portable key/value-cache snapshot and its compatibility identities.
SchemaVersion
Gets the portable schema version.
ConfigurationFingerprint
Gets the model-configuration fingerprint.
ModelArtifactFingerprint
Gets the optional model-artifact fingerprint.
CacheLayoutFingerprint
Gets the cache-layout fingerprint.
ContentSha256
Gets the SHA-256 of the serialized bytes preceding the digest field.
Snapshot
Gets the restored capacity-shaped snapshot.
ReferenceKvCacheSerializer UAIX.LmRuntime.Models.Llama
5 members
Serializes only logically used key/value positions in stable layer-position-head order.
Schema version two is additive and does not change the in-memory version-one snapshot contract retained for source compatibility. Unused capacity is reconstructed as zero during deserialization.
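The general pattern of a bounded, little-endian payload followed by a trailing SHA-256 can be sketched as follows. This is an illustration only, not the package's wire format: `DigestFramedPayloadSketch`, its flat float payload, and the `capacity` parameter are hypothetical, and the real schema also carries version and fingerprint fields that are omitted here.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch of "payload + trailing SHA-256"; not the package's schema.
public static class DigestFramedPayloadSketch
{
    private const int DigestLength = 32; // SHA-256 output size in bytes

    // Writes only the used floats as little-endian IEEE 754, then appends their SHA-256.
    public static byte[] Serialize(ReadOnlySpan<float> used, int maximumByteCount)
    {
        long total = (long)used.Length * sizeof(float) + DigestLength;
        if (total > maximumByteCount)
            throw new InvalidOperationException("Serialized snapshot would exceed the byte limit.");

        var bytes = new byte[(int)total];
        for (int i = 0; i < used.Length; i++)
            BinaryPrimitives.WriteSingleLittleEndian(bytes.AsSpan(i * sizeof(float)), used[i]);

        int payloadLength = bytes.Length - DigestLength;
        SHA256.HashData(bytes.AsSpan(0, payloadLength), bytes.AsSpan(payloadLength));
        return bytes;
    }

    // Verifies size and digest before reading, then zero-fills unused capacity.
    public static float[] Deserialize(ReadOnlySpan<byte> bytes, int maximumByteCount, int capacity)
    {
        if (bytes.Length > maximumByteCount)
            throw new InvalidDataException("Input exceeds the byte limit.");
        if (bytes.Length < DigestLength || (bytes.Length - DigestLength) % sizeof(float) != 0)
            throw new InvalidDataException("Input length is not a valid framed payload.");

        int payloadLength = bytes.Length - DigestLength;
        Span<byte> actual = stackalloc byte[DigestLength];
        SHA256.HashData(bytes[..payloadLength], actual);
        if (!CryptographicOperations.FixedTimeEquals(actual, bytes[payloadLength..]))
            throw new InvalidDataException("Digest mismatch.");

        int count = payloadLength / sizeof(float);
        if (count > capacity)
            throw new InvalidDataException("Payload does not fit the target capacity.");

        var restored = new float[capacity]; // entries beyond 'count' remain zero
        for (int i = 0; i < count; i++)
            restored[i] = BinaryPrimitives.ReadSingleLittleEndian(bytes.Slice(i * sizeof(float)));
        return restored;
    }
}
```

Checking the size limit and the digest before interpreting any payload bytes means a truncated or modified file is rejected before it can reach cache state.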
SchemaVersion
Gets the portable snapshot schema version.
DefaultMaximumByteCount
Gets the default maximum serialized snapshot size.
Serialize(UAIX.LmRuntime.Models.Llama.ReferenceKvCacheSnapshot,string,string,int)
Serializes a bounded cache snapshot in deterministic little-endian form.
Parameters: snapshot, modelArtifactFingerprint, cacheLayoutFingerprint, maximumByteCount
Returns: The serialized snapshot bytes including a trailing SHA-256.
Deserialize(System.ReadOnlySpan<byte>,int)
Deserializes and verifies a portable key/value-cache snapshot.
Parameters: bytes, maximumByteCount
Returns: The verified ReferenceKvPortableSnapshot, published only after all documented validation and ownership transitions succeed.
Restore(UAIX.LmRuntime.Models.Llama.ReferenceKvCache,System.ReadOnlySpan<byte>,string,string)
Restores verified portable bytes into a cache after validating model and layout identities.
Parameters: cache, bytes, expectedModelArtifactFingerprint, expectedCacheLayoutFingerprint
WeightSourceStorageDiagnostics UAIX.LmRuntime.Models.Llama
7 members
Describes immutable storage used by one deterministic reference weight source.
TensorName
Gets the semantic tensor name.
StorageType
Gets the GGML physical storage type.
DataType
Gets the logical runtime data type.
ByteLength
Gets the physical byte length.
ManagedCopiedByteCount
Gets the number of bytes copied into persistent managed model-weight storage.
IsMemoryMapped
Gets a value indicating whether the source borrows memory-mapped storage.
IsAlias
Gets a value indicating whether this source aliases another semantic binding.
IReadOnlyVectorSource UAIX.LmRuntime.Models.Llama
5 members
Exposes an immutable logical vector without requiring a particular storage representation.
Length
Gets the logical vector length.
DataType
Gets the logical runtime data type.
StorageType
Gets the physical GGML storage type.
StorageDiagnostics
Gets immutable storage diagnostics.
CopyTo(System.Span<float>)
Copies every vector value into a caller-owned float32 destination.
Parameters: destination
IReadOnlyMatrixSource UAIX.LmRuntime.Models.Llama
7 members
Exposes an immutable logical row-major matrix without requiring a particular storage representation.
RowCount
Gets the logical row count.
ColumnCount
Gets the logical column count.
DataType
Gets the logical runtime data type.
StorageType
Gets the physical GGML storage type.
StorageDiagnostics
Gets immutable storage diagnostics.
CopyRowTo(int,System.Span<float>)
Copies and, when required, dequantizes one logical row into a caller-owned float32 destination.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies this matrix by a float32 vector without materializing a complete float32 matrix.
Parameters: vector, output
ILlamaLayerWeightSource UAIX.LmRuntime.Models.Llama
9 members
Exposes immutable weights required by one LLaMA transformer block.
AttentionNorm
Gets the attention normalization vector.
AttentionQuery
Gets the query projection matrix.
AttentionKey
Gets the key projection matrix.
AttentionValue
Gets the value projection matrix.
AttentionOutput
Gets the attention output projection matrix.
FeedForwardNorm
Gets the feed-forward normalization vector.
FeedForwardGate
Gets the feed-forward gate projection matrix.
FeedForwardUp
Gets the feed-forward up projection matrix.
FeedForwardDown
Gets the feed-forward down projection matrix.
ILlamaModelWeightSource UAIX.LmRuntime.Models.Llama
8 members
Exposes immutable model weights required by the deterministic LLaMA reference session.
TokenEmbeddings
Gets the token embedding table.
Layers
Gets transformer-block weights in execution order.
OutputNorm
Gets the final output normalization vector.
OutputProjection
Gets the output projection matrix.
UsesTiedOutputProjection
Gets a value indicating whether output projection aliases token embeddings.
StorageDiagnostics
Gets storage diagnostics for every distinct semantic source.
StorageSummary
Gets a stable summary of physical storage types used by the model.
ManagedCopiedByteCount
Gets persistent managed model-weight bytes represented by this source.
ArrayVectorSource UAIX.LmRuntime.Models.Llama
7 members
Provides an immutable array-backed vector adapter for compatibility and deterministic fixtures.
ArrayVectorSource(string,float[])
Initializes a new ArrayVectorSource instance with validated dependencies and operational bounds.
Parameters: tensorName, values
TensorName
Gets the semantic tensor name.
Length
DataType
StorageType
StorageDiagnostics
CopyTo(System.Span<float>)
Copies every vector value into caller-owned storage after validating the destination capacity.
Parameters: destination
ArrayMatrixSource UAIX.LmRuntime.Models.Llama
9 members
Provides an immutable row-major array-backed matrix adapter for compatibility and deterministic fixtures.
ArrayMatrixSource(string,float[],int,int)
Initializes a new ArrayMatrixSource instance with validated dependencies and operational bounds.
Parameters: tensorName, values, rowCount, columnCount
TensorName
Gets the semantic tensor name.
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one logical row into caller-owned storage after validating the row index and destination capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies this matrix by the supplied vector and writes the result to output without changing logical row order.
Parameters: vector, output
ArrayLlamaLayerWeightSource UAIX.LmRuntime.Models.Llama
10 members
Provides one array-backed LLaMA layer weight source.
ArrayLlamaLayerWeightSource(int,UAIX.LmRuntime.Models.Llama.LlamaReferenceLayerWeights,UAIX.LmRuntime.Models.Llama.LlamaModelConfig)
Initializes a new ArrayLlamaLayerWeightSource instance with validated dependencies and operational bounds.
Parameters: blockIndex, weights, config
AttentionNorm
AttentionQuery
AttentionKey
AttentionValue
AttentionOutput
FeedForwardNorm
FeedForwardGate
FeedForwardUp
FeedForwardDown
ArrayLlamaModelWeightSource UAIX.LmRuntime.Models.Llama
10 members
Adapts the v1.8.0 float-array model to the storage-neutral v1.9.0 execution contracts.
Create(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights)
Creates an array-backed source after validating its complete model contract.
Parameters: config, weights
Returns: The array-backed source, with ownership and disposal obligations defined by the returned type and the Create contract.
ArrayLlamaModelWeightSource(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.LlamaReferenceModelWeights)
Initializes a new ArrayLlamaModelWeightSource instance with validated dependencies and operational bounds.
Parameters: config, weights
TokenEmbeddings
Layers
OutputNorm
OutputProjection
UsesTiedOutputProjection
StorageDiagnostics
ManagedCopiedByteCount
StorageSummary
MappedFloat32VectorSource UAIX.LmRuntime.Models.Llama
6 members
Reads a float32 vector directly from a mapped GGUF tensor view.
MappedFloat32VectorSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedFloat32VectorSource instance with validated dependencies and operational bounds.
Parameters: view
Length
DataType
StorageType
StorageDiagnostics
CopyTo(System.Span<float>)
Copies every vector value into caller-owned storage after validating the destination capacity.
Parameters: destination
MappedFloat32MatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies an F32 matrix directly from a mapped GGUF tensor view.
MappedFloat32MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedFloat32MatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies one logical row into caller-owned storage after validating the row index and destination capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies this matrix by the supplied vector and writes the result to output without changing logical row order.
Parameters: vector, output
MappedQ8_0MatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies a Q8_0 matrix directly from a mapped GGUF tensor view.
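As an illustration of multiplying quantized rows without first expanding them to a float32 matrix, here is a minimal sketch. It assumes the commonly used GGML Q8_0 block layout (a little-endian fp16 scale followed by 32 signed 8-bit weights, 34 bytes per block). `Q8_0Sketch` is a hypothetical helper working on raw bytes; it does not reproduce the package's `MappedTensorView` access or its actual code path.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; assumes GGML Q8_0 blocks: fp16 scale + 32 signed bytes.
public static class Q8_0Sketch
{
    public const int BlockSize = 32;            // weights per block
    public const int BlockBytes = 2 + BlockSize; // scale + quantized weights

    // Dot product of one quantized row with a float32 vector, block by block.
    public static float DotRow(ReadOnlySpan<byte> row, ReadOnlySpan<float> vector)
    {
        if (vector.Length % BlockSize != 0)
            throw new ArgumentException("Column count must be a multiple of 32.", nameof(vector));
        int blocks = vector.Length / BlockSize;
        if (row.Length != blocks * BlockBytes)
            throw new ArgumentException("Row byte length does not match the vector length.", nameof(row));

        float sum = 0f;
        for (int b = 0; b < blocks; b++)
        {
            ReadOnlySpan<byte> block = row.Slice(b * BlockBytes, BlockBytes);
            float scale = (float)BinaryPrimitives.ReadHalfLittleEndian(block);
            float partial = 0f;
            for (int i = 0; i < BlockSize; i++)
                partial += unchecked((sbyte)block[2 + i]) * vector[b * BlockSize + i];
            sum += scale * partial; // apply the block scale once per block
        }
        return sum;
    }

    // Row-major matrix-vector product; only one row's bytes are read at a time.
    public static void Multiply(ReadOnlySpan<byte> matrix, int rowCount, int columnCount,
                                ReadOnlySpan<float> vector, Span<float> output)
    {
        if (vector.Length != columnCount) throw new ArgumentException("Vector length mismatch.", nameof(vector));
        if (output.Length != rowCount) throw new ArgumentException("Output length mismatch.", nameof(output));
        if (columnCount % BlockSize != 0) throw new ArgumentException("Column count must be a multiple of 32.", nameof(columnCount));
        int rowBytes = columnCount / BlockSize * BlockBytes;
        if (matrix.Length != rowCount * rowBytes) throw new ArgumentException("Matrix byte length mismatch.", nameof(matrix));

        for (int r = 0; r < rowCount; r++)
            output[r] = DotRow(matrix.Slice(r * rowBytes, rowBytes), vector);
    }
}
```

Accumulating the integer-weighted sum per block and scaling once keeps the working set to a single block, which is the property the direct mapped source relies on.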
MappedQ8_0MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedQ8_0MatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies and dequantizes one logical row into caller-owned storage after validating the row index and destination capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies this matrix by the supplied vector and writes the result to output without changing logical row order.
Parameters: vector, output
MappedQ4_0MatrixSource UAIX.LmRuntime.Models.Llama
8 members
Reads and multiplies a Q4_0 matrix directly from a mapped GGUF tensor view.
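Row dequantization for 4-bit storage can be sketched in the same spirit. The sketch assumes the commonly used GGML Q4_0 block layout: a little-endian fp16 scale followed by 16 bytes, where the low nibble of byte i holds element i, the high nibble holds element i + 16, and each nibble is offset by 8. `Q4_0Sketch` is hypothetical and operates on raw bytes rather than on the package's mapped views.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; assumes GGML Q4_0 blocks: fp16 scale + 16 packed bytes.
public static class Q4_0Sketch
{
    public const int BlockSize = 32;                  // weights per block
    public const int BlockBytes = 2 + BlockSize / 2;  // scale + two weights per byte

    // Expands one quantized row into a caller-owned float32 destination.
    public static void DequantizeRow(ReadOnlySpan<byte> row, Span<float> destination)
    {
        if (destination.Length % BlockSize != 0)
            throw new ArgumentException("Column count must be a multiple of 32.", nameof(destination));
        int blocks = destination.Length / BlockSize;
        if (row.Length != blocks * BlockBytes)
            throw new ArgumentException("Row byte length does not match the destination length.", nameof(row));

        for (int b = 0; b < blocks; b++)
        {
            ReadOnlySpan<byte> block = row.Slice(b * BlockBytes, BlockBytes);
            float scale = (float)BinaryPrimitives.ReadHalfLittleEndian(block);
            Span<float> output = destination.Slice(b * BlockSize, BlockSize);
            for (int i = 0; i < BlockSize / 2; i++)
            {
                byte packed = block[2 + i];
                output[i] = ((packed & 0x0F) - 8) * scale;               // low nibble: first half
                output[i + BlockSize / 2] = ((packed >> 4) - 8) * scale; // high nibble: second half
            }
        }
    }
}
```

A caller would typically reuse one row-sized destination buffer across rows, so the expanded float32 data never exceeds a single row.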
MappedQ4_0MatrixSource(UAIX.LmRuntime.Gguf.MappedTensorView)
Initializes a new MappedQ4_0MatrixSource instance with validated dependencies and operational bounds.
Parameters: view
RowCount
ColumnCount
DataType
StorageType
StorageDiagnostics
CopyRowTo(int,System.Span<float>)
Copies and dequantizes one logical row into caller-owned storage after validating the row index and destination capacity.
Parameters: rowIndex, destination
Multiply(System.ReadOnlySpan<float>,System.Span<float>)
Multiplies this matrix by the supplied vector and writes the result to output without changing logical row order.
Parameters: vector, output
MappedMatrixSourceFactory UAIX.LmRuntime.Models.Llama
1 member
Creates supported matrix sources over mapped tensor views.
Create(UAIX.LmRuntime.Gguf.MappedTensorView)
Creates a direct mapped source for supported scalar and quantized storage.
Parameters: view
Returns: The storage-specific matrix source, with ownership and disposal obligations defined by the returned type and the Create contract.
MappedLlamaLayerWeightSource UAIX.LmRuntime.Models.Llama
10 members
Exposes one mapped LLaMA transformer block through storage-neutral execution contracts.
MappedLlamaLayerWeightSource(UAIX.LmRuntime.Models.Llama.LlamaBoundLayerWeightSet)
Initializes a new MappedLlamaLayerWeightSource instance with validated dependencies and operational bounds.
Parameters: weights
AttentionNorm
AttentionQuery
AttentionKey
AttentionValue
AttentionOutput
FeedForwardNorm
FeedForwardGate
FeedForwardUp
FeedForwardDown
MappedLlamaModelWeightSource UAIX.LmRuntime.Models.Llama
10 members
Exposes a complete mapped LLaMA model through storage-neutral execution contracts.
Create(UAIX.LmRuntime.Models.Llama.LlamaBoundWeightSet)
Creates and validates a complete mapped model weight source.
Parameters: weights
Returns: The validated mapped model weight source, with ownership and disposal obligations defined by the returned type and the Create contract.
MappedLlamaModelWeightSource(UAIX.LmRuntime.Models.Llama.LlamaBoundWeightSet)
Initializes a new MappedLlamaModelWeightSource instance with validated dependencies and operational bounds.
Parameters: weights
TokenEmbeddings
Layers
OutputNorm
OutputProjection
UsesTiedOutputProjection
StorageDiagnostics
ManagedCopiedByteCount
Gets the total number of persistent managed model-weight bytes copied by this source.
StorageSummary
LlamaWeightSourceValidator UAIX.LmRuntime.Models.Llama
1 member
Validates storage-neutral LLaMA weight sources before deterministic execution begins.
Validate(UAIX.LmRuntime.Models.Llama.LlamaModelConfig,UAIX.LmRuntime.Models.Llama.ILlamaModelWeightSource)
Validates every global and block-local source against the configured model geometry.
Parameters: config, weights
A mapped session reads supported weight storage through mapped sources. A materialized session copies compatible weights into managed reference structures. The manifest and materialization records expose the chosen ownership and copied-byte behavior.
No. Configuration validation is one gate. Required tensors, storage support, tokenizer compatibility, binding, context limits, and a real execution stage must also pass.
They should not be. Use the model hash and configuration, tokenizer, and cache-layout fingerprints as strict compatibility checks, plus a bounded size and trusted path policy.
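The strict checks named above (identity fingerprints, a bounded size, and a trusted path policy) could be combined in a host-side gate along these lines. Everything here is a hypothetical host helper: `ArtifactIdentity`, `PersistedStateGateSketch`, and the single trusted-root rule are assumptions for illustration and are not part of the package surface.

```csharp
using System;
using System.IO;

// Hypothetical identity bundle a host might store next to persisted state.
public sealed record ArtifactIdentity(
    string ModelHash,
    string ConfigurationFingerprint,
    string TokenizerFingerprint,
    string CacheLayoutFingerprint);

public static class PersistedStateGateSketch
{
    // Returns true only when identities match exactly, the size is bounded,
    // and the resolved path stays inside the trusted root directory.
    public static bool IsAcceptable(
        ArtifactIdentity expected, ArtifactIdentity stored,
        long byteLength, long maximumByteCount,
        string path, string trustedRoot)
    {
        bool sameIdentity =
            string.Equals(expected.ModelHash, stored.ModelHash, StringComparison.Ordinal) &&
            string.Equals(expected.ConfigurationFingerprint, stored.ConfigurationFingerprint, StringComparison.Ordinal) &&
            string.Equals(expected.TokenizerFingerprint, stored.TokenizerFingerprint, StringComparison.Ordinal) &&
            string.Equals(expected.CacheLayoutFingerprint, stored.CacheLayoutFingerprint, StringComparison.Ordinal);
        if (!sameIdentity) return false;

        if (byteLength < 0 || byteLength > maximumByteCount) return false;

        // Resolve ".." segments before comparing, so traversal cannot escape the root.
        string root = Path.GetFullPath(trustedRoot).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;
        string full = Path.GetFullPath(path);
        return full.StartsWith(root, StringComparison.Ordinal);
    }
}
```

Ordinal comparison is deliberately strict: any difference in a fingerprint, including case, counts as a mismatch. A host on a case-insensitive file system may need a different path comparison.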
No. It is a legible correctness and parity anchor. Performance claims require retained measurements for the exact model, hardware, settings, and code path.