Pure C# · Local inference · Explicit evidence

When your model needs somewhere to run.

LMRuntime is a local-first execution boundary for GGUF and LLaMA-family models: strict parsing, mapped tensors, semantic binding, deterministic reference execution, and optional acceleration without hiding the path your model takes.

  • No native wrapper dependency
  • No hidden network fallback
  • Claims bounded by evidence

Runtime capabilities

A runtime, not a wrapper.

The model path stays visible from container bytes to decoded token. Each boundary is designed to fail clearly, preserve provenance, and distinguish what is implemented from what has actually been executed and measured.

01

Pure C# boundary

Keep the product runtime in managed code instead of delegating execution to llama.cpp or a wrapper layer.

02

Strict GGUF intake

Parse metadata, alignment, shapes, offsets, and tensor descriptors with explicit validation before execution begins.

03

Mapped tensor storage

Address model tensors through mapped storage boundaries so large weights do not require an opaque loading path.

04

Semantic LLaMA binding

Resolve architecture-specific tensor roles by meaning, not by assuming that every model uses one incidental naming layout.

05

Deterministic reference path

Use scalar reference execution as a correctness authority for parity work, fixtures, and backend comparisons.

06

Optional acceleration

Place CPU and CUDA backends behind explicit boundaries while preserving a readable, testable reference route.

Execution architecture

From GGUF bytes to a decoded token.

The reference route is intentionally legible. Parsing, storage, model binding, tokenization, execution, and decoding remain distinct boundaries that can be inspected and tested independently.

01

Parse

Validate GGUF structure and metadata.

02

Map

Expose bounded tensor storage.

03

Bind

Resolve architecture semantics.

04

Tokenize

Read tokenizer behavior from metadata.

05

Execute

Run the deterministic reference path.

06

Decode

Return an explicit token result.

Program.cs
using Uai.LlmRuntime.Models.Llama;

using var model = new LlamaMappedModelLoader().Load(
    "model.gguf",
    new LlamaMappedModelLoadOptions
    {
        ComputeModelSha256 = true
    });

using var session = model.CreateReferenceSession();
var result = session.DecodeOneGreedy("Hello, runtime.");
Console.WriteLine(result.TokenText);

Explicit boundaries

Each layer has a narrow responsibility and an observable failure surface.

  • GGUFContainer and metadata validation
  • StorageMapped model bytes and tensor views
  • BindingLLaMA-family semantic resolution
  • TokenizerMetadata-driven prompt encoding
  • ReferenceScalar correctness authority
  • BackendsOptional CPU and CUDA boundaries
Implemented does not mean executed. Executed does not mean measured. LMRuntime keeps those claims separate.

Evidence ledger

Claims should stop where the evidence stops.

The supplied source candidate records code, documentation, tests, fixtures, and blocked execution separately. These counts describe that package snapshot; they are not throughput or compatibility benchmarks.

SOURCE CANDIDATE · Uai.LlmRuntime 3.0.0

17source projects in the candidate manifest
2,958documented methods and constructors
1,791xUnit facts declared in source
8deterministic GGUF fixtures
ImplementedCode is present in the inspected candidate.
ExecutedA recorded command or process completed.
MeasuredA result was captured from a completed run.
PlannedDesign intent, not a current capability claim.
Execution disclosure: the package manifest records 13 managed command attempts as blocked before a dotnet process could start in the supplied verification environment. Blocked is not passed, and this theme does not present it as passed.

Engineering notes

Work in the open.

  • Hello world!

    Welcome to WordPress. This is your first post. Edit or delete it, then start writing!

    Read note

Give the model a runtime you can inspect.

Start at the managed model boundary, follow the one-token reference path, and attach acceleration only where the evidence supports it.

LMRuntime.com — When your model needs somewhere to run.