Pure C# · Local inference · Explicit evidence
When your model needs somewhere to run.
LMRuntime is a local-first execution boundary for GGUF and LLaMA-family models: strict parsing, mapped tensors, semantic binding, deterministic reference execution, and optional acceleration without hiding the path your model takes.
- No native wrapper dependency
- No hidden network fallback
- Claims bounded by evidence
Runtime capabilities
A runtime, not a wrapper.
The model path stays visible from container bytes to decoded token. Each boundary is designed to fail clearly, preserve provenance, and distinguish what is implemented from what has actually been executed and measured.
01
Pure C# boundary
Keep the product runtime in managed code instead of delegating execution to llama.cpp or a wrapper layer.
02
Strict GGUF intake
Parse metadata, alignment, shapes, offsets, and tensor descriptors with explicit validation before execution begins.
03
Mapped tensor storage
Address model tensors through mapped storage boundaries so large weights do not require an opaque loading path.
04
Semantic LLaMA binding
Resolve architecture-specific tensor roles by meaning, not by assuming that every model uses one incidental naming layout.
05
Deterministic reference path
Use scalar reference execution as a correctness authority for parity work, fixtures, and backend comparisons.
06
Optional acceleration
Place CPU and CUDA backends behind explicit boundaries while preserving a readable, testable reference route.
Execution architecture
From GGUF bytes to a decoded token.
The reference route is intentionally legible. Parsing, storage, model binding, tokenization, execution, and decoding remain distinct boundaries that can be inspected and tested independently.
Parse
Validate GGUF structure and metadata.
Map
Expose bounded tensor storage.
Bind
Resolve architecture semantics.
Tokenize
Read tokenizer behavior from metadata.
Execute
Run the deterministic reference path.
Decode
Return an explicit token result.
using Uai.LlmRuntime.Models.Llama;
using var model = new LlamaMappedModelLoader().Load(
"model.gguf",
new LlamaMappedModelLoadOptions
{
ComputeModelSha256 = true
});
using var session = model.CreateReferenceSession();
var result = session.DecodeOneGreedy("Hello, runtime.");
Console.WriteLine(result.TokenText);
Explicit boundaries
Each layer has a narrow responsibility and an observable failure surface.
- GGUFContainer and metadata validation
- StorageMapped model bytes and tensor views
- BindingLLaMA-family semantic resolution
- TokenizerMetadata-driven prompt encoding
- ReferenceScalar correctness authority
- BackendsOptional CPU and CUDA boundaries
Evidence ledger
Claims should stop where the evidence stops.
The supplied source candidate records code, documentation, tests, fixtures, and blocked execution separately. These counts describe that package snapshot; they are not throughput or compatibility benchmarks.
SOURCE CANDIDATE · Uai.LlmRuntime 3.0.0
Engineering notes
Work in the open.
-
Hello world!
Welcome to WordPress. This is your first post. Edit or delete it, then start writing!
Give the model a runtime you can inspect.
Start at the managed model boundary, follow the one-token reference path, and attach acceleration only where the evidence supports it.
LMRuntime.com — When your model needs somewhere to run.