About Runyard

Find AI models that actually run on your hardware.

Runyard is a hardware-aware AI model discovery platform. Enter your GPU, RAM, and VRAM. We tell you which local LLMs will fit, how fast they will run, and which quantization to pick. Free. No signup. No telemetry.

What is Runyard?

Most developers find out their hardware cannot run a model only after they have downloaded 40 GB of weights and watched the process run out of memory (OOM) in their terminal. That is the problem Runyard exists to solve.

We index hundreds of open-source language models — Llama, Mistral, Qwen, Gemma, Phi, DeepSeek, Mixtral, Yi, and more — across every quantization level you can actually download (Q2 through FP16). For each model we know how much VRAM the weights take, how the KV cache grows with context, and how fast it tends to run on common GPUs and Apple Silicon. You give us your hardware; we give you a shortlist you can trust before you wait on a download.

Around that core we ship 15+ free tools: a VRAM calculator, a quantization picker, a tokens-per-second estimator, an OOM-fix assistant, a model comparison matrix, a GGUF variant chooser, and more. Each one answers a single concrete question that we have seen developers waste hours on.
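
To make the VRAM arithmetic concrete, here is a minimal sketch of the kind of estimate involved. It is not Runyard's actual calculator. The function names, the fixed 1 GiB overhead allowance, the 4.5 bits-per-weight figure for a 4-bit quant, and the 8B-parameter configuration in the example are all illustrative assumptions. The only relationships it relies on are the standard ones: weight memory scales with parameter count times bits per weight, and the KV cache stores one key tensor and one value tensor per layer, so it grows linearly with context length.

```python
GIB = 2**30


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Memory taken by the weights alone at a given quantization width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for one sequence; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(vram_gib: float, n_params: float, bits_per_weight: float,
         n_layers: int, n_kv_heads: int, head_dim: int, context_len: int,
         overhead_gib: float = 1.0) -> tuple[bool, float]:
    """Return (fits, estimated GiB needed).

    overhead_gib is an assumed allowance for runtime buffers and
    activations, not a measured value.
    """
    needed = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    needed_gib = needed / GIB + overhead_gib
    return needed_gib <= vram_gib, needed_gib


if __name__ == "__main__":
    # Hypothetical 8B-parameter model: 32 layers, 8 KV heads, head size 128,
    # 8192-token context, 16-bit cache, roughly 4.5 bits per weight.
    ok, needed = fits(8, 8e9, 4.5, 32, 8, 128, 8192)
    print(f"fits in 8 GiB: {ok} (about {needed:.1f} GiB needed)")
```

Doubling context_len doubles the KV cache term and leaves the weight term unchanged, which is why a model that loads at a short context can still run out of memory at a longer one.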

Our mission

Local AI should be a choice anyone can make on their own hardware, without having to read three GitHub issues and a Reddit thread to know whether a model will load. Cloud inference is fine, but it is not the only path. Privacy, latency, cost at scale, and offline access all matter — and the hardware to run useful models locally has been within reach for over a year.

Our mission is narrow and concrete: make the local-AI decision graph legible. The right model, the right quant, the right runtime, on the hardware you already own.

How we benchmark

Every tokens-per-second number on Runyard comes from a documented methodology. We run each test with fixed prompt and completion lengths, log resident VRAM (not reserved), and report the median of three runs for each hardware/quant combination. Where we cannot test directly, we mark the figure as an estimate and explain the model we used to derive it.

The full breakdown (hardware list, inference engines, exact commands, what we exclude, and how to replicate) lives on our methodology page. If you find a number you cannot reproduce, please tell us. We update the benchmarks quarterly and publish a changelog.
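
As a rough illustration of the median-of-three reporting described above, the sketch below times a generation callable and reduces the runs to a single figure. It is a sketch under stated assumptions, not the harness documented on the methodology page: `time_run`, `median_tokens_per_second`, and the `generate` callable are hypothetical names, and the sample timings in the example are made up.

```python
import statistics
import time
from typing import Callable


def time_run(generate: Callable[[], int]) -> tuple[int, float]:
    """Run one generation; return (tokens produced, wall-clock seconds).

    `generate` is a hypothetical callable that runs a fixed prompt with a
    fixed completion length and returns the number of tokens it produced.
    """
    start = time.perf_counter()
    n_tokens = generate()
    return n_tokens, time.perf_counter() - start


def median_tokens_per_second(samples: list[tuple[int, float]]) -> float:
    """Reduce (tokens, seconds) samples to the median tokens-per-second rate."""
    return statistics.median(tokens / seconds for tokens, seconds in samples)


if __name__ == "__main__":
    # Made-up timings for three runs of a 256-token completion.
    samples = [(256, 8.0), (256, 10.0), (256, 6.4)]
    print(median_tokens_per_second(samples))
```

Reporting the median rather than the mean keeps one unusually slow run, such as a cold start, from dragging the published number down.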

Who builds this

Runyard is built by a small team that has been deploying language models both in the cloud and on commodity GPUs since the original Llama leak. Most of our writing is published under the Runyard Team byline; for individual posts and credentials, see the author bio linked from each article.

If you want to know more about who we are or get in touch about a specific benchmark, email us at hello@runyard.dev.

Contact