Question 1

What is Runyard?

Accepted Answer

Runyard is a free, hardware-aware AI model browser. You enter your CPU, GPU, and VRAM, and it instantly shows every local LLM that will run on your machine, ranked by speed and quality.

Question 2

How much VRAM do I need to run local LLMs?

Accepted Answer

8GB of VRAM runs 7B to 8B models such as Mistral 7B and Llama 3.1 8B at Q4 quantization. 16GB unlocks 13B models. 24GB lets you run Mixtral 8x7B and Llama 3 70B, but only at lower-bit quantization than Q4 (roughly 2 to 3 bits per weight).
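The VRAM figures above follow from simple arithmetic: the weights take roughly (parameter count) x (bits per weight) / 8 bytes, plus some headroom for the KV cache and runtime buffers. Here is a minimal Python sketch of that estimate. The function names, the 4.5 bits-per-weight figure used for Q4, and the 1.5 GB overhead are illustrative assumptions, not Runyard's actual method.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def estimate_vram_gb(params_billion: float,
                     bits_per_weight: float = 4.5,   # assumed average for Q4
                     overhead_gb: float = 1.5) -> float:  # assumed KV cache + buffers
    """Rough total VRAM needed: weights plus a fixed overhead allowance."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb


def fits(params_billion: float, vram_gb: float, bits_per_weight: float = 4.5) -> bool:
    """True if the rough estimate fits within the given VRAM budget."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    for name, size_b in [("8B", 8), ("13B", 13), ("70B", 70)]:
        print(f"{name} at ~4.5 bits/weight: about {estimate_vram_gb(size_b):.1f} GB")
```

Under these assumptions, an 8B model at Q4 lands well under 8GB, while a 70B model at Q4 needs about 40GB for weights alone, which is why it only fits a 24GB card at much lower bit widths. Real usage varies with context length, quantization format, and runtime.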

Question 3

What is the best local LLM for my GPU?

Accepted Answer

Use Runyard at www.runyard.dev — enter your GPU and VRAM and the Model Radar will rank every compatible LLM for your exact hardware, showing estimated tokens per second for each model.

Question 4

Can I run Llama 3 locally?

Accepted Answer

Yes. Llama 3.1 8B at Q4 runs on any GPU with 8GB of VRAM. Llama 3.1 70B needs around 40GB of VRAM at Q4, or an Apple Silicon Mac with 64GB or more of unified memory.

