Turn Any PC Into a Home AI Server: The Complete 2026 Setup Guide
A dedicated AI server means your phone, laptop, and tablet all get instant private AI — with no cloud bills.
Most people run Ollama on their main workstation and treat local AI as a single-machine, single-user experience. That works fine until your spouse wants to use it too, or you want AI available on your phone without carrying a gaming PC. The better setup — one that more people are moving to in 2026 — is a dedicated home AI server: an always-on machine that runs Ollama and Open WebUI, accessible to every device on your network. This guide walks you through the full setup, from hardware to remote access.
Why a Dedicated Server Beats Running Ollama on Your Workstation
▸Always-on — models stay loaded in VRAM 24/7; first token arrives in under a second instead of waiting for a cold load
▸Every device gets AI — phone, tablet, second laptop, smart home systems, scripts — all hit the same local endpoint
▸Zero interference — the AI server runs independently; no dropped frames in games, no competition for GPU memory
▸Multi-user — Open WebUI gives each family member or team member their own login, conversations, and settings
▸Privacy by default — traffic never leaves your network; Tailscale encrypts remote access without opening a public port
What Hardware You Actually Need
Home AI Server Tiers — Approximate Build Cost
Mini PC (Intel N100, 16GB): ~$200
Used RTX 3060 12GB desktop: ~$550
Used RTX 3090 24GB desktop: ~$850
RTX 4070 Ti 16GB workstation: ~$1,400
RTX 4090 24GB workstation: ~$2,400
▸$200 — Intel N100 / AMD mini PC with 16-32GB RAM. CPU inference only. Runs 7B models at 8-15 tok/s. Silent, uses ~15W idle. Great for light household use.
▸$550 — Used RTX 3060 (12GB) desktop. Runs all 7B-13B models at GPU speed (40-60 tok/s). Best entry point for real-time multi-device chat.
▸$850 — Used RTX 3090 (24GB). The sweet spot. Runs any 13B at Q8, 30B-class models at Q4, and Llama 70B with partial CPU offload. Fast enough for multiple concurrent users.
▸$1,400 — RTX 4070 Ti (16GB). Newer and more power-efficient than a used 3090, with strong 7B-13B speed for latency-sensitive coding inference, but 16GB rules out the largest models.
▸$2,400 — RTX 4090 (24GB). No compromise. Runs any model that fits in 24GB at maximum speed.
The used RTX 3090 (24GB) at around $700-850 is the single best value for a home AI server in 2026. It runs 30B-class models entirely in VRAM and Llama 70B with partial CPU offload, handles multiple concurrent users without degradation, and is plentiful on the used market.
The Three-Component Software Stack
1.Ollama — the inference engine. Manages model downloads, VRAM loading, and serves its API, including an OpenAI-compatible /v1 endpoint, on port 11434. Everything else talks to Ollama (see the curl sketch after this list).
2.Open WebUI — the browser interface. A full-featured chat UI with conversation history, model switching, system prompts, multi-user accounts, and RAG document upload.
3.Tailscale (optional) — encrypted remote access. Lets your phone and laptop reach the home server from anywhere without opening any router ports.
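Because the endpoint is OpenAI-compatible, any OpenAI client library or plain curl can talk to the server. A minimal sketch, assuming a server at 192.168.1.50 and a llama3.1:8b model already pulled (substitute your own IP and model):
curl http://192.168.1.50:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'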
Step 1: Install and Configure Ollama for Network Access
By default, Ollama only listens on localhost. Setting OLLAMA_HOST in a systemd override opens it to your local network (the override below also relaxes CORS origins and extends the model keep-alive):
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Enable as a systemd service (auto-starts on boot)
sudo systemctl enable ollama
sudo systemctl start ollama
# Create a drop-in override so Ollama listens on all network interfaces:
sudo systemctl edit ollama
# Add these lines in the editor:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0"
# Environment="OLLAMA_ORIGINS=*"
# Environment="OLLAMA_KEEP_ALIVE=24h"
# Reload and restart
sudo systemctl daemon-reload && sudo systemctl restart ollama
# Test from another machine on your LAN
curl http://YOUR-SERVER-IP:11434/api/tags
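With Ollama reachable on the network, pull a model and run a quick generation test. llama3.1:8b below is only an example; pick whatever fits your VRAM tier:
# On the server: download a model
ollama pull llama3.1:8b
# From any LAN machine: non-streaming generation test
curl http://YOUR-SERVER-IP:11434/api/generate -d '{
  "model": "llama3.1:8b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'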
Step 2: Install Open WebUI with Docker
# Install Docker
curl -fsSL https://get.docker.com | sh && sudo usermod -aG docker $USER
# Log out and back in (or run `newgrp docker`) so the group change takes effect
# Run Open WebUI — auto-connects to Ollama on the host
docker run -d \
--name open-webui \
--restart always \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
ghcr.io/open-webui/open-webui:main
# Open WebUI is now at http://YOUR-SERVER-IP:3000
# First account you create becomes the admin
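If the page doesn't load, a few standard Docker checks on the server usually reveal the problem:
docker ps --filter name=open-webui   # container should be listed as "Up"
docker logs --tail 50 open-webui     # look for startup errors
curl -I http://localhost:3000        # should return an HTTP response from the UI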
Step 3: Remote Access with Tailscale
Tailscale creates a WireGuard-encrypted mesh network between your devices so your phone and laptop can reach the home server securely from anywhere — without opening any router ports. Free for personal use (up to 3 users, 100 devices).
# On the server: install and enable Tailscale
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# Find the server's Tailscale IP (looks like 100.64.X.X)
tailscale ip -4
# Install Tailscale on any other device (laptop, phone) via tailscale.com
# Sign in with the same account — devices can now reach each other
# Access Open WebUI remotely at: http://100.64.X.X:3000
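To confirm the tunnel works, list the tailnet peers on the server and hit the UI from a remote device. If MagicDNS is enabled in the Tailscale admin console, the machine name resolves too; ai-server below is just an example hostname:
# On the server: list connected tailnet devices
tailscale status
# From a remote laptop (or a phone browser), over Tailscale:
curl -I http://100.64.X.X:3000
# With MagicDNS enabled, a name like http://ai-server:3000 also works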
Securing It: What NOT to Do
▸Do NOT expose port 11434 or 3000 directly to the internet — Ollama has no authentication by default
▸Do NOT use router port forwarding for these services — use Tailscale or a properly authenticated reverse proxy
▸DO use a strong password on your Open WebUI admin account
▸DO keep Ollama and Open WebUI updated — both push regular security patches
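A host firewall helps enforce the first two rules. A minimal ufw sketch, assuming your LAN is 192.168.1.0/24 (adjust for your subnet); note that Docker publishes ports through its own iptables rules and can bypass ufw, so binding Open WebUI to a specific LAN address in the docker run command (e.g. -p 192.168.1.50:3000:8080) is worth considering as well:
sudo ufw allow OpenSSH                                            # keep SSH access
sudo ufw allow in on tailscale0                                   # traffic arriving over Tailscale
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp    # Ollama API, LAN only
sudo ufw allow from 192.168.1.0/24 to any port 3000 proto tcp     # Open WebUI, LAN only
sudo ufw enable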
Which Models to Keep Loaded
▸12GB VRAM — Preload one 7B-8B model (Llama 3.1 8B or Qwen3.5-9B at Q4). Load 13B on demand.
▸24GB VRAM — Keep two models warm: a fast 8B for quick tasks and a 32B at Q4 for deeper work (a 70B only fits with partial CPU offload).
▸CPU-only mini PC — Preload a single 7B at Q4. Slower but always-on beats nothing.
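To preload a model and pin it in VRAM, Ollama accepts a generate request with no prompt plus a keep_alive value; -1 keeps it resident until you explicitly unload it. The model name below is just an example:
# Load llama3.1:8b into VRAM and keep it there indefinitely
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "keep_alive": -1}'
# Unload it later by setting keep_alive to 0
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:8b", "keep_alive": 0}'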
Integrating With Your Other Tools
▸Continue.dev (VS Code) — Set base_url to http://YOUR-SERVER-IP:11434/v1. All autocomplete and chat go local, zero API cost.
▸Aider — Use --openai-api-base http://YOUR-SERVER-IP:11434/v1 for a full local code editing agent (invocation sketch after this list).
▸AnythingLLM — Connect to your Ollama server for a private RAG system over your own documents.
▸Home Assistant — Community integrations connect HA to Ollama for local voice AI and automation scripting.
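As an illustration of the Aider bullet above, a hedged invocation sketch. It assumes llama3.1:8b is pulled on the server; Aider reads the same settings from environment variables, and the openai/ prefix routes it to the OpenAI-compatible endpoint:
export OPENAI_API_BASE=http://YOUR-SERVER-IP:11434/v1
export OPENAI_API_KEY=ollama       # Ollama ignores the key, but the client expects one
aider --model openai/llama3.1:8b   # run inside your project directory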
The Real Cost Comparison
Annual AI Cost: Cloud Subscriptions vs Home Server
ChatGPT Plus × 3 people: $720/yr
Claude Pro × 3 people: $900/yr
Developer API usage: $1,800/yr
RTX 3090 server (one-time hardware): $850
Electricity (~8h/day at full load): ~$120/yr
Against three ChatGPT Plus subscriptions ($60 a month), the RTX 3090 server pays for itself in roughly 14 months; if the same household also pays for Claude Pro, break-even drops to about six months. After that your marginal cost is electricity, roughly $120 a year. The savings scale dramatically for anyone previously paying developer API rates of $50-200 per person per month.