RUNYARD.DEV / BLOG
Guides, deep-dives, and news on local AI models, hardware, and the Runyard platform.
InclusionAI just open-sourced Ling-2.6-1T under the MIT license: a 1T-parameter MoE with 63B active parameters, a 262K-token context window, and LiveCodeBench scores that beat GPT-5 by 13 points. Here's what it is, how it works, and what hardware you actually need.
Alibaba's newest coding model activates only 3B of its 80B total parameters, and still beats models 10–20x larger on SWE-bench Pro. Here's how to run it locally and why it matters for anyone building with local AI.
MiMo-V2.5 is Xiaomi's fully open-source 310B MoE model with just 15B active parameters per token — built for multimodal agentic coding, long-horizon reasoning, and real-world task completion across text, image, video, and audio.
ChatGPT Plus, Claude Pro, Cursor, Copilot: each is "only" $10–60/month. But stack a few together, run the 3-year math, and consumer GPU hardware starts looking very smart. Here's the honest cost comparison.
At a 1M-token context, DeepSeek V4 uses only 10% of the KV cache that V3.2 needed, by compressing across tokens rather than across heads. Here's a clear breakdown of how Compressed Sparse Attention and Heavily Compressed Attention actually work.
Alibaba's Qwen3.6-27B scores 77.2% on SWE-bench Verified — matching Claude 4.5 Opus on coding tasks — while the sibling 35B-A3B MoE activates just 3B parameters per token. Both run on consumer hardware today.
On March 24, 2026, LiteLLM versions 1.82.7 and 1.82.8 were poisoned with a three-stage credential stealer targeting API keys, SSH keys, cloud credentials, and crypto wallets. Here's exactly what happened and what to do now.