Hitting 1,000 tokens per second on a single RTX 5090

Aggressive decode optimizations for Qwen3-0.6B on an RTX 5090 GPU

February 9, 2026 · 14 min · 2795 words · AlpinDale