I wanted a small, low-power PC for running local AI models. Always-on, private, sovereign - no telemetry, no training data contributions, no API dependencies for the workloads that matter most.
The journey to finding the right machine was more interesting than I expected, mostly because the obvious choices turned out to be wrong and the right answer was hiding in plain sight.
Why Go Local?
AI tools have been a part of my workflow since 2021. It started with GitHub Copilot, then moved on to ChatGPT, Cursor, Claude, Claude Code, Gemini and so on. In late 2025 I think we reached a technological tipping point. These tools were no longer just curiosities and were generally useful.
However, there are still a number of barriers:
Cost: It's no secret that token cost is currently heavily subsidized, but with rumours of an Anthropic and OpenAI IPO on the horizon I'm bracing myself for a steep price increase when they need to start being profitable. As a Claude user, I've also been burned by no longer being able to use OpenCode or Pi with my plan and instead having to purchase "Extra Usage".
Limits: It's not just the price that's an issue, it's the usage limits too. Since Claude's usage limit changes in March 2026 I'm now getting frozen out much more often. The fix? Move from an £18/mo Pro plan to a £100/mo Max plan, or have Claude talk like a caveman to conserve tokens.
Privacy: I don't want my chats and code being used to train models without my consent. Furthermore, I'm willing to allow agents to work with my notes, Home Assistant install, limited financial information etc. However, doing so requires better privacy guarantees and safeguards around what is shared. That's data that under no circumstances would I want to be part of someone's training data.
Geo-politics: I do not want to rely heavily on tools that could be removed with no notice and for which I would have no viable replacement. Given the volatility of the geo-political situation right now, I'm concerned about export restrictions on AI. We've already seen NVIDIA no longer selling GPUs in China, Anthropic being declared a supply chain risk by the US government, and of course the secretive "Project Glasswing" because Anthropic released something they claimed was too dangerous for the general public.
The Use Case
Given the above concerns I needed:
- Cost Control: A more predictable $/million tokens. This means moving workloads away from Anthropic, and towards offerings from Mistral, Z.ai, Moonshot.ai and Qwen. This allows me to experiment with OpenCode, Pi and similar tools at a fixed monthly cost. I can also pick between different models for different tasks.
- Limitless: Always-on, always available. When I'm out of credits, I still need to be able to get work done.
- Data sovereignty: I have workloads involving personal health data, private notes, and internal code that I don't want flowing through third-party APIs.
I realised that what I was looking for was something that could run local inference. I could use OpenRouter when a frontier model was required or a larger parameter count was necessary but also have the option to use a local 30B parameter model to save API costs, or to keep data on my own hardware.
What I needed was a machine that could handle sustained throughput at low idle power, not peak benchmark scores. Memory bandwidth matters more than raw compute for LLM inference, and 128GB of unified memory is the floor for running two 30B models simultaneously with comfortable context windows.
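As a rough way to check the memory budget, here is a minimal sketch that estimates weight sizes only. It assumes llama.cpp's Q8_0 format at about 8.5 bits per weight (32 int8 values plus one fp16 scale per block), and the `reserved_gb` figure for the OS and everything else is a made-up placeholder. KV cache size depends on the model architecture and context length, so it is deliberately left out and has to fit in whatever headroom remains.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(total_gb: float, models: list[tuple[float, float]], reserved_gb: float) -> float:
    """Memory left for KV caches and buffers after loading every model.

    models is a list of (params_billion, bits_per_weight) pairs.
    reserved_gb is a hypothetical allowance for the OS and other processes.
    """
    used = sum(weights_gb(params, bits) for params, bits in models)
    return total_gb - reserved_gb - used


if __name__ == "__main__":
    two_30b_q8 = [(30, 8.5), (30, 8.5)]  # two 30B models at Q8_0 (assumed 8.5 bits/weight)
    for total in (96, 128):
        left = headroom_gb(total, two_30b_q8, reserved_gb=16)  # 16 GB is an assumption
        print(f"{total} GB box: {left:.1f} GB left for context")
```

Swapping in other quantisations (for example 4.5 bits per weight for Q4_0) shows how quickly the headroom changes with the format you pick.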
The Contenders
Given my requirements, there are only three contenders:
- An NVIDIA DGX Spark.
- An Apple Mac Studio.
- An AMD Ryzen AI Max+ 395 ("Strix Halo") machine.
Here's what the UK market actually looks like when you add VAT, shipping, import duties, and storage to get a real landed price.
GMKtec EVO-X2 - £2,660 (128GB) / £1,999 (96GB)
The cheapest 128GB Strix Halo box available from Amazon UK. Comes with 2TB SSD included. The 96GB variant at £1,999 is genuinely tempting and for workloads that fit in 96GB, it's hard to argue against. However, I need 128GB, and for that I need to pay a £660 premium - nearly double the cost of an equivalent 32GB SODIMM.
Minisforum MS-S1 MAX - £2,719 (128GB)
Dearer than the GMKtec, but with identical specs. Out of stock at the time I was looking.
Beelink GTR9 Pro - £2,400 - £3,000 (128GB, imported)
Beelink's flagship Strix Halo box looks competitive at $3,299 USD. While shipping was free, it's hard to say whether or not customs duties and VAT would be payable. If they were, it would be pricier than every other option so far.
Framework Desktop - £3,326+ (128GB, fully configured)
This is where the story gets interesting, and where I nearly made an expensive mistake.
Framework's base price for the AI Max+ 395 128GB mainboard is £2,999. That's already above all other options we've looked at so far but Framework's ethos - repairable, upgradeable, open, Linux-first - is genuinely appealing.
Then you configure it and the £2,999 DIY edition shoots to £3,326 once you add a Noctua fan kit, two expansion cards, tile packs, and a 2TB NVMe. And for what? The same Strix Halo silicon as every other box on this list.
While I'd love to support Framework, I think they're charging a premium here.
Mac Studio M4 Max - £4,199 (128GB)
The elephant in the room. Apple's M4 Max has ~540 GB/s memory bandwidth - roughly double Strix Halo's ~256 GB/s. That translates to approximately 2–3× faster inference on the same models. For a 70B model at Q4, Mac Studio owners report 22–25 tok/s where Strix Halo gets 6–8. The performance gap is real and significant.
But at £4,199 - nearly double the Strix Halo options - the bandwidth premium has to justify itself against the constraints. For me, the dealbreakers were macOS (my entire stack is Linux), the difficulty of running it headless as a server, and the price. If your stack is already macOS-native and you don't mind the cost, the Mac Studio is genuinely the better inference machine. For a Linux homelab, it's the wrong tool.
NVIDIA DGX Spark - ~£3,200 (128GB)
I nearly fell for this one. The DGX Spark has 128GB of unified LPDDR5X at ~273 GB/s - slightly faster bandwidth than Strix Halo. Plus: it's NVIDIA, it's CUDA, it's the "serious AI hardware" brand.
The reality: ~273 GB/s is essentially the same bandwidth tier as Strix Halo's ~256 GB/s. The DGX Spark's real advantage is FP4 throughput for fine-tuning and training. If you're doing inference only - which I am - you're paying an extra £1,000+ for capabilities you'll never use. The CUDA compatibility is nice in theory, but llama.cpp's ROCm backend on Strix Halo performs well enough anyway, so the "CUDA advantage" evaporates for the actual software stack most people are running.
The Winner: Corsair AI Workstation 300 - £1,999.99
And then, at midnight on a Tuesday, tired from hours of comparison shopping and having nearly bought the wrong machine twice, I found the Corsair AI Workstation 300 on Corsair's UK site.
- AMD Ryzen AI Max+ 395 (full-fat, identical silicon to every other Strix Halo box)
- 128GB LPDDR5X-8000 unified memory
- 1TB M.2 NVMe SSD (two M.2 slots, add a second drive later)
- 4.4L SFF chassis with dual-fan cooling
- 300W Flex ATX PSU (internal, not a brick)
- 2-year warranty + lifetime tech support from Corsair
- £1,999.99, in stock, ships in 2 days
Read that price again. £1,999.99 for a 128GB Strix Halo box from a tier-1 vendor with a real UK warranty. The only trade-off: 1TB storage instead of 2TB. While NVMe pricing has gone the same way as DRAM recently, 1TB is plenty to start and I can add more storage later.
I genuinely don't understand how Corsair is pricing this - someone there may be getting fired for putting the wrong number on this product page. Whatever the reason, every other 128GB Strix Halo box in stock in the UK costs at least £660 more than what Corsair has shown is achievable.
The Verification
Before committing to the Strix Halo purchase, I found several online sources for benchmarks on the models I planned to run. This one was particularly good.
The numbers were strong - 400+ tok/s prefill at 8K context, 60+ tok/s generation on 30B MoE models. Good enough to proceed. After the Corsair box arrived, I built llama.cpp from source with Vulkan support and ran my own benchmarks. (Spoiler: I later tested ROCm 6.2.2 and it was even faster - I'll cover that in a follow-up post.)
Qwen3.5-35B-A3B (MoE, ~3B active params) at Q8_0:
| context | prefill (tok/s) | generation (tok/s) |
|---|---|---|
| 512 | 1,188 | 54 |
| 2,048 | 1,117 | 54 |
| 8,192 | 1,006 | 54 |
| 32,768 | 720 | 61 |
| 65,536 | 498 | 61 |
| 120,000 | 333 | 61 |
For context: 54–61 tok/s generation is faster than comfortable reading speed. Prefill speeds were also higher than anticipated, which is always nice. So overall, this should work pretty well for local inference.
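The generation figures can be sanity-checked against a simple memory-bandwidth ceiling. This is a back-of-napkin sketch, not a performance model: it assumes every active weight is read once per generated token, uses the ~256 GB/s and ~3B active parameter figures quoted in this post, takes Q8_0 as roughly 8.5 bits per weight, and ignores KV cache reads and any layers stored at a different precision.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                         bytes_per_param: float) -> float:
    """Upper bound on tokens/s if generation is limited purely by memory bandwidth."""
    gb_read_per_token = active_params_billion * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    # Assumptions: ~256 GB/s, ~3B active params, Q8_0 at 8.5 bits (1.0625 bytes) per weight.
    ceiling = decode_ceiling_tok_s(256, 3.0, 8.5 / 8)
    for measured in (54, 61):
        share = measured / ceiling
        print(f"{measured} tok/s is {share:.0%} of a ~{ceiling:.0f} tok/s ceiling")
```

The same function also shows why a dense model is so much slower on this class of hardware: every parameter is active, so the bytes read per token grow with the full model size.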
Conclusion
Buy the Corsair. If you're in the UK and want a 128GB Strix Halo box for local AI inference, the Corsair AI Workstation 300 at £1,999.99 is the only answer that makes sense. It's cheaper than every competitor by a large margin, it's from a tier-1 vendor with proper UK warranty and support, and the silicon is identical. The only reason to buy anything else is if Corsair is out of stock (check GMKtec or Minisforum) or if you specifically need Mac Studio's superior memory bandwidth and are willing to pay £2,200 more for it - I am not.
Avoid Framework Desktop at current pricing. I say this as someone who genuinely supports Framework's mission. On a product where the CPU, GPU, and RAM are soldered - where repairability is limited to fans, expansion cards, and decorative tiles - a £1,326 premium over Corsair is not justified. Framework's laptop products, where you can swap the mainboard, screen, keyboard, and battery, make the premium worthwhile but I have no clue what they were thinking with the desktop.
Don't buy the DGX Spark for inference. Same bandwidth tier as Strix Halo, £1,200 more, and its advantages (CUDA, FP4 training) only matter if you're fine-tuning models.
A £2,000 local AI box does not replace frontier API access. It complements it. The models you can run locally are capable - genuinely, impressively capable - but they're not Opus or GPT-5. Over the coming months I'm going to see how much I can delegate to local AI, and how much I end up pushing to bigger models via OpenRouter. I'll report back so you can better understand the ROI.
At current prices, this little box would have to process circa 4 billion tokens to break even, and that's not including energy costs. (Back of napkin: Qwen3.5-35B-A3B on OpenRouter runs $0.125/mtok input and $1.39/mtok output; a 60/40 input/output blend gives a blended rate of ~$0.631/mtok. £2,000 is roughly $2,700 USD at current rates, so $2,700 ÷ $0.631/mtok ≈ 4.3 billion tokens.) It will, however, give me sovereignty, which renders the ROI point moot since it adds a capability that I didn't have before.
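The napkin maths can be reproduced with a few lines of Python, which makes it easy to re-run when prices or the exchange rate move. This is only a sketch using the inputs quoted in this post (the per-token prices, the 60/40 split and the ~$2,700 hardware cost). The function names are mine, and energy costs are left out, as they are in the estimate itself.

```python
INPUT_USD_PER_MTOK = 0.125   # OpenRouter input price quoted in the post
OUTPUT_USD_PER_MTOK = 1.39   # OpenRouter output price quoted in the post
INPUT_SHARE = 0.60           # 60/40 input/output blend
HARDWARE_USD = 2700.0        # roughly £2,000 converted to USD


def blended_rate(input_rate: float, output_rate: float, input_share: float) -> float:
    """Weighted average price per million tokens for a given input/output mix."""
    return input_share * input_rate + (1 - input_share) * output_rate


def break_even_mtok(hardware_usd: float, rate_usd_per_mtok: float) -> float:
    """Millions of tokens the box must process to match the API spend."""
    return hardware_usd / rate_usd_per_mtok


if __name__ == "__main__":
    rate = blended_rate(INPUT_USD_PER_MTOK, OUTPUT_USD_PER_MTOK, INPUT_SHARE)
    mtok = break_even_mtok(HARDWARE_USD, rate)
    print(f"blended rate: ${rate:.3f}/mtok")
    print(f"break-even:   {mtok / 1000:.2f} billion tokens")
```

The result is sensitive to the input/output split: output tokens cost about eleven times more than input tokens at these prices, so an output-heavy workload reaches break-even much sooner.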
In summary, the Corsair AI Workstation 300 is the best entry point for local inference in my opinion. Not because it's the fastest, smallest, most upgradeable, or the most beautiful. Because it's the same silicon as everything else, from a reputable vendor, at a price that makes every competitor look like they're taking the piss.
Sometimes the boring answer is the right one.
Update (2026-04-27): The Corsair AI Workstation 300 has increased to £2,550.99. It seems I was right! The calculus has shifted somewhat - while it's still the most competitive 128GB Strix Halo option in the UK market, it's now holding that title by a very slim margin of ~£100. While the value proposition isn't as clear-cut as when I purchased at the beginning of April, the Corsair still holds advantages in warranty, support, and Corsair's reputation for build quality. I'll update my benchmarks and long-term thoughts once I've had a chance to re-evaluate whether this machine is still the right choice at the new price.