This weekend, tinygrad's tinybox surged to the top of Hacker News with 431 upvotes and over 260 comments. The reason: tinygrad - the neural network framework built by George Hotz, the engineer who first jailbroke the iPhone - is now shipping a compact AI compute box that can run models at the 120B parameter scale entirely offline, no cloud required. The tinybox red v2 starts at $12,000 and ships within a week. The tinybox green v2 packs 384GB of GPU RAM and 3,086 TFLOPS of FP16 compute for $65,000.
The tinybox is not a toy. It appears in MLPerf Training 4.0 results alongside machines costing ten times as much. Both models ship with Ubuntu 24.04, full network connectivity, and a BIOS management interface. The tinygrad framework that powers it is already used in production: it runs the driving model inside Comma.ai's openpilot, one of the most widely deployed open-source driver-assistance systems in the world.
What makes this moment significant is the price point. Until recently, running a frontier-class open-source model on dedicated hardware required either a hyperscaler account or a six-figure investment in enterprise GPU infrastructure. The tinybox changes that calculus. For the price of a senior hire's first three months, a company can own the hardware and run Llama 3, Mistral, or Qwen models with no API costs, no vendor lock-in, and no data leaving the building.
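To make that concrete: once a model is served on hardware you own, application code talks to it the same way it would talk to a cloud API, just over the local network. The sketch below assumes an OpenAI-compatible inference server such as vLLM or llama.cpp running on the box; the endpoint URL and model name are placeholders, not a tinybox-specific configuration.

```python
# Minimal sketch: querying a locally hosted model through an
# OpenAI-compatible endpoint (as exposed by e.g. vLLM or llama.cpp).
# The base_url and model id below are placeholder assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local inference server, not a cloud API
    api_key="not-needed",                 # no cloud credentials; ignored by local servers
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # placeholder model id
    messages=[
        {"role": "user", "content": "Summarize this invoice dispute in two sentences."}
    ],
)
print(response.choices[0].message.content)
```

From the application's perspective, swapping a cloud endpoint for a local one is a one-line change. Nothing in the request ever crosses the building's network boundary.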
For European enterprises, this matters for reasons that go beyond cost. The EU AI Act, GDPR, and a growing set of sector-specific regulations place strict requirements on where and how personal data is processed by automated systems. When a Dutch logistics company sends invoice data or employee records through an American cloud AI API, that data moves to US infrastructure subject to US jurisdiction, where it may be used to improve models in ways the European customer never agreed to.
Legal teams are increasingly unwilling to accept this risk. Enterprise agreements from OpenAI and Anthropic can include zero-retention provisions, but those require separate negotiation, separate pricing, and a degree of trust in contractual enforcement across jurisdictions. A server in your own data center requires none of that trust. The data never leaves.
There is also the cost curve to consider. Cloud AI inference pricing has dropped dramatically over the past two years, but it is still consumption-based: the more you use, the more you pay. At high volume, this becomes a significant operating expense. A self-hosted model running on owned hardware converts that variable cost into a fixed capital investment. For organizations running tens of thousands of document processing queries per day, the economics shift quickly.
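A back-of-the-envelope sketch shows where the lines cross. Every number below is an assumption chosen for the sake of arithmetic, not a quote from any vendor:

```python
# Illustrative break-even calculation. All figures are assumptions.
cloud_cost_per_1k_tokens = 0.01   # assumed blended API price (USD)
tokens_per_query = 2_000          # assumed average document-processing query
queries_per_day = 20_000

daily_cloud_cost = queries_per_day * tokens_per_query / 1_000 * cloud_cost_per_1k_tokens
hardware_cost = 12_000            # tinybox red v2 list price
power_and_ops_per_day = 30        # assumed electricity + maintenance (USD)

break_even_days = hardware_cost / (daily_cloud_cost - power_and_ops_per_day)
print(f"cloud: ${daily_cloud_cost:.0f}/day, break-even in ~{break_even_days:.0f} days")
# With these assumptions: $400/day on the API, break-even in ~32 days.
```

Under these assumptions the box pays for itself in about a month; at a tenth of the volume, the cloud stays cheaper for years. The point is not the specific numbers but that, for document-heavy workloads, the break-even is now measured in weeks or months rather than years.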
At Laava, we have been deploying open-source models in sovereign configurations since we started. Llama 3 and Mistral run in client VPCs and on on-premise servers for clients in the financial services and legal sectors, where the alternative - sending sensitive documents through a cloud API - is simply not an option their legal teams will sign off on.
What hardware like the tinybox changes is the barrier to entry for that architecture. Historically, on-premise AI inference required either a significant capital commitment for dedicated GPU servers, or accepting the limitations of smaller models that could run on CPU or consumer-grade hardware. The gap between what you could run locally and what the cloud offered was large. That gap is closing fast.
This does not mean every organization should rush to buy a tinybox. The hardware is one piece. Running open-source models in production requires more than compute: you need a properly structured RAG pipeline, metadata governance, guardrails, evaluation pipelines, and integration with the systems where work actually happens - ERP, CRM, email. A tinybox running a hallucinating Llama instance connected to nothing is not an AI agent. It is an expensive server.
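To illustrate what "more than compute" means, here is a skeletal retrieve-then-generate loop. Every component in it - the vector store lookup, the model call, the guardrail check - is a hypothetical stand-in for a real system you would still have to build and integrate; none of it comes in the box.

```python
# Skeletal RAG loop with hypothetical stand-in components.
# search, generate, and passes_guardrails are placeholders for a real
# vector database, the local model endpoint, and a policy filter.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    text: str

def answer(
    question: str,
    search: Callable[[str, int], List[Chunk]],  # vector store lookup (hypothetical)
    generate: Callable[[str], str],             # local model call (hypothetical)
    passes_guardrails: Callable[[str], bool],   # policy filter (hypothetical)
) -> str:
    # 1. Retrieval: ground the model in your own documents.
    chunks = search(question, 5)
    context = "\n\n".join(c.text for c in chunks)

    # 2. Generation: instruct the model to answer only from retrieved context.
    prompt = (
        "Answer strictly from the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    draft = generate(prompt)

    # 3. Guardrails: block answers that leak data or violate policy.
    return draft if passes_guardrails(draft) else "Escalated to a human reviewer."
```

Each of those three steps hides real engineering: chunking and metadata governance behind the retrieval, prompt and evaluation discipline behind the generation, and policy work behind the guardrails. That is the work that turns owned hardware into a working system.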
If your organization is processing sensitive documents at scale - invoices, contracts, customer communications, internal policies - and your legal team is uncomfortable with cloud AI, this is the right moment to evaluate a sovereign AI architecture. The models are good enough. The hardware cost is now accessible. The remaining question is whether you have the systems engineering to deploy it correctly.
Laava runs a free 90-minute Roadmap Session where we assess whether a sovereign AI deployment makes sense for your process, your compliance requirements, and your data volumes. We will tell you honestly if the cloud is the better choice. If it is not, we build the architecture to run on your infrastructure, with your data, under your control.