Engineering

CPU, not cloud

2026-05-25 · Qovaryx Team

Every architectural decision in Qovaryx was made in service of one constraint: the model has to run on your CPU, on your laptop, locally. Not "could in theory run." Not "runs with a Pro subscription to our cloud." Runs.

Why this is the right constraint

Latency. A 5 ms cloud round-trip is forever in options trading. Local inference is sub-millisecond.
Cost. If every signal cost us API tokens, our pricing would have to reflect it. CPU-local is free at the margin.
Privacy. Your charts, your positions, your decisions never touch our infrastructure.
Reliability. Our cloud being down doesn't stop you trading.

What it forced us to give up

Honest answer: depth. A massive transformer would have more raw capacity. We don't have that.

What we have instead is a cluster of small heads that, between them, cover the dimensions that matter for the task. The aggregate parameter count is a fraction of a frontier LLM, but the task-specific surface is dense.

How we made it fit

We won't write the recipe here — that's the part that took 18 months. The shape:

Histogram gradient boosting (HGB) classifiers (scikit-learn, serialized with joblib) for the chart heads. ~27 MB total, no GPU.
Compact NN specialists for the contextual heads (macro, news, veto). Opt-in; CPU-only by default.
Feature engineering done in pure Python; no PyTorch on the hot path.
Cached macro features (SPY/VXX/etc. snapshots) with a 15-min TTL.
Lazy head loading — heads page in on first use.
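
The last two items are generic patterns, so they can be illustrated without touching the recipe. What follows is a minimal sketch in plain Python, not the Qovaryx implementation: the class names TTLCache and LazyHeads, the "SPY" key, the head names, and the loader functions are all hypothetical. In a real setup a loader might simply wrap joblib.load on a model file; here the loaders are stand-ins so the example runs with the standard library alone.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TTLCache:
    """Keep fetched values for a fixed time-to-live, then refetch."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the fetch
        value = fetch()  # missing or stale: fetch and restamp
        self._store[key] = (now, value)
        return value


class LazyHeads:
    """Load each head the first time it is requested, then reuse it."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        self._loaders = loaders
        self._loaded: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name not in self._loaded:
            # First use pays the load cost; later calls hit memory.
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_names(self) -> List[str]:
        return sorted(self._loaded)


if __name__ == "__main__":
    # Hypothetical stand-in loaders; a real one might call joblib.load(path).
    heads = LazyHeads({
        "trend": lambda: {"head": "trend"},
        "veto": lambda: {"head": "veto"},
    })
    heads.get("trend")
    print(heads.loaded_names())  # only the head that was actually used

    macro = TTLCache(ttl_seconds=15 * 60)  # 15-minute TTL, as in the list above
    snapshot = macro.get("SPY", lambda: {"symbol": "SPY", "fetched": True})
    print(snapshot)
```

The injectable clock is a design choice for testability: expiry can be checked by advancing a fake clock rather than waiting 15 minutes. Neither class is thread-safe as written, which would matter if heads were requested from several threads at once.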

What it looks like at runtime

The Trading Engine card in the app shows CPU: Ryzen 7 7700 (44%) and 54.9 / 63.0 GB RAM. No GPU row, no GPU dependency, no "spinning up inference" delay. The first chart is scored within milliseconds of pressing send.

Cloud AI is what you do when the model is too big for the machine. We made the machine right by making the model small.