Method

Calibrated conviction beats raw confidence

2026-06-03 · Qovaryx Team

"The model is 87% confident this is a BUY." Great. Out of all the times it said 87%, how often was it actually right?

That last question is the whole game. Without an answer, the 87% is a vibe. With one, you have a probability you can size a position to.

What calibration means

A model is calibrated when its stated confidence matches its empirical hit rate. If it says 70% across 1,000 cases, ~700 should resolve as predicted. If it says 95% and only 60% resolve, it's overconfident — and any position sizing tied to that 95% is broken.
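A minimal sketch of how that check can be run, assuming you have a list of stated confidences and a matching list of 0/1 outcomes (1 means the prediction resolved as stated). The function names, the equal-width binning, and the choice of 10 bins are illustrative assumptions, not a description of our pipeline.

```python
from typing import List, Sequence, Tuple

# Each row: (mean stated confidence, empirical hit rate, number of cases)
Row = Tuple[float, float, int]


def reliability_table(
    confidences: Sequence[float],
    outcomes: Sequence[int],
    n_bins: int = 10,
) -> List[Row]:
    """Bucket predictions by stated confidence and compare each bucket's
    mean confidence with how often those predictions actually resolved."""
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, outcomes):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the top bin.
        idx = max(0, min(int(conf * n_bins), n_bins - 1))
        bins[idx].append((conf, hit))
    rows: List[Row] = []
    for bucket in bins:
        if not bucket:
            continue  # skip empty bins rather than report a made-up rate
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        hit_rate = sum(h for _, h in bucket) / len(bucket)
        rows.append((mean_conf, hit_rate, len(bucket)))
    return rows


def expected_calibration_error(rows: Sequence[Row]) -> float:
    """Size-weighted average gap between stated confidence and hit rate."""
    total = sum(n for _, _, n in rows)
    return sum(n / total * abs(conf - hit) for conf, hit, n in rows)


# The overconfident case from the text: says 95%, resolves 60% of the time.
overconfident = reliability_table([0.95] * 100, [1] * 60 + [0] * 40)
print(overconfident, expected_calibration_error(overconfident))
```

A calibrated model produces rows where the first two numbers roughly agree in every bin; the overconfident example shows a gap of about 0.35.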

Out-of-the-box LLM softmax scores are generally not calibrated. They're optimized for argmax accuracy, not for probability honesty. Treating them as probabilities leaks that miscalibration into every decision sized on them.

How we calibrate

We don't ship the raw decoder output. Every probability the cluster surfaces to your app passes through a post-hoc calibration layer trained on holdout outcomes. The shape we use is well-studied; we won't detail which here, because it would let a competitor copy our setup without the boring parts that make it work. The honest version:
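The method we use is not disclosed here, and the sketch below is not it. Purely as a generic illustration of what a post-hoc calibration layer does, here is histogram binning, one of the simplest well-studied approaches: fit on holdout (raw score, outcome) pairs, then replace each raw score with the empirical hit rate of its bin. The class name, bin count, and the fallback for empty bins are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


class HistogramBinningCalibrator:
    """Illustrative post-hoc calibrator: maps a raw score to the hit rate
    observed on holdout data for scores falling in the same bin."""

    def __init__(self, n_bins: int = 10) -> None:
        self.n_bins = n_bins
        self.bin_rates: List[Optional[float]] = [None] * n_bins

    def _bin(self, score: float) -> int:
        # Equal-width bins over [0, 1], clamped so 1.0 lands in the top bin.
        return max(0, min(int(score * self.n_bins), self.n_bins - 1))

    def fit(
        self, raw_scores: Sequence[float], outcomes: Sequence[int]
    ) -> "HistogramBinningCalibrator":
        if len(raw_scores) != len(outcomes):
            raise ValueError("raw_scores and outcomes must have the same length")
        hits = [0] * self.n_bins
        counts = [0] * self.n_bins
        for score, outcome in zip(raw_scores, outcomes):
            idx = self._bin(score)
            hits[idx] += outcome
            counts[idx] += 1
        self.bin_rates = [
            hits[i] / counts[i] if counts[i] else None
            for i in range(self.n_bins)
        ]
        return self

    def predict(self, raw_score: float) -> float:
        rate = self.bin_rates[self._bin(raw_score)]
        # Assumption for this sketch: with no holdout data in a bin,
        # pass the raw score through unchanged.
        return raw_score if rate is None else rate


# Holdout data where a raw 0.95 only resolved 60% of the time.
calibrator = HistogramBinningCalibrator().fit([0.95] * 100, [1] * 60 + [0] * 40)
print(calibrator.predict(0.95))
```

The point of the sketch is the data flow, not the specific method: the mapping is learned from outcomes the model never trained on, and the app only ever sees the mapped value.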

Why this maps to position size

If 70% confidence is honestly a 70% hit rate, then a Kelly-fraction position size is well-defined once you also know the payoff odds. Our tier ladder is conservative on top of that, capping below full Kelly to survive bad streaks:

The 0.60 floor isn't a guess. It's the point below which the calibration curve gets noisy and the expected value of trading is negative net of slippage.
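The sizing logic described above can be sketched with the textbook Kelly formula for a simple binary bet: with win probability p and net payoff b per unit risked, the full-Kelly fraction is p - (1 - p) / b. The 0.60 floor comes from the text; the payoff ratio `b`, the `kelly_multiplier` of 0.5, and the function names are hypothetical values chosen for illustration and do not reproduce our tier ladder. Real options payoffs are not binary, so treat this as a simplification.

```python
def kelly_fraction(p: float, b: float) -> float:
    """Full-Kelly fraction of bankroll for a binary bet.

    p: calibrated probability of winning
    b: net payoff per unit risked on a win (lose 1 unit otherwise)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if b <= 0.0:
        raise ValueError("b must be positive")
    return p - (1.0 - p) / b


def position_fraction(
    p: float,
    b: float = 1.0,                 # hypothetical even-money payoff
    floor: float = 0.60,            # confidence floor stated in the text
    kelly_multiplier: float = 0.5,  # hypothetical cap below full Kelly
) -> float:
    """Capped Kelly sizing: no position below the confidence floor,
    and only a fraction of full Kelly above it."""
    if p < floor:
        return 0.0
    # Never return a negative size, even if the payoff makes the edge negative.
    return max(0.0, kelly_multiplier * kelly_fraction(p, b))


for confidence in (0.55, 0.60, 0.70, 0.80):
    print(confidence, round(position_fraction(confidence), 4))
```

With these assumed parameters, a calibrated 70% at even money gives a full-Kelly fraction of roughly 0.40, and the capped size is about half of that. The same arithmetic applied to an uncalibrated 70% produces a number with no meaning.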

If your AI tool tells you 80% but won't tell you how often "80%" was right, your position size is being chosen by a marketing department.

Not financial advice. Architecture notes describe what we built, not how to trade. Options trading involves substantial risk of loss.