2026-03-27

Measuring “Precision”

One goal of this work is to create a new measure of player performance based on expected value (EV). I aim never to use centipawn deltas, relying instead on expectation deltas (based on the probabilities of all three results: win, draw, and loss) together with move prediction to compute an “expected loss” in each position. Precision is then computed by comparing the player's actual loss (from the moves they really played) to this expected loss. You can find this computed for every analyzed game and match.

Expected EV loss

This is computed by taking a weighted average of the EV deltas for all candidate moves with at least 1% move probability (according to the move prediction model), with each candidate weighted by its move probability. Each delta is the EV given up relative to the best move, so the weighted average measures how far a typical predicted move falls short of the best move's eval. You can see the outputs of the move prediction model while replaying games: it is the % value shown next to each candidate move in the candidate moves list.
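As a minimal sketch of the calculation described above, the weighted average could look like the following. The names (`Candidate`, `ev_from_wdl`, `expected_ev_loss`) are hypothetical, and two details are assumptions rather than anything stated here: that EV is scored as win = 1, draw = 0.5, loss = 0, and that the weights are renormalized over the candidates that pass the 1% threshold.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str
    probability: float  # move probability from the prediction model
    ev: float           # EV after the move, from the mover's point of view


def ev_from_wdl(p_win: float, p_draw: float, p_loss: float) -> float:
    """Assumed EV scoring: win = 1, draw = 0.5, loss = 0."""
    return p_win + 0.5 * p_draw


def expected_ev_loss(candidates: list[Candidate],
                     min_probability: float = 0.01) -> float:
    """Probability-weighted average EV delta versus the best move."""
    best_ev = max(c.ev for c in candidates)
    # Keep only candidates with at least 1% move probability.
    considered = [c for c in candidates if c.probability >= min_probability]
    total_p = sum(c.probability for c in considered)
    if total_p == 0:
        return 0.0
    # Assumption: weights are renormalized over the surviving candidates.
    weighted = sum(c.probability * (best_ev - c.ev) for c in considered)
    return weighted / total_p


candidates = [
    Candidate("best", 0.5, 0.6),
    Candidate("okay", 0.3, 0.5),
    Candidate("weak", 0.2, 0.4),
]
loss = expected_ev_loss(candidates)  # 0.3 * 0.1 + 0.2 * 0.2, about 0.07
```

Because the weights sum to 1 after renormalization, this is the same number as the best move's EV minus the weighted average EV of the candidates.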

Actual EV loss

This is the EV loss of the move that was actually played. If the best move is played, it's 0.

Precision

This is computed as 1 - (actual EV loss / expected EV loss). Here's the intuition:

  • If a player plays perfectly according to EV, this gives a value of 1. On every move, the player avoided any EV loss, so this resolves to 1 - 0 = 1.
  • If a player plays at a level the move prediction model considers average, this gives a value of 0. The player made about as many mistakes as expected, at least in terms of their total EV loss.
  • If a player plays worse than that, the value goes negative. In theory it can go very negative! If a player chooses a blunder with very high EV loss, and very low move probability, the move's “precision” will be quite low.
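The three cases above can be checked with a small sketch. The function name `precision` and the choice to aggregate by summing losses over all moves before dividing are assumptions for illustration; a per-move average would be another plausible reading.

```python
def precision(actual_losses: list[float],
              expected_losses: list[float]) -> float:
    """1 - (total actual EV loss / total expected EV loss).

    Assumption: losses are summed over the moves first, then divided.
    """
    total_actual = sum(actual_losses)
    total_expected = sum(expected_losses)
    if total_expected == 0:
        raise ValueError("expected EV loss is zero; precision is undefined")
    return 1.0 - total_actual / total_expected


# Perfect play: no EV lost on any move, so the ratio is 0.
perfect = precision([0.0, 0.0], [0.05, 0.10])

# Average play: total actual loss matches total expected loss.
average = precision([0.10, 0.05], [0.05, 0.10])

# A blunder that loses far more EV than the model expected.
blunder = precision([0.40], [0.02])
```

Here `perfect` comes out at 1, `average` at 0, and `blunder` at about -19, which shows how far below zero a single unlikely, costly move can push the value.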

Does it work? I hope so! We can build some intuition by looking at World Championship matches.

Precision of World Championship matches

Across WCC matches (excluding the first 8 opening moves), precision is predictive of results, and the gap in precision between the two players is more predictive still. Individual precision correlates with game score (r=+0.337, R²=0.114, N=1,780 player-games), but the precision gap between the two players is much more predictive (r=+0.546, R²=0.298, N=890 games).
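For readers who want to run the same kind of check on their own data, here is a rough sketch of how r and R² for the precision gap could be computed. The rows in `games` are made-up placeholder values, not WCC data, and `pearson_r` is a hypothetical helper written out by hand so the example has no dependencies.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder rows: (white precision, black precision, white's game score).
games = [
    (0.40, 0.10, 1.0),
    (0.20, 0.25, 0.5),
    (-0.10, 0.30, 0.0),
    (0.35, 0.30, 0.5),
]

gaps = [white - black for white, black, _ in games]
scores = [score for _, _, score in games]

r = pearson_r(gaps, scores)
r_squared = r * r  # share of score variance explained by the gap
```

R² is simply r squared, which is why r=+0.546 corresponds to R²=0.298 in the figures quoted above.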

Looking across time, we see approximately the pattern we'd expect — players getting stronger, and precision going up: