What is this?
An Expected Goals (xG) dashboard for the top 5 European football leagues — Premier League, La Liga, Bundesliga, Serie A, and Ligue 1. It pulls shot-by-shot data from Understat, trains a custom XGBoost model to predict goal probability, and presents the results as interactive team, player, and match analytics.
The model is trained on historical data (2020–2024) and scored against a 2025 holdout set, where it achieves a Brier score of 0.074 — comparable to Understat's own model at 0.072.
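For readers unfamiliar with the metric, the Brier score is the mean squared difference between each predicted probability and the actual 0/1 outcome, so lower is better. The snippet below is a minimal sketch in plain Python, not the project's evaluation code; the function name and the four example shots are made up for illustration.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    if not probabilities:
        raise ValueError("need at least one shot")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Four made-up shots: predicted xG and whether each was a goal (1) or not (0).
example_xg = [0.5, 0.5, 0.0, 1.0]
example_goals = [1, 0, 0, 1]
example_score = brier_score(example_xg, example_goals)
```

Because most shots miss, even a model that always predicts the average conversion rate gets a low Brier score, which is why the figure is best read next to a reference model such as Understat's.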
Data Pipeline
Uses the understat Python library to pull every match result, then fetches individual shot data per match via get_match_shots(). Each shot record includes position (x, y), shot type, situation, last action, and whether it resulted in a goal. ~250k shots total across 5 leagues and 6 seasons.
The model is wrapped in CalibratedClassifierCV(method="isotonic", cv=5) so that predicted probabilities are well calibrated: a model predicting 0.30 xG should see goals ~30% of the time. Penalty kicks are assigned a fixed xG of 0.76 rather than being modelled (insufficient variance).
Specialist Models
Three situation groups currently have dedicated models. Understat doesn't tag counter-attacks as a separate situation (they appear as OpenPlay), so counter context is captured via the fast_break feature instead.
- OpenPlay — Full 24-feature set. Deeper trees (max_depth up to 5). Covers the vast majority of shots (~74%). Includes throughball, rebound and transition proxies.
- FromCorner — 13 features focused on header mechanics: header × distance, header × angle, centrality, weak-angle header flag. Corners are almost exclusively aerial situations; removing irrelevant features reduces overfitting.
- SetPiece — Covers SetPiece (indirect free kicks) and DirectFreekick. 13 features focused on distance, angle and shot type. No transition features — set pieces are stationary.
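The routing between the three groups can be sketched as a lookup on the shot's situation tag. This is an illustrative sketch rather than the project's code: the `predict_xg` function, the `models` mapping, and the constant-returning dummy models are hypothetical, and the `"Penalty"` tag and `situation` key are assumed to match Understat's shot records.

```python
PENALTY_XG = 0.76  # fixed penalty value stated in the pipeline description

# Understat situation tag -> specialist model group, following the list above.
SITUATION_TO_GROUP = {
    "OpenPlay": "OpenPlay",
    "FromCorner": "FromCorner",
    "SetPiece": "SetPiece",
    "DirectFreekick": "SetPiece",
}


def predict_xg(shot, models):
    """Return xG for one shot dict, using the model for its situation group.

    `models` maps a group name to any callable that takes the shot and
    returns a probability (hypothetical stand-ins for the trained models).
    """
    situation = shot["situation"]
    if situation == "Penalty":  # assumed tag; penalties bypass the models
        return PENALTY_XG
    group = SITUATION_TO_GROUP.get(situation)
    if group is None:
        raise ValueError(f"unknown situation: {situation!r}")
    return models[group](shot)


# Dummy models returning constants, purely to show the routing.
dummy_models = {
    "OpenPlay": lambda shot: 0.10,
    "FromCorner": lambda shot: 0.05,
    "SetPiece": lambda shot: 0.07,
}
freekick_xg = predict_xg({"situation": "DirectFreekick"}, dummy_models)
penalty_xg = predict_xg({"situation": "Penalty"}, dummy_models)
```

Raising on an unknown tag, instead of silently falling back to the open-play model, makes it obvious if the data source ever introduces a new situation value.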
The Watch List
Rather than a static season table, the Watch List is a rolling form guide based on each player's last 5 games. For each player with ≥3 games and ≥3 shots in that window, it computes a 0–100 rating from two outcome-independent components:
- npxG/90 (60%) — Non-penalty xG per 90 minutes, estimated from shot volume. Rewards players generating high-quality chances consistently.
- Shot Quality (40%) — Mean xG per shot, capped at 0.50. Rewards players getting into dangerous positions, not just shooting often.
Each component is normalised 0–1 across all qualifying players before blending, then the composite score is scaled to 0–100. Arrows show movement vs the prior 5-game window — hover for previous period stats. Goals are shown for context but do not affect the rating.
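The rating arithmetic can be sketched as follows. Several details here are assumptions, not confirmed by the description above: normalisation is taken to be min-max, the final scaling is a plain multiplication by 100, shot counts are assumed to exclude penalties, and the dict keys (`games`, `shots`, `minutes`, `npxg`) are hypothetical names for the 5-game window totals.

```python
SHOT_QUALITY_CAP = 0.50
W_NPXG90, W_QUALITY = 0.60, 0.40


def min_max(values):
    """Scale values to 0-1; all-equal inputs map to 0.0 (an arbitrary choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def watch_list_ratings(players):
    """Return {name: 0-100 rating} for players meeting the eligibility rule."""
    eligible = [
        p for p in players
        if p["games"] >= 3 and p["shots"] >= 3 and p["minutes"] > 0
    ]
    if not eligible:
        return {}
    # Component 1: non-penalty xG per 90 minutes.
    npxg90 = [p["npxg"] / p["minutes"] * 90 for p in eligible]
    # Component 2: mean xG per shot, capped at 0.50.
    quality = [min(p["npxg"] / p["shots"], SHOT_QUALITY_CAP) for p in eligible]
    blended = [
        W_NPXG90 * a + W_QUALITY * b
        for a, b in zip(min_max(npxg90), min_max(quality))
    ]
    return {p["name"]: round(100 * s, 1) for p, s in zip(eligible, blended)}


# Made-up players: C is excluded for having fewer than 3 games.
example_players = [
    {"name": "A", "games": 5, "shots": 10, "minutes": 450, "npxg": 2.0},
    {"name": "B", "games": 5, "shots": 4, "minutes": 450, "npxg": 1.0},
    {"name": "C", "games": 2, "shots": 6, "minutes": 180, "npxg": 1.5},
]
example_ratings = watch_list_ratings(example_players)
```

In this toy example player A leads on npxG/90 and player B on shot quality, so their ratings reflect the 60/40 weights directly. With min-max scaling the ratings are relative to the current pool of qualifying players, so a player's score can move even when their own numbers do not.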
Known Limitations
- No defensive context — Understat doesn't expose freeze-frame data (defender positions). This is likely the main reason our AUC (0.792) slightly trails Understat (0.805).
- No goalkeeper data — Shot-stopping quality isn't modelled. A shot at an elite keeper has the same xG as one at a weaker keeper.
- Understat coordinates come from event data, not tracking data, so positional accuracy is limited by the data source.
- Counter-attack proxy — Understat doesn't provide a counter-attack flag. We approximate with last-action context (TakeOn, BallRecovery, Throughball).
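A last-action proxy of the kind described in the final limitation might look like the sketch below. The exact definition of the fast_break feature is not given here, so this is one plausible reading: the restriction to OpenPlay shots is an assumption, as are the `situation` and `lastAction` key names, which are taken to match Understat's shot records.

```python
# Last actions treated as signs of a transition, per the limitation above.
FAST_BREAK_ACTIONS = {"TakeOn", "BallRecovery", "Throughball"}


def fast_break(shot):
    """Return 1 if an open-play shot followed a transition-style last action."""
    return int(
        shot.get("situation") == "OpenPlay"
        and shot.get("lastAction") in FAST_BREAK_ACTIONS
    )


# Made-up shot records showing the three cases.
counter_like = fast_break({"situation": "OpenPlay", "lastAction": "Throughball"})
settled_play = fast_break({"situation": "OpenPlay", "lastAction": "Pass"})
from_corner = fast_break({"situation": "FromCorner", "lastAction": "TakeOn"})
```

A flag like this will also fire on slow attacks that happen to end with a take-on or through ball, which is part of why it is listed as a limitation and not as a true counter-attack label.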