
What is this?

An Expected Goals (xG) dashboard for the top 5 European football leagues — Premier League, La Liga, Bundesliga, Serie A, and Ligue 1. It pulls shot-by-shot data from Understat, trains a custom XGBoost model to predict goal probability, and presents the results as interactive team, player, and match analytics.

The model is trained on historical data (2020–2024) and scored against a 2025 holdout set, where it achieves a Brier score of 0.074 — comparable to Understat's own model at 0.072.
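
For readers unfamiliar with the metric, the Brier score is the mean squared difference between the predicted goal probability and the actual 0/1 outcome, so lower is better. The snippet below is a minimal illustration only; the function name and the toy values are made up for this example and are not project data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Toy values for illustration (hypothetical, not taken from the dashboard).
toy_xg = [0.05, 0.30, 0.76, 0.10]
toy_goals = [0, 1, 1, 0]
print(brier_score(toy_xg, toy_goals))
```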

Data Pipeline

Fetch from Understat
For each league and season (2020–present), we use the understat Python library to pull every match result, then fetch the shot data for each match via get_match_shots(). Each shot record includes position (x, y), shot type, situation, last action, and whether it resulted in a goal. In total this yields roughly 250k shots across 5 leagues and 6 seasons.
Feature Engineering
Raw coordinates are transformed into 24 model features covering shot geometry (distance, angle, their squares and interactions), shot type flags (header, cross, rebound, throughball), interaction terms (header × distance, cross × distance), zone classification, and contextual proxies like fast_break — which flags transition shots preceded by a take-on, ball recovery, or throughball.
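
A few of these features can be sketched as follows. This is a rough illustration, not the project's actual code: it assumes Understat coordinates are normalised to 0–1 with the attacking goal at x = 1 and y = 0.5, a 105 m × 68 m pitch, and that headers carry the shot type label "Head". The function name and output keys are hypothetical.

```python
import math

# Assumptions for this sketch (not confirmed by the source):
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0
GOAL_WIDTH_M = 7.32
# Last actions that the text says mark a transition shot.
FAST_BREAK_ACTIONS = {"TakeOn", "BallRecovery", "Throughball"}


def shot_features(x, y, shot_type, last_action):
    """Derive a handful of geometry and context features for one shot."""
    dx = (1.0 - x) * PITCH_LENGTH_M   # metres from the goal line
    dy = (y - 0.5) * PITCH_WIDTH_M    # metres from the pitch centre line
    distance = math.hypot(dx, dy)     # distance to the centre of the goal
    half_goal = GOAL_WIDTH_M / 2.0
    # Angle subtended by the two goalposts as seen from the shot location.
    angle = math.atan2(dy + half_goal, dx) - math.atan2(dy - half_goal, dx)
    is_header = 1 if shot_type == "Head" else 0
    return {
        "distance": distance,
        "angle": angle,
        "distance_sq": distance ** 2,
        "angle_sq": angle ** 2,
        "is_header": is_header,
        "header_x_distance": is_header * distance,
        "fast_break": 1 if last_action in FAST_BREAK_ACTIONS else 0,
    }


# A header from roughly the penalty spot after a take-on.
print(shot_features(1 - 11 / 105, 0.5, "Head", "TakeOn"))
```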
Per-Situation Model Training
Rather than one global model, shots are routed to specialist XGBoost models based on their situation. Each specialist uses a tailored feature set and undergoes independent hyperparameter search (GridSearchCV, 3-fold CV, optimising Brier score). This means corner specialists focus on header mechanics while set-piece models focus on positional geometry.
Calibration
Each specialist is wrapped in CalibratedClassifierCV(method="isotonic", cv=5) to ensure predicted probabilities are well-calibrated — a model predicting 0.30 xG should see goals ~30% of the time. Penalty kicks are assigned a fixed xG of 0.76 rather than being modelled (insufficient variance).
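
One way to check the "0.30 xG should see goals ~30% of the time" property is a reliability table: bucket shots by predicted xG and compare the mean prediction in each bucket with the observed goal rate. The sketch below uses only the standard library; the function name, the equal-width binning and the toy data are assumptions made for illustration.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Return (mean predicted xG, observed goal rate, shot count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins over [0, 1]; p == 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    rows = []
    for bucket in bins:
        if not bucket:
            continue
        mean_pred = sum(p for p, _ in bucket) / len(bucket)
        goal_rate = sum(y for _, y in bucket) / len(bucket)
        rows.append((mean_pred, goal_rate, len(bucket)))
    return rows


# Toy data: ten shots rated 0.30 xG, three of which were scored.
toy_probs = [0.30] * 10
toy_goals = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
for mean_pred, goal_rate, n in reliability_table(toy_probs, toy_goals):
    print(f"predicted={mean_pred:.2f} observed={goal_rate:.2f} shots={n}")
```

For a well-calibrated model the two columns stay close in every bucket that holds enough shots.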
Aggregation & Serving
Scored shots are aggregated into team, player, match timeline, and watchlist parquet files. A FastAPI backend serves these via REST endpoints. The frontend is a single-page vanilla JS application. Auto-refresh runs every 6 hours for the current season.

Specialist Models

Three situation groups currently have dedicated models. Understat doesn't tag counter-attacks as a separate situation (they appear as OpenPlay), so counter context is captured via the fast_break feature instead.

  • OpenPlay — Full 24-feature set. Deeper trees (max_depth up to 5). Covers the vast majority of shots (~74%). Includes throughball, rebound and transition proxies.
  • FromCorner — 13 features focused on header mechanics: header × distance, header × angle, centrality, weak-angle header flag. Corners are almost exclusively aerial situations; removing irrelevant features reduces overfitting.
  • SetPiece — Covers SetPiece (indirect free kicks) and DirectFreekick. 13 features focused on distance, angle and shot type. No transition features — set pieces are stationary.
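
The routing described above could be sketched like this. It is an illustration under stated assumptions rather than the dashboard's code: the "Penalty" situation label, the function name and the stand-in scoring callables are hypothetical, and a real specialist would be a calibrated XGBoost model rather than a lambda.

```python
PENALTY_XG = 0.76  # fixed value stated in the text

# Understat situation label -> specialist group (three groups, per the list above).
SPECIALIST_BY_SITUATION = {
    "OpenPlay": "OpenPlay",
    "FromCorner": "FromCorner",
    "SetPiece": "SetPiece",
    "DirectFreekick": "SetPiece",
}


def score_shot(shot, specialists):
    """Route one shot to its specialist and return the predicted xG."""
    situation = shot["situation"]
    if situation == "Penalty":  # assumed label; penalties are not modelled
        return PENALTY_XG
    group = SPECIALIST_BY_SITUATION.get(situation)
    if group is None:
        raise ValueError(f"no specialist for situation {situation!r}")
    return specialists[group](shot)


# Stand-in scorers returning constants, purely for demonstration.
demo_specialists = {
    "OpenPlay": lambda shot: 0.10,
    "FromCorner": lambda shot: 0.07,
    "SetPiece": lambda shot: 0.05,
}
print(score_shot({"situation": "DirectFreekick"}, demo_specialists))
print(score_shot({"situation": "Penalty"}, demo_specialists))
```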

The Watch List

Rather than a static season table, the Watch List is a rolling 30-day form guide. For each player with ≥2 goals in the last 30 days, it computes a 0–100 rating from three components:

  • Efficiency (40%) — Goals ÷ xG, capped at 2.0× to prevent small-sample flukes from dominating. A ratio above 1.0 means the player is scoring more than their chances would predict.
  • Shot Quality (35%) — Mean xG per shot, capped at 0.50. Rewards players getting into high-value positions, not just high-volume shooters.
  • Volume (25%) — Log-scaled shot count. Ensures consistent creators score higher than one-hit wonders, without letting shot-count alone dominate.

Each component is normalised 0–1 across all qualifying players before blending, then the composite score is scaled to 0–100. Arrows show movement vs the prior 30-day window — hover for previous period stats.
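
As a rough sketch of how such a rating might be computed: the weights, caps and the 2-goal threshold below come from the description above, while min-max normalisation, the log1p scaling, the handling of ties and all names are assumptions made for this example.

```python
import math

WEIGHTS = {"efficiency": 0.40, "quality": 0.35, "volume": 0.25}


def raw_components(goals, xg, shots):
    """Raw (un-normalised) components for one player over the 30-day window."""
    return {
        "efficiency": min(goals / xg, 2.0) if xg > 0 else 0.0,   # capped at 2.0x
        "quality": min(xg / shots, 0.50) if shots > 0 else 0.0,  # capped at 0.50
        "volume": math.log1p(shots),                             # assumed log scaling
    }


def min_max(values):
    """Scale values to 0-1; if all are equal, return 0.5 for each (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def watchlist_ratings(players):
    """Map player name -> 0-100 rating for players with at least 2 goals."""
    qualifying = [p for p in players if p["goals"] >= 2]
    if not qualifying:
        return {}
    comps = [raw_components(p["goals"], p["xg"], p["shots"]) for p in qualifying]
    scores = [0.0] * len(qualifying)
    for key, weight in WEIGHTS.items():
        for i, norm in enumerate(min_max([c[key] for c in comps])):
            scores[i] += weight * norm
    return {p["name"]: round(100 * s, 1) for p, s in zip(qualifying, scores)}


# Hypothetical players, not real data.
demo_players = [
    {"name": "A", "goals": 6, "xg": 3.0, "shots": 12},
    {"name": "B", "goals": 2, "xg": 2.0, "shots": 10},
    {"name": "D", "goals": 1, "xg": 0.2, "shots": 1},  # excluded: fewer than 2 goals
]
print(watchlist_ratings(demo_players))
```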

Known Limitations

  • No defensive context — Understat doesn't expose freeze-frame data (defender positions). This is likely the main reason our AUC (0.792) slightly trails Understat (0.805).
  • No goalkeeper data — Shot-stopping quality isn't modelled. A shot at an elite keeper has the same xG as one at a weaker keeper.
  • Understat coordinates come from event data, not tracking data, so positional accuracy is limited by the precision of the source.
  • Counter-attack proxy — Understat doesn't provide a counter-attack flag. We approximate with last-action context (TakeOn, BallRecovery, Throughball).