FPL AI Predictor
Full-stack ML application predicting Fantasy Premier League player points and providing AI-driven gameweek strategy. XGBoost model on per-gameweek rolling averages, SGDRegressor for incremental online updates.
Role
ML Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project
The Problem
FPL managers making weekly captain picks, transfers, and chip decisions rely on manual fixture analysis and community opinion — no data-driven...
The official FPL app shows raw stats but no next-gameweek point predictions, no fixture-adjusted recommendations, and no risk profiling for captain...
Predicting FPL points required per-gameweek feature engineering (rolling averages, form, FDR, home/away) from the official FPL API, which returns...
Incremental model updates after each gameweek — incorporating the latest results without full retraining — required an online learning approach...
Captain selection involves multi-factor reasoning: predicted points, fixture difficulty, home/away advantage, rotation risk, opponent defensive...
Chip timing (wildcard, bench boost, triple captain, free hit) depends on fixture swings, squad state, and transfer debt — no open-source FPL tool...
The Solution
Built a full ML pipeline from live data ingestion to interactive React dashboard with a multi-layer decision engine.
Data Pipeline (data_pipeline.py)
Fetches live FPL data from three official API endpoints: bootstrap-static (all players, teams, gameweek info), fixtures (full season schedule with...
Feature engineering per player per gameweek: rolling 3-GW and 5-GW average points, rolling average minutes, rolling average goals/assists/clean...
Outputs structured CSV datasets for training.
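The rolling-average step can be sketched as follows. This is a minimal, dependency-free illustration rather than the pandas code in data_pipeline.py: the row schema (`player_id`, `gw`, `points`) and the function name are hypothetical, and it assumes each feature is built from earlier gameweeks only so that a row never sees its own target (the source does not say how the project handles this).

```python
from collections import defaultdict, deque
from statistics import mean


def add_rolling_features(rows, windows=(3, 5)):
    """Attach rolling mean points over previous gameweeks to each row.

    rows: iterable of dicts with keys player_id, gw, points (hypothetical schema).
    Returns new dicts, sorted by (player_id, gw), with avg_pts_<w> keys added.
    """
    # Keep only as much history per player as the widest window needs.
    history = defaultdict(lambda: deque(maxlen=max(windows)))
    out = []
    for row in sorted(rows, key=lambda r: (r["player_id"], r["gw"])):
        past = history[row["player_id"]]
        feats = dict(row)
        for w in windows:
            recent = list(past)[-w:]
            # None until the player has at least one earlier gameweek.
            feats[f"avg_pts_{w}"] = mean(recent) if recent else None
        out.append(feats)
        # Update history only after the features are built (no target leakage).
        past.append(row["points"])
    return out
```

A player's first gameweek has no history, so its features are `None` here; a training pipeline would need to drop or impute those rows.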
ML Models
XGBoost (train_model.py) — primary prediction model. Features: rolling averages (3-GW, 5-GW points/minutes), FDR, home/away, position one-hot...
SGDRegressor (incremental_trainer.py) — online learning model for incremental updates. After each gameweek, new results are used to `partial_fit`...
Prediction blending — final next-GW prediction blends XGBoost (primary) and SGDRegressor (secondary) outputs weighted by recency.
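To make the incremental-update idea concrete, here is a hand-rolled sketch of the kind of per-sample update an online learner performs. It is not scikit-learn's SGDRegressor and not the project's incremental_trainer.py: the class, the learning rate, and the fixed blend weight are all assumptions for illustration (the source says the blend is weighted by recency but does not give the formula).

```python
class OnlineLinearModel:
    """Online least-squares SGD; a stand-in for the idea behind partial_fit."""

    def __init__(self, n_features, lr=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, X, y):
        """One pass over the new rows, updating the weights in place."""
        for x, target in zip(X, y):
            err = self.predict(x) - target
            # Gradient step for squared error on a single sample.
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err
        return self


def blend(primary, secondary, secondary_weight=0.2):
    """Weighted average of two predictions (weight is a placeholder)."""
    return (1 - secondary_weight) * primary + secondary_weight * secondary
```

Because every step scales with the feature values, unscaled inputs can make updates like this unstable, which is why standardising features matters for SGD-style models.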
Scoring Engine (scoring_engine.py)
Implements official FPL 2025/26 scoring rules: - Goals: DEF = 6pts, MID = 5pts, FWD = 4pts - GK goals = 10pts (new 2025/26 rule) - Clean sheets:...
Decision Engine (decision_engine.py)
Fixture-adjusted predictions — multiply raw model prediction by FDR adjustment factor (FDR 1 = ×1.2, FDR 5 = ×0.7).
Rotation penalty — players with <60 min rolling average or flagged as rotation risks have predictions scaled down.
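The two adjustments above can be sketched as below. The source gives only the endpoint multipliers (FDR 1 = ×1.2, FDR 5 = ×0.7); the linear interpolation between them, the function names, and the 0.8 rotation penalty are assumptions made for illustration.

```python
def fdr_factor(fdr, easy=1.2, hard=0.7):
    """Multiplier for a fixture difficulty rating between 1 (easy) and 5 (hard).

    Assumes a straight line between the two endpoint multipliers.
    """
    if not 1 <= fdr <= 5:
        raise ValueError("FDR must be between 1 and 5")
    return easy + (hard - easy) * (fdr - 1) / 4


def adjusted_prediction(raw, fdr, avg_minutes, rotation_flag=False,
                        rotation_penalty=0.8):
    """Apply the fixture adjustment, then the rotation penalty if it applies."""
    pred = raw * fdr_factor(fdr)
    # Penalise players averaging under 60 minutes or flagged as rotation risks.
    if avg_minutes < 60 or rotation_flag:
        pred *= rotation_penalty  # hypothetical penalty value
    return pred
```

Under the linear assumption a neutral FDR 3 fixture gets a factor of about 0.95, slightly below 1, so a real implementation may prefer an explicit lookup table.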
Captain/VC ranking — score each outfield player by: fixture-adjusted prediction × form multiplier × home advantage × historical haul frequency....
Transfer simulation — for each proposed transfer (out, in, cost), compute expected points delta over next 3 GWs vs. -4pt hit cost. Recommend...
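A minimal sketch of the transfer comparison, assuming per-gameweek predictions for both players are already available. The function name, the return shape, and the "recommend when the net gain is positive" rule are assumptions; the source only states that the 3-GW delta is weighed against the 4-point hit.

```python
def evaluate_transfer(out_preds, in_preds, free_transfers, hit_cost=4):
    """Compare a transfer's expected gain over the horizon with its hit cost.

    out_preds / in_preds: predicted points per gameweek for the outgoing and
    incoming player (e.g. the next 3 gameweeks).
    """
    gain = sum(in_preds) - sum(out_preds)
    # A free transfer costs nothing; otherwise the transfer takes a points hit.
    hit = 0 if free_transfers > 0 else hit_cost
    net = gain - hit
    return {"gain": gain, "hit": hit, "net": net, "recommend": net > 0}
```

For example, swapping a player predicted 3, 4, 2 for one predicted 6, 5, 5 gains 7 points over three gameweeks, which still leaves 3 points after a hit.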
Chip recommendations — evaluate chip value:
Wildcard: trigger if squad expected points (next 3 GWs) < league average by threshold
Bench Boost: trigger if bench players have high expected points (double GW or strong bench fixtures)
Triple Captain: trigger if top captain candidate has exceptional fixture (FDR 1, home, great form)
Free Hit: trigger on blank/double gameweeks affecting >3 squad players
Wildcard/Free Hit builder — when wildcard or free hit is recommended, builds optimal 15-player squad from scratch: enumerate top predicted...
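The greedy approach to squad building can be sketched as follows: rank players by predicted points per unit price and accept each one that still fits the standard FPL squad rules (2 GK, 5 DEF, 5 MID, 3 FWD, at most 3 per club, 100.0 budget). The player schema, the function name, and the `min_price` budget reserve are assumptions, not the project's actual code.

```python
SQUAD_QUOTA = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_CLUB = 3


def build_squad(players, budget=100.0, min_price=4.0):
    """Greedy 15-player squad by predicted points per unit price.

    players: list of dicts with id, pos, club, price, pred (hypothetical schema).
    Returns (squad, budget_left); the squad can be short if the pool is thin.
    """
    need = dict(SQUAD_QUOTA)
    club_count = {}
    squad = []
    ranked = sorted(players, key=lambda p: p["pred"] / p["price"], reverse=True)
    for p in ranked:
        slots_left = sum(need.values())
        if slots_left == 0:
            break
        if need[p["pos"]] == 0 or club_count.get(p["club"], 0) >= MAX_PER_CLUB:
            continue
        # Hold back an assumed minimum price for every slot still to fill.
        reserve = min_price * (slots_left - 1)
        if p["price"] > budget - reserve:
            continue
        squad.append(p)
        budget -= p["price"]
        need[p["pos"]] -= 1
        club_count[p["club"]] = club_count.get(p["club"], 0) + 1
    return squad, round(budget, 1)
```

Callers should check that 15 players came back. Ranking by value alone also tends to leave budget unspent, which is one reason a greedy result is not guaranteed to be optimal.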
Monte Carlo Simulation (monte_carlo.py)
1,000-run simulation of team total points for the upcoming gameweek.
Each run: sample each player's points from a distribution (mean = model prediction, std = historical variance for that player/position).
Apply starting XI selection (highest expected points, formation constraints).
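Track captain selection: each run captains the player with the highest sampled score and doubles that score. This is a hindsight pick, so the captain contribution is an upper bound rather than the return of a captain chosen before the deadline.
Output: score distribution histogram, P(score > 60), P(captain haul, i.e. captain scores > 15), P(captain blank, i.e. captain scores < 3).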
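The simulation loop described above can be sketched as below. Several details are assumptions: the source does not name the sampling distribution, so a normal distribution is used; the starting XI is passed in rather than selected; and the haul/blank thresholds are applied to the captain's undoubled score.

```python
import random


def simulate_gameweek(xi, runs=1000, seed=None):
    """Monte Carlo estimate of the team score distribution for one gameweek.

    xi: list of (mean, std) pairs for the 11 starters (hypothetical input shape).
    """
    rng = random.Random(seed)
    totals, captain_pts = [], []
    for _ in range(runs):
        scores = [rng.gauss(mu, sd) for mu, sd in xi]
        best = max(scores)
        # Captain is the top scorer of the run, so that score counts twice.
        totals.append(sum(scores) + best)
        captain_pts.append(best)
    return {
        "totals": totals,
        "p_over_60": sum(t > 60 for t in totals) / runs,
        "p_captain_haul": sum(c > 15 for c in captain_pts) / runs,
        "p_captain_blank": sum(c < 3 for c in captain_pts) / runs,
    }
```

Passing a fixed `seed` makes a run reproducible, which helps when the resulting probabilities are asserted in tests.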
Live Data Service (live_data_service.py)
Background thread refreshing FPL API data every 10 minutes.
Caches bootstrap-static and fixtures in memory.
Exposes current GW number, deadline, all player stats (injuries, news, availability, price), fixture list.
No external API key required — FPL API is public.
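A background refresh with an in-memory cache might look like the sketch below. The class and method names are hypothetical, and the fetch function is injected so the example makes no network calls; in the project it would wrap the FPL API requests.

```python
import threading


class LiveDataCache:
    """Periodically refresh a snapshot in a daemon thread (illustrative only)."""

    def __init__(self, fetch, interval_s=600):
        self._fetch = fetch            # callable returning a fresh snapshot dict
        self._interval = interval_s    # 600 s matches the 10-minute refresh
        self._lock = threading.Lock()
        self._data = {}
        self._stop = threading.Event()
        self._thread = None

    def refresh_once(self):
        fresh = self._fetch()          # slow call happens outside the lock
        with self._lock:
            self._data = fresh

    def snapshot(self):
        with self._lock:
            return self._data

    def _run(self):
        while not self._stop.is_set():
            try:
                self.refresh_once()
            except Exception:
                pass                   # keep serving the last good snapshot
            # Sleeps for the interval but wakes immediately when stop() is called.
            self._stop.wait(self._interval)

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Swapping in a whole new snapshot under the lock means readers never see a half-updated cache, and a failed refresh leaves the previous data in place.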
Backend API (api.py — FastAPI, 5 endpoints)
`GET /health` — health check.
`GET /status` — current GW number, deadline timestamp, all fixtures, team names.
`GET /players` — all 600+ players with live stats (form, price, injuries, news, predicted points).
`POST /analyze-squad` — accepts 15 player IDs + ITB + free transfers + chips available. Returns: optimal starting XI, bench order, captain +...
`POST /optimize-team` — full strategy report: all captain candidates ranked, all transfer options evaluated, chip timing analysis, and (if...
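The starting XI step behind the squad analysis can be sketched as follows, assuming a valid 15-player squad and the standard FPL formation rule (1 goalkeeper, at least 3 defenders, at least 1 forward). The player schema and function name are hypothetical, and the bench is simply returned in descending predicted points.

```python
MIN_STARTERS = {"GK": 1, "DEF": 3, "FWD": 1}


def pick_starting_xi(squad):
    """Pick the 11 starters with the highest predicted points in a legal formation.

    squad: list of dicts with id, pos, pred (hypothetical schema).
    Returns (xi, bench).
    """
    ranked = sorted(squad, key=lambda p: p["pred"], reverse=True)
    xi = []
    # First satisfy the formation minimums with the best player(s) per position.
    for pos, n in MIN_STARTERS.items():
        xi.extend([p for p in ranked if p["pos"] == pos][:n])
    chosen = {p["id"] for p in xi}
    # Then fill the remaining places with the best outfield players left.
    rest = [p for p in ranked if p["id"] not in chosen and p["pos"] != "GK"]
    xi.extend(rest[: 11 - len(xi)])
    chosen = {p["id"] for p in xi}
    bench = [p for p in ranked if p["id"] not in chosen]
    return xi, bench
```

Because the only binding constraints are per-position minimums, filling those first and then taking the best remaining outfield players maximises the XI's total predicted points.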
Frontend (React 19 + Vite + Tailwind CSS v4)
TeamBuilder.jsx — squad input page: search and add players by name, set ITB and free transfers, select available chips. Validates squad (15...
Analysis.jsx — full AI analysis dashboard: captain recommendation card, transfer suggestion, chip advice, Monte Carlo chart, fixture difficulty...
Pitch.jsx — interactive football pitch with drag-and-drop player positions (@hello-pangea/dnd). Shows injury flags, FDR badges, predicted points,...
PredictionChart.jsx — Recharts bar chart of predicted points for all starting XI players.
CaptainChart.jsx — Recharts chart ranking top captain candidates by fixture-adjusted prediction score.
MonteCarloChart.jsx — Recharts area chart showing 1,000-run score distribution with haul/blank probability markers.
TransferImpact.jsx — shows recommended transfer (out → in), expected delta over 3 GWs, hit cost if applicable.
FixtureDifficulty.jsx — colour-coded FDR badges (green = easy, red = hard) for next 5 gameweeks.
Testing (176 unit tests, 16 test files)
CI-safe: all tests run without a trained model (mock model fixtures).
Coverage: - data_pipeline.py: feature engineering, rolling averages, FDR assignment - scoring_engine.py: all 2025/26 scoring rule combinations (GK...
Design Decisions
XGBoost as primary model over LinearRegression — handles non-linear feature interactions (e.g., FDR × form × home/away) better than linear models....
SGDRegressor for incremental updates — avoids full retraining after each gameweek. `partial_fit` updates model weights from new results in seconds,...
Decision engine as a separate module (decision_engine.py) — keeps strategy logic (captain ranking, transfer evaluation, chip timing, wildcard...
Monte Carlo simulation over deterministic point ranges — captures the variance of FPL scoring (a player can score 2 or 20 points with the same...
Live FPL API with background refresh thread: the official FPL API is public, and the two endpoints cached by the live service (bootstrap-static and fixtures) return all current player/fixture data. Background thread...
Greedy wildcard builder over integer programming — sorting by predicted pts / price ratio and applying FPL constraints greedily is fast (~ms) and...
176 CI-safe unit tests with mock model — all tests run without a trained model.pkl file (mocked via unittest.mock). Enables CI to validate logic...
Two GitHub Actions workflows (CI + CD) — CI on every push/PR (lint + tests + build), CD on main/master (full build + artifact upload). Separation...
Recharts over Chart.js/D3: a charting library built for React, with composable components, good TypeScript support, and sufficient for the 5 chart types needed...
@hello-pangea/dnd for pitch drag-and-drop — maintained fork of react-beautiful-dnd, actively supported for React 18+. Provides accessible...
Tradeoffs & Constraints
Model accuracy ceiling — XGBoost on rolling averages predicts average form well but can't account for unexpected events (injuries in warm-up,...
model.pkl gitignored — model must be generated locally via `make pipeline` before the app runs. Adds a 2-minute setup step. Production deployment...
Greedy wildcard builder — near-optimal but not guaranteed optimal. A full integer linear program (pulp or ortools) would guarantee the optimal squad...
FPL API rate limits — the official FPL API has undocumented rate limits (~10 req/min per IP). The 10-minute background refresh TTL keeps requests...
Monte Carlo variance: 1,000 runs provide stable probability estimates but add ~200ms to /optimize-team response time. Could be reduced to 500 runs...
SGDRegressor as secondary model only — SGD is sensitive to feature scaling and learning rate. Used as a blend component (not primary) to avoid...
No persistent database — player predictions and historical analysis stored in-memory and in CSV files. Adding PostgreSQL/Firestore would enable...
Would improve: Add LSTM/time-series model for better form capture, implement per-user squad persistence (PostgreSQL), add Slack/WhatsApp alerts for...
Outcome & Impact
Full-stack FPL analytics platform with XGBoost predictions, incremental SGDRegressor updates, decision engine, Monte Carlo risk profiling, and React...
XGBoost model trained on per-gameweek rolling averages (3-GW, 5-GW points/minutes, FDR, home/away, position, team strength) predicts next-GW points...
SGDRegressor incremental trainer updates model weights after each gameweek via partial_fit — adapts to in-season form changes without full...
Decision engine covers complete FPL weekly workflow: fixture-adjusted predictions, rotation penalties, captain/VC ranking with confidence scores,...
1,000-run Monte Carlo simulation produces team score distribution with haul/blank probability markers — quantifies captain upside and blank gameweek...
Live FPL API integration with 10-minute background refresh — all 600+ players with current form, price, injuries, news, and fixture list available...
React 19 interactive pitch with drag-and-drop player positioning, injury flags, FDR badges, deadline countdown, and fixture difficulty strip — ...
5 FastAPI endpoints cover the full workflow: health, status, player list, squad analysis (starting XI, bench, captain, transfers, chips, Monte...
176 pytest unit tests across 16 test files validate all business logic without requiring a trained model — CI runs on every push in ~30s.
GitHub Actions CI/CD: ruff lint + 176 unit tests + frontend Vite build on every push/PR; full build + artifact upload on main/master.
Tech Stack
ML: XGBoost (primary prediction model), scikit-learn LinearRegression + SGDRegressor (incremental online learning), joblib (model serialisation)
Feature Engineering: pandas (per-GW rolling averages, FDR, home/away, form), numpy (numerical operations)
Live Data: Official FPL API (bootstrap-static, fixtures, element-summary — public, no auth required), background refresh thread (10-min TTL)
Decision Engine: Custom Python — fixture-adjusted predictions, rotation penalties, captain/VC ranking, transfer simulation, chip timing, greedy wildcard builder
Simulation: Monte Carlo (1,000-run team score distribution, haul/blank risk profiling)
Backend: FastAPI (5 endpoints), Uvicorn, Python 3.11
Frontend: React 19, Vite, Tailwind CSS v4, Recharts (bar/area/scatter charts), @hello-pangea/dnd (pitch drag-and-drop), Axios
Testing: pytest (176 unit tests, 16 test files, CI-safe with mock model fixtures), unittest.mock
CI/CD: GitHub Actions — CI (ruff lint + pytest + frontend build on every push/PR), CD (full build + artifact upload on main/master)
Automation: Makefile (setup, pipeline, dev, backend, frontend, test, test-all, lint, build-frontend, clean, all)