FPL AI Predictor

Full-stack ML application predicting Fantasy Premier League player points and providing AI-driven gameweek strategy. XGBoost model on per-gameweek rolling averages, SGDRegressor for incremental online updates.

Python 3.11, XGBoost, scikit-learn, SGDRegressor, FastAPI, pandas, numpy, joblib, React 19, Vite, Tailwind CSS v4, Recharts, pytest, GitHub Actions

Role

ML Engineer & Full-stack Developer

Team

Solo

Company/Organization

Personal Project

The Problem

FPL managers making weekly captain picks, transfers, and chip decisions rely on manual fixture analysis and community opinion; no data-driven...

The official FPL app shows raw stats but no next-gameweek point predictions, no fixture-adjusted recommendations, and no risk profiling for captain...

Predicting FPL points required per-gameweek feature engineering (rolling averages, form, FDR, home/away) from the official FPL API, which returns...

Incremental model updates after each gameweek (incorporating the latest results without full retraining) required an online learning approach...

Captain selection involves multi-factor reasoning: predicted points, fixture difficulty, home/away advantage, rotation risk, opponent defensive...

Chip timing (wildcard, bench boost, triple captain, free hit) depends on fixture swings, squad state, and transfer debt; no open-source FPL tool...

The Solution

Built a full ML pipeline from live data ingestion to interactive React dashboard with a multi-layer decision engine.

Data Pipeline (data_pipeline.py)

Fetches live FPL data from three official API endpoints: bootstrap-static (all players, teams, gameweek info), fixtures (full season schedule with...

Feature engineering per player per gameweek: rolling 3-GW and 5-GW average points, rolling average minutes, rolling average goals/assists/clean...
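A minimal sketch of the rolling-average step, assuming a long-format table with hypothetical column names `player_id`, `gameweek`, and `points` (the actual schema in data_pipeline.py is not shown here). The `shift(1)` is an assumption about how leakage is avoided: each gameweek's feature only sees earlier gameweeks.

```python
import pandas as pd


def add_rolling_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add 3-GW and 5-GW rolling point averages per player.

    Column names are illustrative, not the project's real schema.
    """
    df = df.sort_values(["player_id", "gameweek"]).copy()
    grouped = df.groupby("player_id")["points"]
    for window in (3, 5):
        # shift(1) so the feature for gameweek N uses only gameweeks before N
        df[f"points_avg_{window}gw"] = grouped.transform(
            lambda s, w=window: s.shift(1).rolling(w, min_periods=1).mean()
        )
    return df


history = pd.DataFrame(
    {
        "player_id": [1, 1, 1, 1],
        "gameweek": [1, 2, 3, 4],
        "points": [2, 6, 10, 3],
    }
)
features = add_rolling_features(history)
```

The first gameweek for each player has no history, so its rolling features are NaN and would need a fill strategy before training.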

Outputs structured CSV datasets for training.

ML Models

XGBoost (train_model.py) — primary prediction model. Features: rolling averages (3-GW, 5-GW points/minutes), FDR, home/away, position one-hot...

SGDRegressor (incremental_trainer.py) — online learning model for incremental updates. After each gameweek, new results are used to `partial_fit`...
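The incremental update can be sketched as follows. This is an illustration on synthetic data, not the contents of incremental_trainer.py: the hyperparameters, the `update_with_gameweek` helper, and the use of an incrementally fitted `StandardScaler` are all assumptions (the scaler is included because SGD is sensitive to feature scale).

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Hypothetical hyperparameters, chosen only for the example.
scaler = StandardScaler()
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)


def update_with_gameweek(X_new: np.ndarray, y_new: np.ndarray) -> None:
    """Fold one gameweek of results into the model without retraining."""
    scaler.partial_fit(X_new)  # update running mean/variance
    model.partial_fit(scaler.transform(X_new), y_new)  # one SGD pass on new rows


# Synthetic stand-in for three gameweeks of (features, actual points).
rng = np.random.default_rng(0)
true_weights = np.array([1.0, 0.5, -0.3, 0.0])
for _ in range(3):
    X_gw = rng.normal(size=(50, 4))
    y_gw = X_gw @ true_weights + 2.0
    update_with_gameweek(X_gw, y_gw)

predictions = model.predict(scaler.transform(X_gw))
```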

Prediction blending — final next-GW prediction blends XGBoost (primary) and SGDRegressor (secondary) outputs weighted by recency.

Scoring Engine (scoring_engine.py)

Implements official FPL 2025/26 scoring rules:
- Goals: DEF = 6pts, MID = 5pts, FWD = 4pts
- GK goals = 10pts (new 2025/26 rule)
- Clean sheets:...
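As a small illustration of how such rules can be encoded, the goal-scoring part reduces to a lookup table. The `GOAL_POINTS` table and `goal_points` helper are hypothetical names, and only the goal values stated above are covered; the remaining scoring categories are left out.

```python
# Goal points by position, taken from the rules listed above.
GOAL_POINTS = {"GK": 10, "DEF": 6, "MID": 5, "FWD": 4}


def goal_points(position: str, goals: int) -> int:
    """Points earned from goals alone; other scoring categories are omitted."""
    if goals < 0:
        raise ValueError("goals must be non-negative")
    return GOAL_POINTS[position] * goals


example = goal_points("MID", 2)
```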

Decision Engine (decision_engine.py)

Fixture-adjusted predictions — multiply raw model prediction by FDR adjustment factor (FDR 1 = ×1.2, FDR 5 = ×0.7).
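Only the two endpoint factors (FDR 1 and FDR 5) are stated above. The sketch below assumes the factors for FDR 2 to 4 are linearly interpolated between them, which may not match decision_engine.py; `fdr_factor` and `fixture_adjusted` are illustrative names.

```python
FDR_EASIEST, FDR_HARDEST = 1, 5
FACTOR_EASIEST, FACTOR_HARDEST = 1.2, 0.7  # endpoint values from the text


def fdr_factor(fdr: int) -> float:
    """Multiplier for a fixture difficulty rating (linear interpolation assumed)."""
    if not FDR_EASIEST <= fdr <= FDR_HARDEST:
        raise ValueError("FDR must be between 1 and 5")
    step = (FACTOR_HARDEST - FACTOR_EASIEST) / (FDR_HARDEST - FDR_EASIEST)
    return FACTOR_EASIEST + (fdr - FDR_EASIEST) * step


def fixture_adjusted(raw_prediction: float, fdr: int) -> float:
    """Scale the raw model prediction by the fixture factor."""
    return raw_prediction * fdr_factor(fdr)


adjusted = fixture_adjusted(5.0, 1)
```

Under this assumption a neutral fixture (FDR 3) gets a factor of roughly 0.95 rather than exactly 1.0, which is one reason the real mapping may be a hand-tuned table instead.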

Rotation penalty — players with <60 min rolling average or flagged as rotation risks have predictions scaled down.

Captain/VC ranking — score each outfield player by: fixture-adjusted prediction × form multiplier × home advantage × historical haul frequency....

Transfer simulation — for each proposed transfer (out, in, cost), compute expected points delta over next 3 GWs vs. -4pt hit cost. Recommend...
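The comparison against the hit cost can be sketched like this. The `evaluate_transfer` helper and the "recommend when the net gain is positive" rule are assumptions for illustration; the text above only states the 3-GW horizon and the -4pt hit.

```python
from dataclasses import dataclass

HIT_COST = 4  # points deducted for a transfer beyond the free allowance
HORIZON = 3   # gameweeks of predictions to compare


@dataclass
class TransferResult:
    gain: float      # expected points gained before any hit
    hit: int         # points hit paid for the transfer
    net: float       # gain minus hit
    recommend: bool  # assumed rule: recommend when net gain is positive


def evaluate_transfer(out_preds, in_preds, free_transfers: int) -> TransferResult:
    """Compare two players' predicted points over the next HORIZON gameweeks."""
    gain = sum(in_preds[:HORIZON]) - sum(out_preds[:HORIZON])
    hit = 0 if free_transfers > 0 else HIT_COST
    net = gain - hit
    return TransferResult(gain=gain, hit=hit, net=net, recommend=net > 0)


# No free transfers left, so the move must beat the 4-point hit.
result = evaluate_transfer([3.0, 2.5, 4.0], [5.0, 6.0, 4.5], free_transfers=0)
```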

Chip recommendations — evaluate chip value:

Wildcard: trigger if squad expected points (next 3 GWs) < league average by threshold

Bench Boost: trigger if bench players have high expected points (double GW or strong bench fixtures)

Triple Captain: trigger if top captain candidate has exceptional fixture (FDR 1, home, great form)

Free Hit: trigger on blank/double gameweeks affecting >3 squad players

Wildcard/Free Hit builder — when wildcard or free hit is recommended, builds optimal 15-player squad from scratch: enumerate top predicted...
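A greedy builder along these lines might look like the sketch below. It ranks by predicted points per unit price and applies the standard FPL squad rules (2 GK, 5 DEF, 5 MID, 3 FWD, at most 3 players per club, 100.0 budget), which are general FPL rules rather than details taken from decision_engine.py. The `Player` type and the synthetic pool are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

SQUAD_SLOTS = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MAX_PER_CLUB = 3
BUDGET = 100.0


@dataclass(frozen=True)
class Player:
    name: str
    position: str
    club: str
    price: float
    predicted: float


def build_squad(pool, budget: float = BUDGET):
    """Greedily pick players by predicted points per unit price."""
    slots, clubs = Counter(), Counter()
    squad, spent = [], 0.0
    ranked = sorted(pool, key=lambda p: p.predicted / p.price, reverse=True)
    for player in ranked:
        if slots[player.position] >= SQUAD_SLOTS[player.position]:
            continue  # position already full
        if clubs[player.club] >= MAX_PER_CLUB:
            continue  # club limit reached
        if spent + player.price > budget:
            continue  # cannot afford this player
        squad.append(player)
        slots[player.position] += 1
        clubs[player.club] += 1
        spent += player.price
    return squad


# Synthetic pool: 8 players per position, one per club, pricier means better.
pool = [
    Player(f"{pos}{i}", pos, f"C{i}", 4.0 + 0.5 * i, 2.0 + i)
    for pos in SQUAD_SLOTS
    for i in range(8)
]
squad = build_squad(pool)
```

A single greedy pass can leave budget unspent or, with a tight budget, fail to fill all 15 slots, so a real builder would likely need a follow-up upgrade or repair step.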

Monte Carlo Simulation (monte_carlo.py)

1,000-run simulation of team total points for the upcoming gameweek.

Each run: sample each player's points from a distribution (mean = model prediction, std = historical variance for that player/position).

Apply starting XI selection (highest expected points, formation constraints).

Track captain selection: each run picks the player who scored highest as captain, doubles their score.

Output: score distribution histogram, P(score > 60), P(captain haul > 15), P(captain blank < 3).
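The simulation loop can be sketched with NumPy as below. Several things are assumptions: points are drawn from a normal distribution, the inputs are already the chosen starting XI (formation selection is skipped), and the haul/blank thresholds are applied to the captain's undoubled score. `simulate_gameweek` is an illustrative name.

```python
import numpy as np


def simulate_gameweek(means, stds, n_runs: int = 1000, seed=None) -> dict:
    """Monte Carlo estimate of a starting XI's total score distribution."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # One row per run, one column per player (normal distribution assumed).
    samples = rng.normal(means, stds, size=(n_runs, means.size))
    # Captain in each run is the top scorer, so their points count twice.
    captain_points = samples.max(axis=1)
    totals = samples.sum(axis=1) + captain_points
    return {
        "totals": totals,
        "p_over_60": float((totals > 60).mean()),
        "p_captain_haul": float((captain_points > 15).mean()),
        "p_captain_blank": float((captain_points < 3).mean()),
    }


xi_means = [4.0, 3.5, 3.8, 4.2, 5.0, 6.5, 4.8, 5.5, 7.0, 6.0, 4.5]
xi_stds = [2.0] * 11
result = simulate_gameweek(xi_means, xi_stds, seed=42)
```

Because the captain is chosen after the scores are sampled, this models a best-case captaincy and will overstate what a pre-deadline pick can achieve.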

Live Data Service (live_data_service.py)

Background thread refreshing FPL API data every 10 minutes.

Caches bootstrap-static and fixtures in memory.

Exposes current GW number, deadline, all player stats (injuries, news, availability, price), fixture list.

No external API key required: the FPL API is public.
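A background refresh of this kind can be sketched with the standard library alone. The `RefreshingCache` class is a hypothetical stand-in for live_data_service.py, and the fetch function is a stub so the example runs offline; a real fetch would call the FPL endpoints, and the interval would be 600 seconds rather than the short value used here.

```python
import threading
import time
from typing import Any, Callable


class RefreshingCache:
    """Keeps an in-memory snapshot fresh using a daemon thread."""

    def __init__(self, fetch: Callable[[], Any], interval_s: float = 600.0):
        self._fetch = fetch
        self._interval_s = interval_s
        self._lock = threading.Lock()
        self._data = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._refresh()  # load once synchronously so get() has data
        self._thread.start()

    def _refresh(self) -> None:
        try:
            fresh = self._fetch()
        except Exception:
            return  # keep serving the last good snapshot
        with self._lock:
            self._data = fresh

    def _run(self) -> None:
        # wait() returns False on timeout (refresh) and True once stopped
        while not self._stop.wait(self._interval_s):
            self._refresh()

    def get(self) -> Any:
        with self._lock:
            return self._data

    def stop(self) -> None:
        self._stop.set()


calls = {"count": 0}


def fake_fetch() -> dict:
    """Stub standing in for the real API requests."""
    calls["count"] += 1
    return {"calls": calls["count"]}


cache = RefreshingCache(fake_fetch, interval_s=0.05)
cache.start()
time.sleep(0.3)
cache.stop()
snapshot = cache.get()
```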

Backend API (api.py — FastAPI, 5 endpoints)

`GET /health`: health check.

`GET /status`: current GW number, deadline timestamp, all fixtures, team names.

`GET /players`: all 600+ players with live stats (form, price, injuries, news, predicted points).

`POST /analyze-squad`: accepts 15 player IDs + ITB (money in the bank) + free transfers + chips available. Returns: optimal starting XI, bench order, captain +...

`POST /optimize-team`: full strategy report: all captain candidates ranked, all transfer options evaluated, chip timing analysis, and (if...

Frontend (React 19 + Vite + Tailwind CSS v4)

TeamBuilder.jsx — squad input page: search and add players by name, set ITB and free transfers, select available chips. Validates squad (15...

Analysis.jsx — full AI analysis dashboard: captain recommendation card, transfer suggestion, chip advice, Monte Carlo chart, fixture difficulty...

Pitch.jsx — interactive football pitch with drag-and-drop player positions (@hello-pangea/dnd). Shows injury flags, FDR badges, predicted points,...

PredictionChart.jsx — Recharts bar chart of predicted points for all starting XI players.

CaptainChart.jsx — Recharts chart ranking top captain candidates by fixture-adjusted prediction score.

MonteCarloChart.jsx — Recharts area chart showing 1,000-run score distribution with haul/blank probability markers.

TransferImpact.jsx — shows recommended transfer (out → in), expected delta over 3 GWs, hit cost if applicable.

FixtureDifficulty.jsx — colour-coded FDR badges (green = easy, red = hard) for next 5 gameweeks.

Testing (176 unit tests, 16 test files)

CI-safe: all tests run without a trained model (mock model fixtures).

Coverage:
- data_pipeline.py: feature engineering, rolling averages, FDR assignment
- scoring_engine.py: all 2025/26 scoring rule combinations (GK...
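The mock-model approach can be illustrated with `unittest.mock`. The `predict_points` wrapper below is a hypothetical function, not one from the codebase; the point is that the test exercises the surrounding logic while the model itself is replaced by a `MagicMock`, so no model.pkl is needed.

```python
from unittest.mock import MagicMock


def predict_points(model, features):
    """Hypothetical wrapper: run the model and round to one decimal place."""
    raw = model.predict(features)
    return [round(float(p), 1) for p in raw]


def test_predict_points_uses_model_output():
    # The mock stands in for a trained model loaded from disk.
    fake_model = MagicMock()
    fake_model.predict.return_value = [4.26, 7.91]

    result = predict_points(fake_model, [[0.1], [0.2]])

    assert result == [4.3, 7.9]
    fake_model.predict.assert_called_once_with([[0.1], [0.2]])
```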

Design Decisions

XGBoost as primary model over LinearRegression: handles non-linear feature interactions (e.g., FDR × form × home/away) better than linear models....

SGDRegressor for incremental updates: avoids full retraining after each gameweek. `partial_fit` updates model weights from new results in seconds,...

Decision engine as a separate module (decision_engine.py): keeps strategy logic (captain ranking, transfer evaluation, chip timing, wildcard...

Monte Carlo simulation over deterministic point ranges: captures the variance of FPL scoring (a player can score 2 or 20 points with the same...

Live FPL API with background refresh thread: the official FPL API is public and returns all player/fixture data in two endpoints. Background thread...

Greedy wildcard builder over integer programming: sorting by predicted pts / price ratio and applying FPL constraints greedily is fast (~ms) and...

176 CI-safe unit tests with mock model: all tests run without a trained model.pkl file (mocked via unittest.mock). Enables CI to validate logic...

Two GitHub Actions workflows (CI + CD): CI on every push/PR (lint + tests + build), CD on main/master (full build + artifact upload). Separation...

Recharts over Chart.js/D3: React-native charting with composable components, good TypeScript support, and sufficient for the 5 chart types needed...

@hello-pangea/dnd for pitch drag-and-drop: maintained fork of react-beautiful-dnd, actively supported for React 18+. Provides accessible...

Tradeoffs & Constraints

Model accuracy ceiling: XGBoost on rolling averages predicts average form well but can't account for unexpected events (injuries in warm-up,...

model.pkl gitignored: the model must be generated locally via `make pipeline` before the app runs. Adds a 2-minute setup step. Production deployment...

Greedy wildcard builder: near-optimal but not guaranteed optimal. A full integer linear program (pulp or ortools) would guarantee the optimal squad...

FPL API rate limits: the official FPL API has undocumented rate limits (~10 req/min per IP). The 10-minute background refresh TTL keeps requests...

Monte Carlo variance: 1,000 runs provide stable probability estimates but add ~200ms to /optimize-team response time. Could be reduced to 500 runs...

SGDRegressor as secondary model only: SGD is sensitive to feature scaling and learning rate. Used as a blend component (not primary) to avoid...

No persistent database: player predictions and historical analysis stored in-memory and in CSV files. Adding PostgreSQL/Firestore would enable...

Would improve: Add LSTM/time-series model for better form capture, implement per-user squad persistence (PostgreSQL), add Slack/WhatsApp alerts for...

Outcome & Impact

Full-stack FPL analytics platform with XGBoost predictions, incremental SGDRegressor updates, decision engine, Monte Carlo risk profiling, and React...

XGBoost model trained on per-gameweek rolling averages (3-GW, 5-GW points/minutes, FDR, home/away, position, team strength) predicts next-GW points...

SGDRegressor incremental trainer updates model weights after each gameweek via partial_fit, adapting to in-season form changes without full...

Decision engine covers complete FPL weekly workflow: fixture-adjusted predictions, rotation penalties, captain/VC ranking with confidence scores,...

1,000-run Monte Carlo simulation produces team score distribution with haul/blank probability markers, quantifying captain upside and blank gameweek...

Live FPL API integration with 10-minute background refresh: all 600+ players with current form, price, injuries, news, and fixture list available...

React 19 interactive pitch with drag-and-drop player positioning, injury flags, FDR badges, deadline countdown, and fixture difficulty strip...

5 FastAPI endpoints cover the full workflow: health, status, player list, squad analysis (starting XI, bench, captain, transfers, chips, Monte...

176 pytest unit tests across 16 test files validate all business logic without requiring a trained model; CI runs on every push in ~30s.

GitHub Actions CI/CD: ruff lint + 176 unit tests + frontend Vite build on every push/PR; full build + artifact upload on main/master.

Tech Stack

ML: XGBoost (primary prediction model), scikit-learn LinearRegression + SGDRegressor (incremental online learning), joblib (model serialisation)

Feature Engineering: pandas (per-GW rolling averages, FDR, home/away, form), numpy (numerical operations)

Live Data: Official FPL API (bootstrap-static, fixtures, element-summary; public, no auth required), background refresh thread (10-min TTL)

Decision Engine: Custom Python (fixture-adjusted predictions, rotation penalties, captain/VC ranking, transfer simulation, chip timing, greedy wildcard builder)

Simulation: Monte Carlo (1,000-run team score distribution, haul/blank risk profiling)

Backend: FastAPI (5 endpoints), Uvicorn, Python 3.11

Frontend: React 19, Vite, Tailwind CSS v4, Recharts (bar/area/scatter charts), @hello-pangea/dnd (pitch drag-and-drop), Axios

Testing: pytest (176 unit tests, 16 test files, CI-safe with mock model fixtures), unittest.mock

CI/CD: GitHub Actions, with CI (ruff lint + pytest + frontend build on every push/PR) and CD (full build + artifact upload on main/master)

Automation: Makefile (setup, pipeline, dev, backend, frontend, test, test-all, lint, build-frontend, clean, all)
