Data / methodology

How the data is collected and aggregated

Source

The figures on footballgpt.co/data are computed from the live FootballGPT database — the queries, animated practices, and conversations that real users (coaches, Football Manager video-game players, and individual players) generate inside the product. No external data sources are used.

What the headline counts mean

The three big numbers at the top of footballgpt.co/data are deliberately precise, so anyone quoting them knows exactly what they count:

Coaches: the number of distinct users who have generated at least one animated practice and have not opted out. One practice is enough to be counted; there is no minimum activity beyond that. It is an all-time distinct count, not a rolling window. Crucially, it is not the number of registered accounts: people who signed up but never generated a practice are not counted, which is why this figure is smaller than total sign-ups.
Animated practices: the total number of non-opted-out practices generated, all-time. A single coach can account for many.
Questions: the total number of mode-tagged queries made by registered users, all-time, across every mode (coach, FM, player, scout, goalkeeper).

These top-line counts are raw totals and do not have the k-anonymity floor applied. That floor (see below) governs the broken-down charts, where a small cohort could otherwise be identifiable. The headline numbers are simple counts of activity, not estimates.
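
To make the three definitions concrete, here is a minimal Python sketch of how they could be computed from in-memory rows. The Practice and Query record types, their field names, and the per-row opted_out flag are assumptions for illustration, not the production schema. Queries are assumed to come from registered users only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Practice:          # hypothetical row shape, not the real table
    user_id: int
    opted_out: bool


@dataclass(frozen=True)
class Query:             # hypothetical row shape, not the real table
    user_id: int
    mode: Optional[str]  # None means the query carries no mode tag


def headline_counts(practices: Iterable[Practice], queries: Iterable[Query]) -> dict:
    """Compute the three all-time headline counts as defined above."""
    kept = [p for p in practices if not p.opted_out]
    return {
        # Distinct users with at least one non-opted-out practice.
        "coaches": len({p.user_id for p in kept}),
        # Every non-opted-out practice counts, so one coach can add many.
        "animated_practices": len(kept),
        # Only mode-tagged queries count, across every mode.
        "questions": sum(1 for q in queries if q.mode is not None),
    }
```

In this sketch a user who signed up and only asked questions never enters the coaches count, which mirrors why that figure is smaller than total sign-ups.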

Refresh cadence

Charts and downloads are recomputed once a week, every Monday at 06:00 UTC. The "Updated" date on the homepage reflects that compute time. There is no live or on-demand refresh — the weekly cadence is deliberate, so charts are stable across the working week and so the same numbers appear in any press piece written about a given week.
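
As an illustration of the cadence, the sketch below works out which Monday 06:00 UTC compute a given moment falls under. The function name is hypothetical and the input is assumed to be a timezone-aware datetime. It says nothing about how the real job is scheduled.

```python
from datetime import datetime, timedelta, timezone


def last_refresh(now: datetime) -> datetime:
    """Return the most recent Monday 06:00 UTC at or before `now`.

    `now` must be timezone-aware (assumption of this sketch).
    """
    now_utc = now.astimezone(timezone.utc)
    # weekday() is 0 for Monday, so this steps back to Monday of the same week.
    monday = (now_utc - timedelta(days=now_utc.weekday())).replace(
        hour=6, minute=0, second=0, microsecond=0
    )
    # Early on a Monday, before 06:00 UTC, the previous week's compute still applies.
    if monday > now_utc:
        monday -= timedelta(days=7)
    return monday
```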

Anonymisation

All published charts are aggregate. No row-level data leaves the database. No usernames, emails, club names, team names, or session free-text are ever included in the dashboard, the CSV downloads, or the PDF report. Free-text query content is bucketed into pre-defined categories by automated rules; raw queries are never published verbatim.

Every chart enforces a k-anonymity floor of 50: any cohort with fewer than 50 underlying records is hidden, bucketed into "other", or omitted entirely. Where a chart shows percentages, the underlying n is shown alongside.
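
One plausible reading of the floor, sketched in Python: cohorts under 50 are folded into "other", and "other" itself is dropped if it still falls short. The function name and the exact fold-then-omit order are assumptions. The published charts may hide or omit small cohorts differently.

```python
K_FLOOR = 50  # the k-anonymity floor stated above


def apply_floor(counts: dict, k: int = K_FLOOR) -> dict:
    """Return only the cohorts that may be shown, given raw cohort counts."""
    # Cohorts at or above the floor are shown as they are.
    visible = {name: n for name, n in counts.items() if n >= k and name != "other"}
    # Everything else (including any pre-existing "other") is pooled.
    pooled = sum(n for name, n in counts.items() if name not in visible)
    # The pooled bucket is itself subject to the floor; otherwise it is omitted.
    if pooled >= k:
        visible["other"] = pooled
    return visible
```

For example, with counts of 120, 30, and 25, the two small cohorts pool to 55 and appear as "other"; with only 120 and 30, the small cohort is omitted entirely.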

Reporting layer normalisation

User-stated values like age group and practice category arrive in many forms ("U10", "u10", "Under 10", "10-year-olds", "U10-U12", "Adult", "Senior", "Erwachsene"). The product treats these as flexible inputs, because the AI chatbot can interpret them. The dashboard cannot, so we apply a fixed reporting-layer normaliser:

Age bands: Mini (U6-U9), Junior (U10-U12), Youth (U13-U15), Senior Youth (U16-U18), Adult (U19+), Mixed.
Practice categories: technical, tactical, game-based, set-piece, warm-up, physical, defending, attacking, goalkeeping, ball-mastery, cool-down, other.

Source data is never modified. Normalisation lives only in published views.
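
A minimal sketch of what an age-band normaliser along these lines might look like. The parsing rules here are assumptions chosen to handle the example inputs above (for instance, treating "Senior" and "Erwachsene" as Adult, folding ages below 6 into Mini, and sending anything unparseable to Mixed). They are not the production rules.

```python
import re

# Upper bound of each band, taken from the band definitions above.
BANDS = [(9, "Mini"), (12, "Junior"), (15, "Youth"), (18, "Senior Youth")]
# Hypothetical word list; the real normaliser may recognise more terms.
ADULT_WORDS = {"adult", "adults", "senior", "seniors", "erwachsene"}


def band_for_age(age: int) -> str:
    for upper, name in BANDS:
        if age <= upper:
            return name
    return "Adult"  # U19 and above


def normalise_age_group(raw: str) -> str:
    """Map a free-form age group string to a single reporting band."""
    text = raw.strip().lower()
    ages = [int(n) for n in re.findall(r"\d+", text)]
    words = set(re.findall(r"[a-z]+", text))
    bands = {band_for_age(a) for a in ages}
    if words & ADULT_WORDS:
        bands.add("Adult")
    # Open-ended values such as "U13+" span several bands.
    if "+" in text and ages and min(ages) < 19:
        return "Mixed"
    # Exactly one band is a clean match; zero or several become "Mixed".
    return bands.pop() if len(bands) == 1 else "Mixed"
```

Because the function returns a new value and never writes back, it fits the rule that normalisation lives only in published views.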

Chart 1 — isolated-practice problem

Source: the generated_drills table (the internal name for animated practices), filtered to records with both an age group and a practice category set. Percentages are within each age band, summing to 100 across visible categories.
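
The within-band percentages can be sketched as follows. The input shape (pairs of already-normalised age band and category) and the function name are assumptions, and the sketch takes the categories it is given as the visible ones, so any k-anonymity filtering is assumed to have happened beforehand.

```python
from collections import Counter, defaultdict


def category_shares(records):
    """records: iterable of (age_band, category) pairs.

    Returns {band: {category: (percent, n)}}, with percentages summing
    to 100 within each band.
    """
    by_band = defaultdict(Counter)
    for band, category in records:
        # Only records with both fields set are counted.
        if band and category:
            by_band[band][category] += 1

    shares = {}
    for band, counts in by_band.items():
        total = sum(counts.values())
        shares[band] = {
            category: (100.0 * n / total, n) for category, n in counts.items()
        }
    return shares
```

Returning the underlying n next to each percentage matches the rule that percentage charts show their n alongside.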

Chart 2 — planning rhythm

Source: generated_drills.created_at, bucketed into day-of-week and hour-of-day in UTC. Note that the heatmap reflects UTC time, not user-local time — clusters around 8pm UTC correspond to different local times for users in different countries.
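
A short sketch of the UTC bucketing, to show why the same cell can hold activity from different local evenings. The function name is hypothetical, and treating naive timestamps as already being in UTC is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def planning_heatmap(timestamps):
    """Count timestamps per (day-of-week, hour-of-day) cell, both in UTC."""
    grid = Counter()
    for ts in timestamps:
        if ts.tzinfo is None:
            # Assumption: naive values are stored in UTC.
            ts = ts.replace(tzinfo=timezone.utc)
        utc = ts.astimezone(timezone.utc)
        grid[(DAYS[utc.weekday()], utc.hour)] += 1
    return grid
```

A practice created at 21:00 in a UTC+1 country and one created at 20:00 UTC land in the same cell, even though the users experienced different clock times.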

Chart 3 — audience mix

Source: query_analytics.mode, which is set per query based on which surface the user is interacting with (coach mode, FM mode, player mode, scout mode, goalkeeper mode). A single user can appear in multiple modes across different sessions.

Chart 4 — age band distribution

Source: generated_drills.age_group, normalised to age bands and grouped. A small share of animated practices (roughly 1%) carry an age_group value that does not map cleanly to a single band ("U10-Adult", "U13+", "All ages") — these are bucketed as "Mixed".

Chart 5 — pitch-third concentration (with coach-intent split)

Source: generated_drills.drill_data.players[].y. For each practice we average the y-coordinate of every player on the pitch (0 is the defending goal line, 100 is the attacking goal line) and bucket the result into Defensive (y < 33), Middle (33 ≤ y ≤ 66), or Attacking (y > 66).
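
The averaging and bucketing step can be sketched as below. The drill_data dictionary shape follows the players[].y path named above; the function name and the handling of a practice with no players are assumptions, and both boundary values (33 and 66) are treated as Middle.

```python
def pitch_third(drill_data: dict):
    """Bucket one practice by the mean y-coordinate of its players.

    y runs from 0 (defending goal line) to 100 (attacking goal line).
    Returns None when there are no players (assumption of this sketch).
    """
    ys = [player["y"] for player in drill_data.get("players", [])]
    if not ys:
        return None
    mean_y = sum(ys) / len(ys)
    if mean_y < 33:
        return "Defensive"
    if mean_y <= 66:
        return "Middle"
    return "Attacking"
```

Because the bucket is taken from the mean, a practice with players spread evenly from one goal line to the other still lands in Middle, which is part of why default placements around y=50 dominate the chart.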

Pitch zone is not a form field a coach fills in — it's derived from where the AI places players when it generates the practice diagram. When the coach's prompt doesn't mention a zone ("defensive third", "attacking third", "own-half", etc.), the AI tends to centre players around y=50 by default.

To make the chart honest we now record, at drill creation time, whether the coach's prompt explicitly named a zone. We run a conservative regex scan over the prompt and store the result in generated_drills.intent_signals. The chart caption shows both the total count and the explicit-coach count, so a reader can tell how much of the "88% middle third" finding is coach intent versus AI default. Across the current dataset, only about 2% of the practices in this chart came from prompts that actually named a zone; almost everything else reflects the AI's default placement.
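
A conservative scan of this kind might look like the sketch below. The pattern, the function name, and the shape of the stored result are all hypothetical; the actual regex and the real layout of intent_signals are not published here.

```python
import re

# Hypothetical pattern: matches only explicit zone phrases, so a prompt that
# merely mentions "defending" or "attacking" does not count as naming a zone.
ZONE_PATTERN = re.compile(
    r"\b(?:(?:defensive|defending|middle|attacking|final)\s+third"
    r"|(?:own|opposition)[\s-]half)\b",
    re.IGNORECASE,
)


def intent_signals(prompt: str) -> dict:
    """Record whether the coach's prompt explicitly named a pitch zone."""
    match = ZONE_PATTERN.search(prompt)
    return {
        "explicit_zone": match is not None,
        "zone_phrase": match.group(0).lower() if match else None,
    }
```

Keeping the pattern narrow means some genuine zone intent will be missed, but a practice flagged as explicit can be trusted to reflect what the coach asked for.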

Opt-out

Any FootballGPT user who does not want their animated practices or queries to contribute to public aggregates can opt out by emailing [email protected]. An opt-out is honoured at the next weekly refresh and applied permanently going forward.

Caveats

This dataset reflects what coaches generate when using AI tooling. It is not a representative survey of grassroots football coaching as a whole: coaches who use AI tools may differ systematically from coaches who don't. Findings should be read as "what AI-using coaches ask for", not "what every grassroots coach plans". We have also recently improved profile collection during onboarding; older numbers are based on pre-fix data, which under-counts certain age groups.

Some aggregates are also shaped by the AI's defaults. The pitch-third chart is the clearest example (see Chart 5 above), but anywhere the AI fills in a value the coach didn't specify, that default contributes to the aggregate. Where this materially shifts the read, we flag it on the chart itself. If you spot one we've missed, email [email protected] and we'll add the caveat.