Modeling the Probability of a Soccer Match Outcome

An analysis of the outcomes of 13,337 official Fifa matches played since 1994 by the 2014 Fifa World Cup contenders reveals a clear trend.

In data scraped from Fifa.com, the proportion of matches won or lost correlates strongly with two factors: where the game was played (home or away) and the gap between the two teams' positions in the Fifa Ranking.

For instance, of the 48 matches in which a team played at home against an opponent ranked 10 positions below it:

  • 27 (56.25%) were won by the home team
  • 12 (25%) ended in a draw
  • 9 (18.75%) were lost by the home team
  • Counting each draw as half a positive and half a negative result, this adds up to 68.75% positive results (56.25% + 12.5%).

    Now, for the same ranking distance, of the 44 matches in which a team played away at the home of an opponent ranked 10 positions below it:

  • 9 (20.45%) were won by the visiting team
  • 22 (50%) ended in a draw
  • 13 (29.55%) were lost by the visiting team
  • Counting each draw as half a positive and half a negative result, this adds up to 45.45% positive results (20.45% + 25%).
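    The percentages above, including the "positive results" score (wins plus half of the draws), can be reproduced with a few lines of Python. This is a minimal sketch: `summarize` and `OutcomeSummary` are illustrative names, not part of the original scripts, and the counts are read from the higher-ranked team's point of view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeSummary:
    win_pct: float
    draw_pct: float
    loss_pct: float
    positive_pct: float  # wins plus half of the draws


def summarize(wins: int, draws: int, losses: int) -> OutcomeSummary:
    """Outcome percentages from the higher-ranked team's point of view.

    Each draw counts as half a positive and half a negative result.
    """
    total = wins + draws + losses
    if total == 0:
        raise ValueError("no matches to summarize")

    def pct(count: float) -> float:
        return 100.0 * count / total

    return OutcomeSummary(
        win_pct=pct(wins),
        draw_pct=pct(draws),
        loss_pct=pct(losses),
        positive_pct=pct(wins + draws / 2),
    )


home = summarize(27, 12, 9)  # at home, opponent ranked 10 positions lower
away = summarize(9, 22, 13)  # away, opponent ranked 10 positions lower
print(f"home: {home.positive_pct:.2f}%  away: {away.positive_pct:.2f}%")
# prints: home: 68.75%  away: 45.45%
```

    The gap of more than 23 percentage points between the two scores, at the same ranking distance, is the home-advantage effect described above.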

    As a final example, all 5 matches played at the home of an opponent ranked 100 positions lower were won by the visiting team. The matches were (visiting team listed first, Fifa Ranking positions in parentheses):

  • (1)Brazil 3 x 0 Wales(101) - 05/23/2000 - Cardiff
  • (7)Spain 2 x 0 Latvia(107) - 06/02/2007 - Riga
  • (21)USA 1 x 0 Barbados(121) - 06/22/2008 - Bridgetown
  • (66)Cameroon 1 x 0 Guinea-Bissau(166) - 02/29/2012 - Bissau
  • (35)Algeria 1 x 0 Rwanda(135) - 06/16/2013 - Kigali
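    Result lines in the format used above can be parsed into structured records to check the ranking distance and the winner of each match. This is a minimal sketch, assuming every line follows the `(rank)Team goals x goals Team(rank) - MM/DD/YYYY - City` pattern with the visiting team first; `parse_result` and `Result` are hypothetical names.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

SEP = r"\s*[-\u2013]\s*"  # hyphen or en dash between fields
RESULT_RE = re.compile(
    r"\((?P<rank_a>\d+)\)\s*(?P<team_a>.+?)\s+(?P<goals_a>\d+)\s+x\s+"
    r"(?P<goals_b>\d+)\s+(?P<team_b>.+?)\s*\((?P<rank_b>\d+)\)"
    + SEP + r"(?P<played>\d{2}/\d{2}/\d{4})" + SEP + r"(?P<city>.+)"
)


class Result(NamedTuple):
    visitor: str
    visitor_rank: int
    visitor_goals: int
    home_goals: int
    home: str
    home_rank: int
    played_on: date
    city: str

    @property
    def rank_distance(self) -> int:
        """How many positions below the visitor the home team was ranked."""
        return self.home_rank - self.visitor_rank

    @property
    def visitor_won(self) -> bool:
        return self.visitor_goals > self.home_goals


def parse_result(line: str) -> Result:
    m = RESULT_RE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return Result(
        visitor=m["team_a"],
        visitor_rank=int(m["rank_a"]),
        visitor_goals=int(m["goals_a"]),
        home_goals=int(m["goals_b"]),
        home=m["team_b"],
        home_rank=int(m["rank_b"]),
        played_on=datetime.strptime(m["played"], "%m/%d/%Y").date(),
        city=m["city"],
    )


LINES = [
    "(1)Brazil 3 x 0 Wales(101) - 05/23/2000 - Cardiff",
    "(7)Spain 2 x 0 Latvia(107) - 06/02/2007 - Riga",
    "(21)USA 1 x 0 Barbados(121) - 06/22/2008 - Bridgetown",
    "(66)Cameroon 1 x 0 Guinea-Bissau(166) - 02/29/2012 - Bissau",
    "(35)Algeria 1 x 0 Rwanda(135) - 06/16/2013 - Kigali",
]

results = [parse_result(line) for line in LINES]
for r in results:
    print(f"{r.visitor} at {r.home}: distance {r.rank_distance}, visitor won: {r.visitor_won}")
```

    Keeping the dates as `date` objects makes it straightforward to look up the ranking in force at each match date later on.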


    The Process

  • Data Extraction:
    The data presented in the graph was scraped from www.fifa.com by two Python scripts. These scripts automatically browsed hundreds of pages of match results and ranking positions and exported the data to CSV and JSON files. They extracted data from every competition officially supported by Fifa, including World Cups, qualifiers, friendlies, beach soccer and futsal. In total, the scripts scraped the scores of 41,444 official Fifa matches played since 1930 from 408 pages, plus 46,842 Fifa Ranking positions for 215 countries since 1993. This work only considers World Cup, qualifier, Confederations Cup and friendly match data.
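    The original scraping scripts are not reproduced here, and the structure of the fifa.com pages at the time is unknown. The sketch below assumes, hypothetically, that results were laid out as an HTML table: it collects the rows with Python's standard `html.parser`, keeps only the competitions used in this work and exports the records to CSV and JSON. The column names, competition labels and sample rows are placeholders.

```python
from __future__ import annotations

import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical competition labels; the real fifa.com wording may differ.
KEPT_COMPETITIONS = {"FIFA World Cup", "Qualifier", "FIFA Confederations Cup", "Friendly"}


class TableRowParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None
        self._cell: list[str] | None = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_matches(html: str) -> list[dict[str, str]]:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows  # first row holds the column names
    records = [dict(zip(header, row)) for row in body]
    return [r for r in records if r.get("Competition") in KEPT_COMPETITIONS]


def export(records, csv_file, json_file) -> None:
    writer = csv.DictWriter(csv_file, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    json.dump(records, json_file, indent=2)


# Placeholder markup, not real fifa.com HTML.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Home</th><th>Score</th><th>Away</th><th>City</th><th>Competition</th></tr>
  <tr><td>01/01/2000</td><td>Team A</td><td>2:1</td><td>Team B</td><td>City X</td><td>Friendly</td></tr>
  <tr><td>02/01/2000</td><td>Team C</td><td>4:3</td><td>Team D</td><td>City Y</td><td>Beach Soccer</td></tr>
</table>
"""

matches = extract_matches(SAMPLE_HTML)
csv_out, json_out = io.StringIO(), io.StringIO()
export(matches, csv_out, json_out)
print(csv_out.getvalue())
```

    Writing to file-like objects keeps the export testable; in practice they would be files opened with `newline=""` for the CSV.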
  • Data Munging:
    The data scraped from fifa.com only contained the city where each match took place, with no reference to the country or any other geographical information. To determine whether a match took place at the home of one of the contenders, the data was complemented with location data from the Google Maps API: a Python script requested geographical data for 1,400 cities from the Google API and exported it to CSV and JSON files.
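    A sketch of the geocoding step, assuming the Google Maps Geocoding API's JSON endpoint: each response holds a `results` list whose entries carry `address_components`, and the component whose `types` include `"country"` gives the country name. The function names are hypothetical, and the example parses a trimmed, hand-written response instead of calling the API.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(city: str, api_key: str) -> str:
    return f"{GEOCODE_URL}?{urlencode({'address': city, 'key': api_key})}"


def fetch_geocode(city: str, api_key: str) -> dict:
    """Network call; not exercised in the example below."""
    with urlopen(geocode_url(city, api_key), timeout=10) as response:
        return json.load(response)


def country_of(response: dict) -> str | None:
    """Country name from the first geocoding result, or None."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    for component in response["results"][0]["address_components"]:
        if "country" in component["types"]:
            return component["long_name"]
    return None


def played_at_home(city_country: str | None, team_country: str) -> bool:
    return city_country is not None and city_country == team_country


# Trimmed, hand-written response in the API's JSON shape.
SAMPLE = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Kigali", "short_name": "Kigali", "types": ["locality", "political"]},
            {"long_name": "Rwanda", "short_name": "RW", "types": ["country", "political"]},
        ],
    }],
}

country = country_of(SAMPLE)
print(country, played_at_home(country, "Rwanda"))
```

    Matching Google's country names against Fifa team names still needs a lookup table: Google places a city such as Cardiff in the United Kingdom while the team is Wales, and "USA" has to be matched to "United States".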

  • Data Exploration/Graph Design:
    Since the volume of data isn't large (13,337 matches), it was imported into a Microsoft Excel spreadsheet, where it was munged and analyzed with pivot tables and formulas such as VLOOKUP and COUNTIF. These were used to calculate the ranking distance between the teams, the proportion of wins, losses and draws, and to add the country name to each match location. MS Excel scatter plots were then created to identify the features best correlated with the proportion of wins. Finally, the graph designs were also developed in MS Excel until I found a version exciting enough to code.
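    The same pivot can be expressed in Python: look up each team's ranking position (the VLOOKUP step), classify every match by venue and ranking distance from the higher-ranked team's point of view, and count outcomes per group (the COUNTIF step). This is a hedged sketch rather than the spreadsheet's exact logic: the teams and results are hypothetical, and a single static ranking table stands in for the monthly rankings that would apply at each match date.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple


class Match(NamedTuple):
    home: str
    away: str
    home_goals: int
    away_goals: int
    home_in_own_country: bool  # from the geocoding step; False for neutral venues


def classify(match: Match, ranking: dict[str, int]) -> tuple[str, int, str]:
    """Return (venue, ranking distance, outcome) for the higher-ranked team."""
    home_pos, away_pos = ranking[match.home], ranking[match.away]  # VLOOKUP step
    home_is_higher = home_pos < away_pos
    distance = abs(home_pos - away_pos)
    if not match.home_in_own_country:
        venue = "neutral"
    else:
        venue = "home" if home_is_higher else "away"
    goal_diff = match.home_goals - match.away_goals
    if not home_is_higher:
        goal_diff = -goal_diff  # flip to the higher-ranked team's side
    outcome = "W" if goal_diff > 0 else "D" if goal_diff == 0 else "L"
    return venue, distance, outcome


def pivot(matches, ranking) -> Counter:
    """COUNTIFS-style tally keyed by (venue, distance, outcome)."""
    return Counter(classify(m, ranking) for m in matches)


def shares(table: Counter, venue: str, distance: int) -> dict[str, float]:
    counts = {o: table[(venue, distance, o)] for o in "WDL"}
    total = sum(counts.values())
    return {o: n / total for o, n in counts.items()} if total else {}


# Hypothetical teams, positions and results, for illustration only.
RANKING = {"Alpha": 3, "Beta": 13, "Gamma": 23}
MATCHES = [
    Match("Alpha", "Beta", 2, 0, True),
    Match("Alpha", "Beta", 1, 1, True),
    Match("Beta", "Alpha", 1, 0, True),
    Match("Gamma", "Beta", 0, 2, True),
    Match("Alpha", "Gamma", 3, 1, False),
]

table = pivot(MATCHES, RANKING)
print(shares(table, "home", 10))  # {'W': 0.5, 'D': 0.5, 'L': 0.0}
print(shares(table, "away", 10))  # {'W': 0.5, 'D': 0.0, 'L': 0.5}
```

    Keying the counter on the full (venue, distance, outcome) triple mirrors a pivot table with venue and distance as rows and outcome as columns.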

  • Data export:
    The data required to draw the graph was then exported to JSON using Excel text-formatting formulas.
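    A Python equivalent of the JSON export might look like the sketch below. The record layout (`venue`, `distance`, `matches`, `wins`, `draws`, `losses`, `positive`) is an assumption, not the format the original graph used; the two rows reuse the counts quoted at the top of this page.

```python
import json


def to_d3_records(rows):
    """rows: iterable of (venue, distance, wins, draws, losses) tuples."""
    records = []
    for venue, distance, wins, draws, losses in rows:
        total = wins + draws + losses
        records.append({
            "venue": venue,
            "distance": distance,
            "matches": total,
            "wins": wins,
            "draws": draws,
            "losses": losses,
            # share of positive results, counting a draw as half a win
            "positive": round((wins + draws / 2) / total, 4),
        })
    return records


rows = [("home", 10, 27, 12, 9), ("away", 10, 9, 22, 13)]
payload = json.dumps(to_d3_records(rows), indent=2)
print(payload)
```

    A file in this shape, an array of flat objects, can be loaded in the browser with `d3.json` and bound directly to chart elements.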

  • Graph coding:
    The graph was coded in JavaScript with D3.js, using Sublime Text 3.


    The Fifa Ranking

    The ranking table is updated monthly, based on points earned from official match results. The points earned depend on the opposing team's ranking position, the importance of the match (friendly, qualifier, Confederations Cup or World Cup match) and the confederation of the opposing team.
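    For reference, the per-match points formula FIFA used for its ranking between 2006 and 2018 can be written as P = M × I × T × C: M is the result (3 for a win, 1 for a draw, 0 for a loss, with 2 and 1 for a penalty shoot-out win or loss), I the match importance, T the opposition strength (200 minus the opponent's position, with 200 for the top team and a floor of 50) and C the average of both teams' confederation weights. The sketch below follows that description; the function names are illustrative, and confederation weights are passed in rather than hard-coded because their values were revised over time.

```python
# M: points for the result (shoot-out results count separately)
RESULT_POINTS = {"win": 3, "draw": 1, "loss": 0, "shootout_win": 2, "shootout_loss": 1}

# I: importance multiplier by match type
IMPORTANCE = {
    "friendly": 1.0,
    "qualifier": 2.5,          # World Cup or confederation-level qualifier
    "continental_final": 3.0,  # confederation championship or Confederations Cup
    "world_cup": 4.0,          # World Cup final competition
}


def opposition_strength(position: int) -> int:
    """T: 200 for the top-ranked team, else 200 - position, never below 50."""
    if position < 1:
        raise ValueError("ranking positions start at 1")
    return 200 if position == 1 else max(200 - position, 50)


def match_points(result: str, match_type: str, opponent_position: int,
                 team_confed_weight: float, opponent_confed_weight: float) -> float:
    """P = M x I x T x C for a single match."""
    m = RESULT_POINTS[result]
    i = IMPORTANCE[match_type]
    t = opposition_strength(opponent_position)
    c = (team_confed_weight + opponent_confed_weight) / 2  # C: mean of both weights
    return m * i * t * c


# Example with placeholder confederation weights of 1.0 for both teams.
print(match_points("win", "friendly", 11, 1.0, 1.0))  # 567.0
```

    Under this formula a win over a higher-ranked opponent in a more important match earns more points, which is why ranking distance and match type both matter when reading the results above.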


    Copyright (c) 2014 Andrew Yuan | Photos by Me | Design by CSS Templates.