They say Australia is a sporting nation. One of the most interesting phenomena of this sporting obsession is the geographical divide in the popularity of the two biggest professional football codes, Australian Rules Football and Rugby League.
This divide has been dubbed “The Barassi Line”.
So, how accurate is this idea? It was first proposed back in 1978. Since then the AFL has been very actively expanding into New South Wales and Queensland, traditional Rugby League strongholds.
I decided to try and map out the popularity of each of the codes.
I have done this by plotting the location of local sporting teams, i.e. the kind of team that an average person would play for on a weekend. I feel like this is a good indication of participation in each of the codes. It is fair to point out here that participating in a code and supporting its professional league as a spectator are not the same thing and may not have the same spatial distribution.
Each of the codes provides a handy website (Australian Rules, Rugby League) that allows you to type in your postcode and display a list of the nearest local clubs. So it seemed like a prime opportunity to make my first attempt at web scraping using my new (and still very limited) Python skills.
My first attempts were based on using Scrapy, which, for various reasons, ended up going nowhere. After consultations with some very helpful mates and having another look at the HTML, we realised each search created a URL based on the postcode entered. In the HTML of the result, the list of clubs was given, formatted as a Python dictionary, with the Latitude, Longitude and Name of the Club all provided.
A new plan was hatched. Using Beautiful Soup, I would create the search URL for every postcode in Australia and cut the list of clubs out of the HTML. Then it's just a matter of sending that data to a CSV, deleting the (many, many, many) duplicates and plotting them in QGIS.
So, I'm sure the way I've gone about it isn't necessarily the most efficient, but it worked in the end. I've provided the code I used for the Australian Rules website below.
# Python 2 script
from bs4 import BeautifulSoup
import csv
import urllib
import re

postlist = []
count = 0
overall = []
# Column names of the club records embedded in each results page
fieldnames = ['lat', 'lng', 'rank', 'name', 'letter']

# Appends postcodes from csv to list so they can be iterated over in python
with open('postcodes.csv', 'rb') as csvfile:
    postcodes = csv.reader(csvfile, delimiter=',')
    for row in postcodes:
        postlist.append(','.join(row))

# Loops through each postcode in list created above
for i in postlist:
    newlist = ''
    cleanstr = ''
    paragraphs = ''

    # Creates search URL using postcode for this iteration
    URL = 'http://reg.sportingpulse.com/v6/MapFinder/mapfinder.cgi?if=1&r=2&sr=1&club_level_only=1&type=1&centre_search_type=2&search_value=' + i + '&stt=pc'

    # Running Beautiful Soup on URL made above
    r = urllib.urlopen(URL).read()
    soup = BeautifulSoup(r, 'lxml')

    # Storing relevant section of resulting html
    data = soup.find_all("script")

    # Saves Beautiful Soup Result as string
    for x in data:
        paragraphs += str(x)

    # Trims above string
    p = re.search('var json_data = (.+?);\n', paragraphs)

    # Keeps the matched text, if any
    if p:
        newlist = p.group(0)

    # Strips the 'var json_data = [' prefix and '];' suffix from the match
    if newlist:
        cleanstr = newlist[17:(len(newlist)-3)]

    # Turns string to tuple
    try:
        cleantup = eval(cleanstr)
    except Exception as e:
        print 'Error in PC:', i
        # Avoids reusing the previous postcode's result after a failure
        cleantup = ()

    # Turns tuple into list
    cleanlst = list(cleantup)

    # Appends list to overall if proper data is contained
    if cleanlst != ['lat', 'lng', 'rank', 'name', 'letter']:
        overall += cleanlst

    count += 1
    print count, i

# Once all postcodes iterated, prints list of dictionaries to CSV
with open('AFL_LONG.csv', 'a') as outfile:
    # overall is a list of dictionaries, so the column names are passed explicitly
    fp = csv.DictWriter(outfile, fieldnames, restval='X', extrasaction='ignore')
    fp.writerows(overall)
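The slice-and-eval step in the script could also be written with a regex capture group and `ast.literal_eval`, which only accepts plain literals and so avoids executing anything unexpected that comes back from a web page. This is a minimal Python 3 sketch, not the code I ran. It assumes the embedded data is a list of dictionaries with the same keys as in the script; `SAMPLE_HTML` and `extract_clubs` are made-up names and the values are invented for illustration.

```python
import ast
import re

# Hypothetical page fragment: the keys follow the script above,
# the values are invented for illustration only.
SAMPLE_HTML = (
    '<script>\n'
    'var json_data = [{"lat": "-37.81", "lng": "144.96", "rank": "1", '
    '"name": "Example FC", "letter": "A"}];\n'
    '</script>'
)


def extract_clubs(html):
    """Return the list of club dicts embedded in the page, or [] if none found."""
    # Capture everything between 'var json_data = ' and the closing '];'
    match = re.search(r'var json_data = (\[.*?\]);\n', html, re.DOTALL)
    if not match:
        return []
    try:
        # literal_eval parses Python literals only; it never runs code
        clubs = ast.literal_eval(match.group(1))
    except (ValueError, SyntaxError):
        return []
    # Keep only well-formed dictionary entries
    return [club for club in clubs if isinstance(club, dict)]


clubs = extract_clubs(SAMPLE_HTML)
print(clubs[0]["name"])
```

Because the capture group keeps the square brackets, a page with a single club still parses to a one-item list rather than a bare dictionary.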
Once I got all that sorted out, it was a pretty straightforward task. I deleted the duplicates in OpenOffice then plotted them in QGIS.
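I did the de-duplication by hand in OpenOffice, but it could be sketched in Python too. The example below is only an illustration: it assumes the CSV has a header row with `lat`, `lng` and `name` columns, and it treats two rows as the same club when all three match. The function name `dedupe_rows` and the sample rows are made up.

```python
import csv
import io


def dedupe_rows(rows, key_fields=("name", "lat", "lng")):
    """Keep the first row seen for each (name, lat, lng) combination."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Invented sample data standing in for the scraped CSV file.
sample = io.StringIO(
    "lat,lng,rank,name,letter\n"
    "-37.81,144.96,1,Example FC,A\n"
    "-37.81,144.96,3,Example FC,C\n"
    "-33.87,151.21,1,Sample JFC,A\n"
)

rows = list(csv.DictReader(sample))
unique_rows = dedupe_rows(rows)
print(len(rows), "rows in,", len(unique_rows), "unique clubs out")
```

The `rank` and `letter` columns are left out of the key on purpose, since the same club turns up at a different position in the results for each nearby postcode.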
Before we have a look at the results, I would just point out that this is not necessarily an exhaustive or accurate map of all clubs in Australia. Clubs could be missing or misplaced due to omissions or inaccuracies on the websites, errors in my code or errors in my handling of the data.
That said, let’s have a look at the results.
Rugby League Clubs, Source: Rugby League
So we can see lots of clubs all through urban and rural NSW and QLD. In all other states, Rugby League is almost unrepresented outside the capitals.
Australian Rules Clubs, Source: Australian Rules
So we can see a few things going on here. Australian Rules is much more heavily represented in Western Australia, the Northern Territory, South Australia, Victoria and Tasmania. The difference between the two codes is especially evident in the rural areas.
We can also see that Australian Rules appears to be much better represented in Sydney and Brisbane than Rugby League is in Perth, Adelaide, Melbourne and Hobart.
I’ve experimented with a few different ways of displaying these two sets for easiest comparison.
Red: Australian Rules Clubs, White: Rugby League Clubs
This is my personal favourite. I think having the data points moving across the frame instead of your eyes helps convey the broad pattern. The downside is that GIFs can be a little buggy and slow to load.
Left: Australian Rules Clubs, Right: Rugby League Clubs
Another option I made is just the two maps side by side. I think this is still more readable than two separate images.
Up until now, the Barassi Line has been only loosely defined. So I decided to use the data I had gathered to calculate my own data-driven model of the line. Because I am an egoist, I have called it the BaRossi Line.
I chose a pretty simplistic method for calculating this, because of the limitations of the input data accuracy, the relatively large number of observations and large geographic spread.
To calculate the midpoint of each code I used the method outlined here: GeoMidPoint. Once I had those two points, I created the BaRossi Line by perpendicularly bisecting the line between these two points.
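For anyone wanting to reproduce this in code rather than by hand, here is a rough Python sketch. It assumes the centre-of-gravity approach to a geographic midpoint (convert each point to a 3D unit vector, average, convert back), and it uses the fact that on a sphere the perpendicular bisector of two points is the set of locations equidistant from both, so deciding which side of the line a location falls on only needs a comparison of dot products. The function names and all coordinates are made up for illustration and are not my club data.

```python
import math


def to_unit_vector(lat_deg, lng_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lng = math.radians(lat_deg), math.radians(lng_deg)
    return (
        math.cos(lat) * math.cos(lng),
        math.cos(lat) * math.sin(lng),
        math.sin(lat),
    )


def geographic_midpoint(points):
    """Centre of gravity of (lat, lng) points, returned as (lat, lng) in degrees."""
    vectors = [to_unit_vector(lat, lng) for lat, lng in points]
    n = len(vectors)
    x = sum(v[0] for v in vectors) / n
    y = sum(v[1] for v in vectors) / n
    z = sum(v[2] for v in vectors) / n
    lng = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lng)


def nearer_centre(point, centre_a, centre_b):
    """Return 'a' or 'b' for whichever centre the point is closer to.

    The boundary between the two answers is the perpendicular bisector
    of the segment joining centre_a and centre_b.
    """
    p = to_unit_vector(*point)
    dot_a = sum(pi * ai for pi, ai in zip(p, to_unit_vector(*centre_a)))
    dot_b = sum(pi * bi for pi, bi in zip(p, to_unit_vector(*centre_b)))
    # A larger dot product means a smaller angle, i.e. a shorter distance
    return "a" if dot_a >= dot_b else "b"


# Invented coordinates, roughly in southern and eastern Australia.
rules_points = [(-37.8, 145.0), (-34.9, 138.6), (-31.9, 115.9)]
league_points = [(-33.9, 151.2), (-27.5, 153.0), (-19.3, 146.8)]

rules_centre = geographic_midpoint(rules_points)
league_centre = geographic_midpoint(league_points)
line_midpoint = geographic_midpoint([rules_centre, league_centre])

print("Rules centre:", rules_centre)
print("League centre:", league_centre)
print("Midpoint the line passes through:", line_midpoint)
print("Side for (-37.8, 145.0):", nearer_centre((-37.8, 145.0), rules_centre, league_centre))
```

Averaging in 3D rather than averaging the raw latitudes and longitudes keeps the result sensible over a spread as large as a continent.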
The BaRossi Line. Red: Centre of Australian Rules Clubs, White: Centre of Rugby League Clubs and Orange: Midpoint.
I think this compares pretty well to the representation of the line given on Wikipedia, given that it's purely controlled by the data points.
The Barassi Line as shown on Wikipedia Source
So there we have it. You can definitely see the Rugby League vs Australian Rules divide by mapping local sporting clubs. Additionally, this data can be used to create a model of the Barassi Line that follows the original described in 1978 quite closely.