Vessel density map: and so it begins!
In June we left you with a promise: we’d make a vessel density map of EU waters and share it on our portal for you to view and download. Several things have happened since then. In September a meeting took place in Brussels, and the discussion that followed helped us better define the requirements the maps must meet to be as useful as possible to the maritime community.
We’ve now purchased a year’s worth of AIS data from a commercial provider, so we’re officially getting started. We’re very excited, but we also feel a bit jittery: making the map is not going to be a piece of cake. For starters, the amount of data to process is huge. Suffice it to say that, for the area we’re analysing, a typical day contains around 12 million AIS messages. Multiply that by 365 and you get roughly 4.4 billion messages, something that goes by the name of… big data!
To make things even more complicated, the underlying data need to be pre-processed and cleaned before the map can be created. AIS messages are delivered in NMEA format. However, human beings and the NMEA format don’t get along very well, so it is highly advisable to convert the data into a format that is easier to work with (e.g. CSV). Then there’s the challenge of dealing with messages and ship positions that are obviously wrong. For instance, although it is rare, you may come across an AIS message according to which a ship is sailing across the Alps or through the middle of the Black Forest. Errors always happen, and when you deal with billions of records, even an error rate of 0.5% means millions of bad messages and quickly becomes a pain in the neck.
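To give a flavour of what the NMEA-to-CSV conversion involves, here is a minimal Python sketch (not our actual pipeline) that decodes single-sentence AIS position reports (message types 1, 2 and 3) and writes them out as CSV. It assumes one !AIVDM sentence per line with no tag blocks, and it simply skips multi-sentence messages and every other message type; the function and column names are our own, chosen for illustration. It verifies the NMEA checksum, turns the 6-bit ASCII payload back into bits and reads each field from its fixed bit position.

```python
import csv
import io
from functools import reduce

# Column order of the CSV output (our own naming, chosen for illustration).
FIELDS = ["mmsi", "nav_status", "sog_knots", "lon", "lat", "cog_deg", "heading"]


def nmea_checksum_ok(sentence):
    """XOR every character between the leading '!' (or '$') and the '*'."""
    body, _, checksum = sentence.strip().lstrip("!$").partition("*")
    calc = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)
    return checksum[:2].upper() == f"{calc:02X}"


def payload_to_bits(payload):
    """Undo the 6-bit ASCII armouring: each payload character carries 6 bits."""
    bits = []
    for ch in payload:
        value = ord(ch) - 48
        if value > 40:
            value -= 8
        if not 0 <= value < 64:
            raise ValueError(f"invalid payload character {ch!r}")
        bits.append(f"{value:06b}")
    return "".join(bits)


def uint(bits, start, length):
    return int(bits[start:start + length], 2)


def sint(bits, start, length):
    """Two's-complement signed field (used for longitude and latitude)."""
    value = uint(bits, start, length)
    return value - (1 << length) if value >= 1 << (length - 1) else value


def decode_position_report(sentence):
    """Decode a single-sentence AIS position report (types 1-3), else None."""
    if not nmea_checksum_ok(sentence):
        return None
    # !AIVDM,<count>,<number>,<seq id>,<channel>,<payload>,<fill bits>*<checksum>
    fields = sentence.strip().split("*")[0].split(",")
    if len(fields) != 7 or fields[1] != "1":
        return None  # malformed, or part of a multi-sentence message
    try:
        bits = payload_to_bits(fields[5])
    except ValueError:
        return None
    if len(bits) < 168 or uint(bits, 0, 6) not in (1, 2, 3):
        return None  # not a position report
    return {
        "mmsi": uint(bits, 8, 30),
        "nav_status": uint(bits, 38, 4),
        "sog_knots": uint(bits, 50, 10) / 10,  # 102.3 means 'not available'
        "lon": sint(bits, 61, 28) / 600000,    # 1/10000 min -> degrees; 181 = n/a
        "lat": sint(bits, 89, 27) / 600000,    # 91 = n/a
        "cog_deg": uint(bits, 116, 12) / 10,   # 360 = n/a
        "heading": uint(bits, 128, 9),         # 511 = n/a
    }


def to_csv(sentences):
    """Decode an iterable of NMEA lines and return CSV text for the valid ones."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for sentence in sentences:
        row = decode_position_report(sentence)
        if row is not None:
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C"
    print(to_csv([sample]))
```

The sample sentence decodes to MMSI 477553000, a moored ship (navigational status 5) at about 47.58° N, 122.35° W. Note that the decoder reports the "not available" codes (such as longitude 181) as plain numbers; filtering them out is part of the cleaning step.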
And there’s more to it: some messages might report an implausible speed or course, or a wrong MMSI (Maritime Mobile Service Identity, the nine-digit number that uniquely identifies a ship’s radio station). All these messages need to be either corrected or deleted, but doing so on billions of records takes time, patience and some ingenuity in designing algorithms that speed up the process. Other known issues are duplicated messages and satellite “noise” in areas with high vessel density.
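Some of these checks can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the algorithm we will actually use: positions are assumed to be already decoded into simple records, the 50-knot speed cap is an arbitrary placeholder, and the MMSI test is only a heuristic (nine digits starting with a Maritime Identification Digits code between 201 and 775). Duplicates are dropped, and a position is rejected when reaching it from the ship’s previous accepted position would require an implausible speed, which also catches the odd ship “sailing” across the Alps. Properly detecting positions on land would need a coastline or land mask, which is not shown here.

```python
import math
from dataclasses import dataclass

MAX_SPEED_KNOTS = 50.0       # placeholder plausibility threshold (assumption)
EARTH_RADIUS_NM = 3440.065   # mean Earth radius in nautical miles


@dataclass(frozen=True)
class Position:
    mmsi: int
    t: float    # seconds since some reference time
    lat: float  # degrees
    lon: float  # degrees
    sog: float  # reported speed over ground, knots


def valid_mmsi(mmsi):
    """Heuristic: nine digits starting with a MID code between 201 and 775."""
    return 201_000_000 <= mmsi <= 775_999_999


def plausible(p):
    """Field-level checks on a single message."""
    return (
        valid_mmsi(p.mmsi)
        and -90 <= p.lat <= 90     # 91 means 'not available'
        and -180 <= p.lon <= 180   # 181 means 'not available'
        and 0 <= p.sog <= MAX_SPEED_KNOTS
    )


def haversine_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def clean(positions):
    """Drop implausible messages, duplicates and impossible jumps, ship by ship."""
    last_ok = {}  # mmsi -> last accepted Position
    kept = []
    for p in sorted(positions, key=lambda q: (q.mmsi, q.t)):
        if not plausible(p):
            continue
        prev = last_ok.get(p.mmsi)
        if prev is not None:
            hours = (p.t - prev.t) / 3600
            if hours <= 0:
                continue  # duplicate, or a second position for the same instant
            if haversine_nm(prev.lat, prev.lon, p.lat, p.lon) / hours > MAX_SPEED_KNOTS:
                continue  # implied jump faster than any plausible ship
        last_ok[p.mmsi] = p
        kept.append(p)
    return kept
```

This greedy approach trusts the first position it sees for each ship; if that one happens to be wrong, the good positions that follow may be rejected, so a production pipeline would need something more robust.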
To give you an idea, when our colleagues at HELCOM carried out a similar exercise, they had to process the data on a dedicated server with a 10-core CPU and 48 GB of RAM. In case you aren’t familiar with the term, a dedicated server is a physical machine reserved entirely for a single customer or task rather than shared with others, so all of its computing power and memory are available for the job. That alone shows how much processing this exercise requires.
As if this were not enough, we also need to define what we mean by ‘density’. Essentially, our map will be a grid of 1 km × 1 km cells, each shaded on a colour gradient that gives an idea of vessel density: generally, the darker the colour, the higher the average number of ships in a cell. But how do we calculate density? HELCOM reconstructs ship routes from AIS messages and then counts the number of track lines that cross each cell. Another method could also take the lines’ length into account, as the number of track lines alone might be misleading: a track that only clips the corner of a cell counts as much as one that runs across its entire width. The JRC (the European Commission’s Joint Research Centre), on the other hand, suggests counting ship positions (so points, not lines) in grid cells at fixed time intervals. At this point the discussion gets quite technical and deserves a dedicated post. What we’re sure of is that each method produces different results, so the final choice needs to be weighed carefully.
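To make the difference between the approaches more concrete, here is a simplified Python sketch of both. It assumes positions have already been projected onto a metric coordinate system (for example ETRS89-LAEA, EPSG:3035), so that x and y are in metres and the 1 km cell of a point is simply floor(x/1000), floor(y/1000); the projection step itself is not shown. The point-based function keeps one position per ship per time interval (its last report in that interval), counts these positions per cell and divides by the number of intervals to get an average number of ships per cell. The line-based function approximates each straight segment of a track by points spaced about step_m metres apart, recording which tracks cross each cell and roughly how much track length falls inside it. All names and parameters are hypothetical, and neither function is HELCOM’s or the JRC’s actual implementation.

```python
import math
from collections import Counter, defaultdict

CELL_M = 1000.0  # 1 km x 1 km cells; coordinates assumed to be in metres


def cell_of(x, y):
    """Grid cell (column, row) containing a projected point."""
    return (math.floor(x / CELL_M), math.floor(y / CELL_M))


def point_density(pings, interval_s, n_intervals):
    """Point-based density: average number of ships per cell.

    pings: iterable of (mmsi, t_seconds, x_m, y_m) with 0 <= t < interval_s * n_intervals.
    Each ship contributes one position per time interval: its last report in it.
    """
    snapshot = {}
    for mmsi, t, x, y in sorted(pings, key=lambda p: p[1]):
        snapshot[(mmsi, int(t // interval_s))] = (x, y)  # later reports overwrite
    counts = Counter(cell_of(x, y) for x, y in snapshot.values())
    return {cell: n / n_intervals for cell, n in counts.items()}


def track_density(tracks, step_m=50.0):
    """Line-based density, approximated by sampling each segment every ~step_m metres.

    tracks: {track_id: [(x_m, y_m), ...]} ordered points of reconstructed routes.
    Returns {cell: (number of distinct tracks crossing it, approx. length inside in m)}.
    """
    crossing = defaultdict(set)
    length = defaultdict(float)
    for track_id, pts in tracks.items():
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            seg = math.hypot(x1 - x0, y1 - y0)
            n = max(1, math.ceil(seg / step_m))
            for i in range(n):
                f = (i + 0.5) / n  # midpoint of the i-th sub-segment
                cell = cell_of(x0 + f * (x1 - x0), y0 + f * (y1 - y0))
                crossing[cell].add(track_id)
                length[cell] += seg / n
    return {cell: (len(ids), length[cell]) for cell, ids in crossing.items()}
```

The sampling shortcut can miss a cell that a track clips for less than step_m metres; an exact grid-traversal algorithm would avoid this at the cost of more code.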
So this post is just a brief overview of some of the challenges we’ll be facing over the coming months. Future posts will focus on aspects we haven’t covered here. It’s going to be a long and exciting journey, and we plan to share it on this blog as we make progress.