Systeme D

April 1, 2010

Hacking Ordnance Survey Meridian2 for beginners

Today, oh happy day, Ordnance Survey released a bunch of map data for free. It’s an attribution-only licence, so you can do whatever you like with it as long as you say “data thanks to the nice guys at Ordnance Survey”. You can make maps, write clever location-sensitive apps, all that sort of thing.

Isn’t that great? Hoorah for the Ordnance Survey; for the guys at the Department of Communities & Local Government and Cabinet Office; and for Tim Berners-Lee and Nigel Shadbolt who pushed all this through.

The data includes postcode data and simple raster streetmaps. But perhaps the most interesting, for now, is a dataset called Meridian2 – real vector data that you can parse. Let’s take a look at how you can use it.

For this tutorial, you’ll need a computer with Perl installed and a fair bit of Perl knowledge.

What you’ll need

First of all, download the Meridian2 data from the Ordnance Survey’s OpenData site. (At the time of writing, the OS site is fairly bombed out and you might be better off getting it from the mirror site kindly provided by MySociety.)

You’ll see that these have bizarre file extensions: .shp, .shx, .dbf. These are shapefiles, Ye Olde Paleocentric File Format invented by ESRI back in 1807. The Wikipedia article about shapefiles has all the nitty gritty and is worth reading.

You’ll need a Perl module to parse shapefiles. The module in question is Geo::ShapeFile. There are many ways of installing Perl modules, but I use the CPAN shell:

sudo perl -MCPAN -e shell
install Geo::ShapeFile
(happy message ensues)
quit

Parsing the data

So let’s fire up a text editor and write a Perl script to use this data.

We’ll start by calling up the Perl modules that we’ve just installed:

#!/usr/bin/perl -w

use Geo::ShapeFile::Point comp_includes_m => 0, comp_includes_z => 0;
use Geo::ShapeFile;


Next, let’s take the filename of the shapefile, which we’ll supply on the command line when we call this script – so Perl puts it in the @ARGV array – and then load that shapefile.

$fn=shift @ARGV;
$shp=Geo::ShapeFile->new($fn);

Easy so far. You’ll have seen that each Meridian2 shapefile is actually three files: something.shp, something.shx and something.dbf. .shp and .shx contain the geometry (the line of the road), while .dbf contains the metadata (the road name and number, and so on). Fortunately, we just need to supply the ‘something’ to Geo::ShapeFile, and it adds the three extensions as it needs them.
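
If you want a friendlier error than whatever the module throws when one of the three files has gone astray, a quick check along these lines might help. This is only a sketch: missing_parts is a name I've made up for illustration, not something Geo::ShapeFile provides, and it assumes the extensions are lower-case on disk.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::ShapeFile): return the extensions
# that are missing for a given base name, i.e. the path without .shp etc.
sub missing_parts {
    my ($base) = @_;
    return grep { ! -e "$base.$_" } qw(shp shx dbf);
}

# Only runs the check when a base name is supplied on the command line.
my $base = shift @ARGV;
if (defined $base) {
    my @missing = missing_parts($base);
    if (@missing) {
        print "Can't find these parts of $base: @missing\n";
    } else {
        print "All three files found for $base\n";
    }
}
```

You'd call it with the same base name you pass to the main script, for example meridian2/data/a_road_polyline.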

The shapefile contains a long list of shapes. We’ll parse them one by one.

for (1..$shp->shapes()) {
    $shape=$shp->get_shp_record($_);
    %db   =$shp->get_dbf_record($_);
    $type =$shp->type($shape->shape_type());

So what are $shape, %db and $type?

$type tells you what sort of shape we’re dealing with, and is a string which’ll typically be ‘PolyLine’ (a line made up of a list of points) or ‘Polygon’ (the same thing, but denoting an area; Meridian2’s filenames call these ‘regions’). There are various other complicated types but they’re not found in Meridian2. In fact, Meridian2 makes your life easy by not mixing different shape types in each file.

%db is a hash of the metadata for this shape – the road name, number and so on. For example, a typical hash from the A-roads file is:

_deleted  => '',
INDICATOR => '',
ROAD_NAME => 'ROTHAY ROAD',
METRES    => 61,
OSODR     => 'O16AU69Q9R0TW',
CODE      => 3001,
NUMBER    => 'A593'

Pretty simple: to find the road number of this shape, all we need to do is look at $db{'NUMBER'}.
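
To make that concrete, here's a small sketch that turns one record into a readable label. describe_road is a made-up helper for this example (it isn't part of Geo::ShapeFile), and the sample hash just repeats the A593 record shown above; I'm assuming empty fields come through as empty strings, as they do in that record.

```perl
use strict;
use warnings;

# Hypothetical helper: build a human-readable label from one record's
# attribute hash, coping with empty or missing name and number fields.
sub describe_road {
    my (%db) = @_;
    my $number = defined $db{'NUMBER'}    ? $db{'NUMBER'}    : '';
    my $name   = defined $db{'ROAD_NAME'} ? $db{'ROAD_NAME'} : '';
    my $label  = $number ne '' ? $number : 'unnumbered road';
    $label .= " ($name)" if $name ne '';
    return "$label, $db{'METRES'}m";
}

# Sample record copied from the A-roads example.
my %sample = (
    _deleted  => '',
    INDICATOR => '',
    ROAD_NAME => 'ROTHAY ROAD',
    METRES    => 61,
    OSODR     => 'O16AU69Q9R0TW',
    CODE      => 3001,
    NUMBER    => 'A593',
);

print describe_road(%sample), "\n";
```

In the real script you'd call describe_road(%db) inside the loop over shapes.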

Reading a shape

So all we need to do now is parse the shape itself. A shape has several ‘parts’, and a part has several points. So first we iterate through the parts, which Geo::ShapeFile numbers from 1:

for $i (1..$shape->num_parts()) {
    @part=$shape->get_part($i);

Then we iterate through the points in that part:

foreach $point (@part) {
    $x=$point->X();
    $y=$point->Y();
    # do something with X and Y, for example…
    print "$db{'NUMBER'} goes through $x,$y\n";
}

And really, that’s it. We’ve extracted the points in the line. Where I’ve put a comment saying “do something”, that’s where you come in. You could draw a line, colouring it differently based on the road number. You could compare it to the user’s location, and work out which one is closest. Or whatever. That’s the fun bit.
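
As one idea for the “do something” step, here's a rough sketch of the closest-to-the-user case. It assumes you've collected points as [x, y, road number] triples while looping; nearest_point and the coordinates are invented for illustration. Note that it only compares against the stored points themselves, not positions along the line between them.

```perl
use strict;
use warnings;

# Hypothetical helper: given a location and a list of [x, y, label]
# entries, return the entry nearest to that location (undef if none).
sub nearest_point {
    my ($ux, $uy, @points) = @_;
    my ($best, $best_d2);
    foreach my $p (@points) {
        my ($x, $y) = @$p;
        # Squared distance is enough for comparing, so skip the sqrt.
        my $d2 = ($x - $ux)**2 + ($y - $uy)**2;
        if (!defined $best_d2 || $d2 < $best_d2) {
            ($best, $best_d2) = ($p, $d2);
        }
    }
    return $best;
}

# Made-up coordinates and road numbers, purely for illustration.
my @seen = (
    [ 337000, 504000, 'A593' ],
    [ 340500, 498200, 'A591' ],
    [ 352000, 492700, 'A6'   ],
);
my $hit = nearest_point(340000, 498000, @seen);
print "Nearest road: $hit->[2]\n";
```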

Close your loops, and we’re done.

Finally, you can run your script like this (adjusting the filename and paths accordingly):

perl meridian2.pl meridian2/data/a_road_polyline

and you might want to CTRL-C it as the entire A-road network of Britain scrolls by.

What data does Meridian2 offer?

First of all, there’s the roads, grouped by classification. The shapefiles are motorway_polyline, a_road_polyline, b_road_polyline, and minor_rd_polyline. Warning: the minor roads are big.

You’ve also got rivers (polylines), lakes (regions) and woodlands (regions); railways and coastlines (both polylines); county and district boundaries (regions, and pretty big); and dlua, which means ‘urban areas’ (another big set of regions). There’s a set of points, which are easier still to parse, covering stations, settlements, and roundabouts. Finally, there’s some presentation-only stuff (text and cartotext) which you shouldn’t need to worry about for raw hacking.

I’d suggest you start with motorway_polyline, which is a small, simple dataset, then move onto something bigger.

Another warning: the data is very broken up. You’ll find plenty of roads split into a zillion polylines with five points or maybe fewer. I did a bit of hacking to join them back together in my Perl script; depending on your needs you may want to do similar.
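
If you do want to stitch fragments back together, one naive approach is sketched below. To be clear, this is not the code from my script, just an illustration: it assumes each fragment is a list of [x, y] points, that fragments to be joined share an exactly identical end/start point, and that they all run in the same direction. It rescans the whole list after every join, so it's only sensible for small files.

```perl
use strict;
use warnings;

# Key used to test whether two points are identical.
sub point_key {
    my ($p) = @_;
    return "$p->[0],$p->[1]";
}

# Hypothetical helper: join lines where the last point of one equals the
# first point of another. Each line is an array ref of [x, y] points.
sub join_lines {
    my @lines = map { [ @{$_} ] } @_;   # copy, so the caller's lists are untouched
    my $merged = 1;
    while ($merged) {
        $merged = 0;
        OUTER: for my $i (0 .. $#lines) {
            for my $j (0 .. $#lines) {
                next if $i == $j;
                my ($head, $tail) = @lines[$i, $j];
                next unless point_key($head->[-1]) eq point_key($tail->[0]);
                # Append $tail to $head, skipping the shared point.
                push @$head, @{$tail}[1 .. $#{$tail}];
                splice @lines, $j, 1;
                $merged = 1;
                last OUTER;             # indexes have shifted, so start again
            }
        }
    }
    return @lines;
}

# Made-up fragments: the first three chain together, the fourth stands alone.
my @fragments = (
    [ [0, 0],     [100, 50]  ],
    [ [250, 80],  [400, 90]  ],
    [ [100, 50],  [250, 80]  ],
    [ [900, 900], [950, 980] ],
);
my @joined = join_lines(@fragments);
printf "%d fragments became %d lines\n", scalar @fragments, scalar @joined;
```

In practice you'd probably only want to join fragments that share the same road number, so group them by $db{'NUMBER'} first.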

A note on co-ordinates

All Meridian2 X and Y co-ordinates are OS eastings and northings: in other words, metres east and north of an arbitrary point somewhere off the Isles of Scilly.

Eastings and northings are much nicer to work with than the lats and longs which the rest of the world uses. You don’t have to worry about complex projections, you can just scale them down to your screen. You can find the distance between two points using Pythagoras without any Great Circle nonsense. And so on.
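
Here's roughly what those two jobs look like in Perl. Both helpers (distance_m and to_screen) are names I've made up for this sketch, and the coordinates are arbitrary; the only real assumption is that your screen's Y axis runs downwards, which is why the northing is flipped.

```perl
use strict;
use warnings;

# Straight-line distance in metres between two easting/northing pairs,
# using plain Pythagoras.
sub distance_m {
    my ($e1, $n1, $e2, $n2) = @_;
    return sqrt(($e2 - $e1)**2 + ($n2 - $n1)**2);
}

# Scale an easting/northing to pixel coordinates. $min_e and $max_n are the
# top-left corner of the area being drawn. Screen Y grows downwards, so
# northings are measured down from the top edge.
sub to_screen {
    my ($e, $n, $min_e, $max_n, $metres_per_pixel) = @_;
    my $px = ($e - $min_e) / $metres_per_pixel;
    my $py = ($max_n - $n) / $metres_per_pixel;
    return ($px, $py);
}

# Arbitrary example values: two points 300m apart east-west, 400m north-south.
my $d = distance_m(400000, 300000, 400300, 300400);
my ($px, $py) = to_screen(400300, 300400, 400000, 301000, 100);
print "Distance: $d metres; pixel position: $px,$py\n";
```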

On the other hand, none of the rest of the world uses them, for fairly obvious reasons.

So unless your project is UK-only, you’ll need another Perl module to convert between eastings/northings and lat/long. I’ve previously used Geo::Coordinates::OSGB, but Matthew Somerville points out in the comments (below) that MySociety’s code is more accurate.

Finally, here’s one I made earlier: meridian2.pl is a hacked-together Perl script to generate Adobe Illustrator files from Meridian2 shapefiles, while attributes.pl just outputs the attributes (%db) from a Meridian2 shapefile.

Enjoy! Questions welcome in the comments.


4 Responses to “Hacking Ordnance Survey Meridian2 for beginners”

  1. Matthew says:

    Just to clear up any potential confusion, lat/long can be different depending on the datum. Geo::Coordinates::OSGB by default converts to Airy1830 lat/long not WGS84 lat/long – you have to call another function to perform the datum shift. I’ve seen numerous cases of people translating E/N to lat/long, but not performing the datum shift and so being out by a few dozen metres.

    mySociety::GeoUtil at https://secure.mysociety.org/cvstrac/fileview?f=mysociety/perllib/mySociety/GeoUtil.pm does the full conversion in one step, plus works in the Irish National Grid too :)

    The results returned by mySociety::GeoUtil agree with those of nearby.org.uk and http://www.movable-type.co.uk/scripts/latlong-gridref.html , but Geo::Coordinates::OSGB appears to be 3 or so metres different (I used 400000,300000 to test and OSGB converted to (5dp) 52.59777,-2.00144, whereas everything else converts to 52.59779,-2.00143).

  2. Henry says:

    Also looking interesting is the landform Panorama data, which, if I understand correctly, is the contour info, among other useful tidbits. Wonderful!

  3. Richard says:

    Matthew – that’s really helpful, thank you. Will amend the posting to suit.

  4. scruss says:

    Richard, for transforming the shapefiles to geographic, the de facto tool is the OGR Simple Feature Library – http://www.gdal.org/ogr/
    Horrid command-line syntax, but works well and uses conversion data maintained by Serious People Who Know What They’re Doing.

    (hey, and I think we met in about 1989 or so at one of the computer entertainment trade shows in London. I was with Jeff Walker of WACCI/JAM, writing about Amiga games after being a CPC game reviewer …)
