Systeme D

November 18, 2009

Ordnance Survey goes free – some initial thoughts

How about that, then? Or as the Map Room succinctly put it, “Holy shit.”

Good news for:

  • Google, Yahoo, Microsoft. Free maps, and unlike in the US, good-quality free maps which they can start using right out of the box.
  • Ordnance Survey. I wrote here previously that OS’s best chance of surviving was to open up street name/geometries, boundaries, postcodes, peaks, rivers and PROWs, and to keep charging for the large-scale stuff. This seems to be pretty much what’s promised. I still believe that it’s absolutely the right decision for them. (Also, I am rather smug.)
  • The Guardian. Launching a campaign is a risky business for any publication, especially a fairly obscure and, at times, seemingly fruitless campaign like ‘Free Our Data’. It has paid off – and of all the organisations campaigning for this, the Guardian is the only one that anyone has ever heard of.
  • Apple et al. Insofar as Apple ever gives a shit about anything that happens outside the US, they no longer have to depend on anyone for UK iPhone maps. Not Google, not Tele Atlas. No-one. (Incidentally, if UK mobile carriers had any brains, they would now write their own mapping app and bundle it with their iPhone contracts. Fortunately they don’t.)
  • Cartographers. Maps will now compete on cartography, not on data. This is an absolute shot in the arm for skilled cartographers and could go a long way to reviving the craft in the UK. With my Waterways World hat on, I’m delighted: our cruising guide maps can get better than they are now, yet anyone wanting to compete still has to learn how to produce lovely maps.
  • Developers. Same applies. I am really looking forward to what people come up with. If I were an iPhone dev I would start writing that killer app now, ready to release when the data arrives.
  • Wider Government. Full release instantly becomes the standard for public data. There is now absolutely no excuse for, say, the Environment Agency to withhold its fisheries data. That means more third-party sites that do funky things with public data. I suspect that will help in breaking the stranglehold of evil big outsourcers on Government IT projects.
  • This blog, because I can stop writing about boring map copyright law and start writing about fun things, like canals, organs and the new William Orbit album.

Possibly good news for:

  • OpenStreetMap. I don’t think it’s a stretch to say this wouldn’t have happened without OSM. The inevitability that OSM would, in time, catch up with OS small-scale mapping absolutely vindicates the project. And, hey, complete data for the whole UK – what could be cooler?
        But on the other hand, everyone else has it, too. How do ongoing changes get integrated into the OSM database? Will the UK community survive a sudden change in tack from surveying the basemap to becoming a provider of ‘added value’? Will smaller public domain mapping projects create an informal, developer-led community without OSM’s harsh share-alike restriction? Will UK OSM developers (who lead the project) get bored of it now there’s not such a unique need? How many questions can I get in one paragraph?
        Oh, and there’s the licence. I dread to think what would happen if the chosen licence wasn’t compatible with OSM.

Bad news for:

  • Tele Atlas and Navteq. See G-Y-M above. On the up side, their parent companies no longer have to bother collecting UK data for their satnavs/mobile phones. But that’s like saying Tesco giving free food away is good news for Sainsbury’s, which can now take it and resell it for 1p.

November 4, 2009

The mysterious data mines of Argleton-on-Google

There’s been a bunch of online chatter today about Argleton, the mystery town on Google Maps that has never really existed.

Picture 3

“Maybe it’s a trap street,” people have speculated. Google itself appears to be pinning the blame on Tele Atlas, telling the Telegraph: “People can report an issue to the data provider directly and this will be updated at a later date.”

The Telegraph goes on to say: “The data for the programme was provided by Dutch company Tele Atlas. A spokesman said it would now wipe the non-existent town from the map.”

Update: Originally I suggested here that, by reference to extra map data showing up elsewhere in Britain, this looked like something that had been ‘mined’ by Google from web sources. From a couple of comments below based on other Tele Atlas mapping, it does actually appear that this is a superfluous Tele Atlas town, not an invention of Google’s data mining. Nonetheless the data mining story is interesting in itself, so…

The canary starts to wobble

We know that, even before their recent go-it-alone expedition in the States, Google was mining the web and integrating the results into its map data. Wikipedia is the best-known example; Wikipedia articles with co-ordinates have long appeared as ‘active POIs’ on Google Maps. But as time goes on, Google has mined more and more directories, and other web content, to make the maps richer than the raw Tele Atlas data can offer.

It’s a really clever idea.

But sometimes, the parsing fails. Google Maps FAIL has a good example. Google has found a source of addresses somewhere on the web, and pulled out various data from it. But either the source data is dodgy, or more likely, it’s not formatted quite as consistently as Google’s algorithms would like.

So in Google Maps FAIL’s example, the sizeable town of Cirencester has moved to a little village halfway towards Northleach “inhabited by two sheep and a squirrel”, and the historic city of Gloucester has navigated upriver 20 miles and is sitting in a watermeadow outside Tewkesbury.

This [edit] was my original guess as to what’s happened at Argleton: dodgy data mining. My guess was that the mined data was in fact a badly OCRed address, meant to be “Aughton” but transcribed as “Argleton”. We already know that Google is OCRing PDFs as it crawls them; or maybe it was OCRed before being uploaded to the web. No matter.

Picture 2

If we need any more proof that they’re mining some fairly imperfect sources, then three miles to the west we find “Downhollnad”. A couple of months ago I was drawing a map of the Leeds & Liverpool Canal there, and I’m pretty sure that it’s called Downholland. It’s spelled correctly on other Tele Atlas-derived mapping, too, such as Multimap’s.
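A slip like “Downhollnad” is the sort of thing a simple similarity check against a list of known place names could flag. What follows is only a sketch of that idea using Python's standard difflib module. The gazetteer list, the function name and the 0.8 cutoff are my own assumptions for illustration, and say nothing about how Google actually handles its mined data.

```python
import difflib

# Hypothetical gazetteer of known-good place names (an assumption for this sketch).
GAZETTEER = [
    "Aughton", "Downholland", "Charlbury", "Cirencester",
    "Gloucester", "Tewkesbury", "Northleach", "Enstone",
]

def suggest_place_name(mined_name, gazetteer=GAZETTEER, cutoff=0.8):
    """Return the closest known place name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(mined_name, gazetteer, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest_place_name("Downhollnad"))
print(suggest_place_name("Argleton"))
```

With this cutoff the transposed letters in “Downhollnad” are close enough to “Downholland” to be matched, whereas “Argleton” is too far from “Aughton” (or anything else in the list) to be linked to a real place, so a check like this would catch the easy typos and miss the heavier distortions.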

Picture 1

The canary falls over

How endemic is this faulty mining?

My home-town of Charlbury is well-known as the world centre of innovation in collaborative mapping, especially as performed by ninjas. I was just coming back from church the other day and I met that Artem ‘Mapnik’ Pavlenko walking down the street. So let’s have a look at the data Google has mined for Charlbury.

Picture 5

This is a good start. St Mary’s, where I play the organ (badly), is labelled as ‘Charlbury RC Church’. St Mary’s is not Roman Catholic. It’s Church of England. People have been firebombed in Ulster for less. Charlbury’s Catholic church is, as the full address suggests, a few streets away on Fisher’s Lane. (Incidentally, thank you to my Twitter followers for suggesting that maybe RC meant Radio-Controlled. It could make baptisms a whole lot more fun.)

You can also see that the Bell is in the right place, but the Bull, which should be at the corner, is closer to where the Three Horseshoes actually is.

Picture 6

(Incidentally, there’s a little sponsored link beside the wee Bull for Millie Benjamin Bridal Wear. Curiously, when I looked at this earlier, this in turn triggered a foot-of-page ad for ‘Milly Dress at Shopbop’. So buying one sponsored ad alerts Google to place potential competitors’ ads at the same place? That’s an… interesting loyalty tactic.)

Picture 4

The ‘Cotswold View’ campsite has been placed on a little unpaved street called Cotswold View. As the full address again makes clear, it’s not there. It’s actually on the road to Enstone. Whether it’s actually on ‘Enstone Rd’ is debatable – I’d have said Banbury Hill, and so does Tele Atlas.

Note the non-standard space in the middle of the phone number. A Google search for “Cotswold View” “Enstone Road” “810 314” only returns a few results, two of which are at 192.com (once described as Britain’s most invasive website in a shock-horror exposé, and no strangers to data mining themselves). I’m guessing that Google is either mining 192.com or has licensed the same data.

This is also interesting in that Google clearly aren’t doing a postcode lookup, which would be easy technically but horrible legally. A postcode lookup would put the icon in the right place.

Picture 7

The Fiveways Takeaway appears on the wrong side of the road. Well, big deal. But again, the only result for Fiveways “Sturt Road” “811 555” is 192.com.

(Curious decision on Google’s part not to show ‘Takeaway’ as part of the name, but yet also not to use a custom icon. Fiveways is originally the name of the junction you see just to the south-west. “Turn left at Fiveways” is a common direction in Charlbury. If you took that literally while looking at this map, you’d drive up Sturt Close.)

Picture 8

This one just made me giggle. The problem with having good satellite imagery, as again Google Maps FAIL points out, is that it shows up the inadequacies of the rest of your data. There is clearly a bowls club in this picture but it ain’t where the icon is.

This is a dead canary

So. A small Oxfordshire town, only a handful of mined icons, and around half of them are faulty in some way. Data is being conflated which shouldn’t be (‘Cotswold View’ caravan site on ‘Cotswold View’ street, ‘Charlbury RC Church’ located at a church in Charlbury). Positional accuracy is iffy, at best. How endemic is this faulty mining? It’s pretty endemic.
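To make the conflation problem concrete, here is a toy sketch of how a naive matcher might misplace the campsite. It is not Google's algorithm, which I have no inside knowledge of: the function names, the street table and the grid positions are all invented purely for illustration.

```python
# Hypothetical street positions on a made-up grid; these are NOT real coordinates.
STREETS = {
    "Cotswold View": (0, 0),
    "Enstone Road": (5, 3),
}

def locate_by_name(listing, streets=STREETS):
    """Naive approach: place the listing on any street whose name equals the business name."""
    return streets.get(listing["name"])

def locate_by_address(listing, streets=STREETS):
    """More careful approach: place the listing on the street given in its address."""
    return streets.get(listing["street"])

# The campsite as described above: named 'Cotswold View', addressed on Enstone Road.
campsite = {"name": "Cotswold View", "street": "Enstone Road"}

print(locate_by_name(campsite))     # position of the street called Cotswold View
print(locate_by_address(campsite))  # position of Enstone Road
```

The two functions disagree for exactly the reason the map does: the business name happens to match a street name, and a matcher that trusts the name over the address puts the icon in the wrong place.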

Even getting to this stage is, of course, a display of awesome technical ability. And there is no doubt that the logic will iterate like every other Google product, becoming more accurate each time.

But it does also point out the limitations of applying search-engine technologies to mapping. If you search Google for something non-trivial, you don’t expect the top result to be the one that answers your question. You hope you’ll find it in the top 10, and if not, you’ll turn the page until you get the answer. It’s fuzzy like that and people accept this.

Map data isn’t fuzzy. You have to get it right, first time. Charlbury Bowls Club’s location is approximate, but nonetheless, wrong. St Mary’s is a church in Charlbury but it’s not the Charlbury RC Church.

Data mining gets you worldwide coverage fast, but takes a long time to get to 95% accuracy: you could argue it never will. Crowdsourcing, OpenStreetMap-style, gets you to 95% accuracy fast, but takes a long time to approach worldwide coverage. Professional surveying à la Tele Atlas gets you both, at a huge cost.

All of this is especially interesting in the light of the superb Mike Dobson interview at SearchEngineLand. If you only read one article about webmapping this year, make it that one. He’s the only commentator I’ve seen who appreciates how much data mining Google is doing:

“It is clear to me that conflation and data mining across redundant sources are major components of [Google’s] update process.”

He then suggests that the strategy is to start with data mining, then refine it via crowdsourcing.

“One of the tenets of crowd sourcing is that the frequency of errors decreases with increased inspection. So, Google might make a wrong change from time to time, but the odds are that someone will correct it.” [See also his later comment on Tele Atlas and GDT.]

In other words, Google’s strategy is to get worldwide coverage via mining, then refine it until it’s accurate by crowdsourcing. That makes a lot of sense. But it remains to be seen whether their reputation can withstand the Telegraph story that will inevitably accompany each excursion into the mines.
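Dobson's tenet about inspection can be put into rough numbers. This is a back-of-envelope sketch only: it assumes (my assumption, not his) that everyone who looks at an error has the same, independent chance of spotting and fixing it. The function name and the 10% figure are purely illustrative.

```python
def chance_error_survives(p_fix, inspections):
    """Probability an error is still present after a number of independent inspections."""
    return (1 - p_fix) ** inspections

# How the survival chance shrinks as more people look, at an assumed 10% fix rate.
for n in (1, 10, 50):
    print(n, round(chance_error_survives(0.1, n), 3))
```

On those assumptions, errors in places lots of people look at fade quickly, while errors in places hardly anyone inspects could linger for a long time.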

Drums. Drums in the deep. They Are Coming.