Systeme D

29 November 2005

Captain Geowiki vs the Forces of Open Source OCR

Captain Geowiki has a 1953 Ordnance Survey Gazetteer of Great Britain. This comprises 145 closely-typed pages of placenames, each with an OSGB co-ordinate to 1000m accuracy. So Bisbrooke is SP8899, while Uppingham is SP8699.

It looks a bit like this:



Any free mapping project, such as the one co-founded by Captain Geowiki, would very much like to have a gazetteer like this.
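
For anyone wondering what a reference like SP8899 actually encodes: the two letters pick out a 100km square of the National Grid, and the four digits give the kilometre easting and northing within it. What follows is only a minimal sketch of the usual letter-to-offset arithmetic, not anything from the gazetteer itself. The function name `gridref_to_metres` is made up for illustration, and the sketch assumes four-figure (1km) references only.

```python
import re


def gridref_to_metres(ref: str) -> tuple[int, int]:
    """Return (easting, northing) in metres for the south-west corner
    of the 1km square named by a four-figure OSGB reference."""
    # First letter: one of the 500km squares that cover Great Britain.
    # Second letter: any letter except I, which the grid does not use.
    m = re.fullmatch(r"([HNOST])([A-HJ-Z])\s*(\d{2})(\d{2})",
                     ref.strip().upper())
    if m is None:
        raise ValueError(f"not a four-figure grid reference: {ref!r}")
    first, second, e_km, n_km = m.groups()

    def index(letter: str) -> int:
        # Position in a 25-letter alphabet with I left out.
        i = ord(letter) - ord("A")
        return i - 1 if i > 7 else i

    l1, l2 = index(first), index(second)
    # Column and row of the 100km square, counted from the grid's false origin.
    e100 = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100 = (19 - (l1 // 5) * 5) - (l2 // 5)
    return (e100 * 100_000 + int(e_km) * 1000,
            n100 * 100_000 + int(n_km) * 1000)


print(gridref_to_metres("SP8899"))  # Bisbrooke -> (488000, 299000)
print(gridref_to_metres("SP8699"))  # Uppingham -> (486000, 299000)
```

The result is the corner of the square rather than the place itself, which is as much as a 1000m reference can tell you.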

However, in attempting to get it into computer-readable form, Captain Geowiki will have to engage in a previously unfought battle with the Evil Forces of Open Source OCR.

Reading this bit will not help you at all

DarwinPorts is a great idea. By downloading and installing this wondrous piece of technology, Captain Geowiki can theoretically install all sorts of Unixy goodness on his Mac, without having to build anything from source ever again.

Captain Geowiki feels that there is a slight flaw with this marvellous, inspiring, groundbreaking system, which is that you have to build it from source.

Nevertheless, Captain Geowiki heroically struggles with the intricacies of ./configure and shit just so that he can install ocrad. Only to find that it won't install anyway (from port or from source) because, to summarise, the author is a GNU zealot.

Captain Geowiki will fearlessly join battle against Dutch CMS vendors, purveyors of very bad cider who admit the possibility that you could be both under and over 18, and performers of faintly unbelievable a cappella versions of 1980s prog-pop hits, but he draws the line at GNU zealots.

Instead, Captain Geowiki decides to devote his considerable energies, and significantly less considerable competencies, towards gocr. Since this is available as a Fink package, and Captain Geowiki already has Fink installed, this should be a breeze.

However, that no-good lousy evil ratfink refuses to countenance installing 'unstable' packages for Captain Geowiki without self-updating, which involves recompiling Fink, a sufficiently slow process that - by the time it's finished - Captain Geowiki's superpowers have been thoroughly sapped, not so much by the sight of endless scrolling gcc output, but more by the pint of Thatchers Mendip Scrumpy he has poured for himself in a last-ditch attempt to retain the will to live.

Captain Geowiki vs the JOCR



At this point Captain Geowiki meets his most terrible foe yet - the JOCR. (Apparently GOCR was already taken as a project name on Sourceforge.)

GASP at the tense dialogue between these two feared protagonists:

  • Captain Geowiki: gocr -v 33 test.pbm
  • The JOCR: (null): 'allocationDepth' (0) is smaller than 'depth' (1)
  • Captain Geowiki: Er... cd something/or_other... gocr test.pbm
  • The JOCR: (null): image depth (2416116984) too large to be processed
  • Captain Geowiki: tits. gocr test.pgm
  • The JOCR: (null): image depth (2416116984) too large to be processed
  • Captain Geowiki: Fie! gocr test.pcx
  • The JOCR: 00123456789, ABCDEFGHIJKLMN, NopQRsTuvwxyz, abcdefghijklmn, nopqrstuvwxyz
  • Captain Geowiki: I have you!
  • Captain Geowiki: Prepare to meet your doom!
  • Captain Geowiki: gocr gazetteerscan.pcx
  • The JOCR: Mo1l_gton... ... SP 4347 Monymusk ... .'' ,.. NJ ',,68l_5 Morfa'Ne_?\code(0144)... _... SH, 284_,
    Mongour (mt.3 .,. NO 7589 (PICTURE)' Moorbog _ot_.3... SK 7566 Morston _... ...,, TG Oo43
    Mo__s R;sborou_ SP 8I% Morchard Road Morton Pa_ ... NZ 30!3
  • Captain Geowiki: nnnnggggh

Captain Geowiki Vanquishes the Evil Forces

Alerted by a slightly confused posting on oreillynet, Captain Geowiki bravely closes the Terminal, storms over to his Applications folder (with icons, and stuff), opens CanoScan Toolbox X, and scans a gazetteer page to a PDF.

Quickly checking over his shoulder for the presence of GNU zealots and anyone else who may disparage/disembowel him for the effrontery of using a decidedly closed source solution, he furtively double-clicks the resulting PDF, drags a box around the text, presses cmd-C to copy, and pastes the result into his blog.

  • Blackdyke ... NY 1451
    BlackEdge ... ... NY4288
    Black Esk NY 2193
    Black Fell (Cumbld.) NY 6444
    Black Fell (Northld.) NY 7073
    Blackfield SU 4402
    Blackford (Aberdeen.) NJ 7035
    Blackford (Cumbld.) NY 3962
    Blackford(Perth.) ... NN 8908
    Blacklbrd (Somerset) ST 4147
    Blackford (Somerset) ST 6526
    Blackfordby SK 3318
    Biackgang Chine ... SZ 4876
    Blackhall NT 2174
    Blackhall Castle ... NO 6695
    Biack Halls Rocks ,.. NZ 4739
    Blackham ... ... TQ 4839
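
Pasted lines like these still need splitting into a placename and a grid reference before they are any use to a mapping project. One possible approach, sketched here under assumptions, is a regular expression that treats any run of spaces, dots and commas before the final two letters and four digits as leader filler. The names `ENTRY` and `parse_entry` are hypothetical, and the sketch makes no attempt to repair misread names such as "Blacklbrd" or "Biackgang".

```python
import re

# Name, then leader dots/commas/spaces, then two capital letters,
# an optional space, and exactly four digits at the end of the line.
ENTRY = re.compile(
    r"^(?P<name>.*?)[\s.,]*(?P<square>[A-Z]{2})\s?(?P<digits>\d{4})\s*$"
)


def parse_entry(line: str):
    """Return (name, gridref) for one gazetteer line, or None if the
    line does not end in something shaped like a grid reference."""
    m = ENTRY.match(line.strip())
    if m is None or not m.group("name"):
        return None
    return m.group("name"), m.group("square") + m.group("digits")


print(parse_entry("Blackdyke ... NY 1451"))
print(parse_entry("BlackEdge ... ... NY4288"))
print(parse_entry("Morston _... ...,, TG Oo43"))  # gocr-style damage: None
```

Lines that come back as None are the ones to check against the printed page by hand.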

Pausing only to type "my work here is done", Captain Geowiki walks off into the sunset, before realising that the sunset ended about seven hours ago and he has spent all the intervening time wrestling with waste-of-space open source OCR software.

Next week

Captain Geowiki does battle against the forces of evil Destination Management Organisations who clearly have a different definition of 'November 2005' to the rest of us.

