Systeme D

August 25, 2009

Fixing FP-40 – Unicode/international textField input in Linux Flash Player

For a long time, the most infamous bug in Potlatch wasn’t actually a bug in Potlatch: it was a bug in Linux Flash Player.

Linux FP doesn’t let you input international characters (i.e. anything apart from ASCII 32-127) into a textfield… and hasn’t done for a long time. So all those overseas mappers who want to type their street name – can’t. Epic fail is, clearly, epic.

Here’s the Adobe bug report for it. Feel the anguish in those comments.

And here’s how you fix it, I think. Many, many thanks to Linux-using OpenStreetMappers for their reports from my little debugging script, and especially to Ævar Arnfjörð Bjarmason who obviously has a vested interest in it working. I use a Mac so can only go on what others have said!

This code is in Ming-friendly ActionScript 1, but should be trivial to convert to 2 or 3. Firstly, define a hash that maps the Unicode characters which CP1252 assigns to bytes 0x80-0x9F back to those byte values:

// CP1252 remappings: Unicode character code -> original byte value
var cp1252=new Object();
cp1252[0x20ac]=0x80;
cp1252[0x201a]=0x82;
cp1252[0x0192]=0x83;
cp1252[0x201e]=0x84;
cp1252[0x2026]=0x85;
cp1252[0x2020]=0x86;
cp1252[0x2021]=0x87;
cp1252[0x02c6]=0x88;
cp1252[0x2030]=0x89;
cp1252[0x0160]=0x8a;
cp1252[0x2039]=0x8b;
cp1252[0x0152]=0x8c;
cp1252[0x017d]=0x8e;
cp1252[0x2018]=0x91;
cp1252[0x2019]=0x92;
cp1252[0x201c]=0x93;
cp1252[0x201d]=0x94;
cp1252[0x2022]=0x95;
cp1252[0x2013]=0x96;
cp1252[0x2014]=0x97;
cp1252[0x02dc]=0x98;
cp1252[0x2122]=0x99;
cp1252[0x0161]=0x9a;
cp1252[0x203a]=0x9b;
cp1252[0x0153]=0x9c;
cp1252[0x017e]=0x9e;
cp1252[0x0178]=0x9f;
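
To see why this table is needed, it helps to look at what the faulty input seems to look like. Judging from the fix itself (an inference, not anything Adobe documents), each typed character arrives as its UTF-8 bytes, one text character per byte, with bytes 0x80-0x9F run through CP1252 first. The sketch below simulates that. The names `high`, `byteToChar` and `simulateBrokenInput` are made up for illustration, and the code is plain ECMAScript meant to be run outside Flash Player (for example in Node), not something tested in Ming.

```javascript
// Hypothetical simulation of the faulty input, for illustration only.
// CP1252 bytes 0x80-0x9F and the Unicode characters they decode to
// (the reverse of the cp1252 hash). 0 marks bytes CP1252 leaves undefined;
// passing those through unchanged is an assumption.
var high = [0x20ac, 0,      0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
            0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017d, 0,
            0,      0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
            0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0,      0x017e, 0x0178];

// Turn one byte into the character a CP1252 decoder would produce.
function byteToChar(b) {
  if (b >= 0x80 && b <= 0x9f && high[b - 0x80]) {
    return String.fromCharCode(high[b - 0x80]);
  }
  return String.fromCharCode(b);
}

// Encode a character code (up to U+FFFF) as UTF-8, then misread each byte.
function simulateBrokenInput(code) {
  if (code < 0x80) { return String.fromCharCode(code); }
  if (code < 0x800) {
    return byteToChar(0xc0 | (code >> 6)) + byteToChar(0x80 | (code & 0x3f));
  }
  return byteToChar(0xe0 | (code >> 12)) +
         byteToChar(0x80 | ((code >> 6) & 0x3f)) +
         byteToChar(0x80 | (code & 0x3f));
}
```

Under these assumptions, `simulateBrokenInput(0xe9)` (é) gives the two characters 0xC3 0xA9, while `simulateBrokenInput(0x20ac)` (€) gives 0xE2, 0x201A, 0xAC. The middle one is no longer a byte value at all, which is the case the hash exists to undo.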

Then for every textField, make sure that it has an onChanged handler like this:

chat.onChanged=function(tf) { fixUTF8(); };

(You can, of course, add more stuff to the onChanged function after the call to fixUTF8.)
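
If several fields need the fix plus their own logic, one way to avoid repeating yourself is a small helper that builds the handler. This is only a sketch: `makeOnChanged` is a hypothetical name, the fix function is passed in as an argument so that the snippet runs on its own, and it is written as plain ECMAScript, not tested in Ming or Flash Player.

```javascript
// Hypothetical helper: returns an onChanged handler that runs the fix first,
// then any extra per-field logic.
function makeOnChanged(fix, extra) {
  return function (tf) {
    fix();                      // repair the field before anything reads it
    if (extra) { extra(tf); }   // then run the field's own code, if any
  };
}

// Intended use (assuming `chat` is a textField and `fixUTF8` is defined):
//   chat.onChanged = makeOnChanged(fixUTF8, function (tf) { /* more stuff */ });
```

Running the fix first means that any later code sees the repaired text, not the raw byte-per-character input.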

Finally, add this function:

function fixUTF8() {
	// Only Linux Flash Player needs the fix
	if (System.capabilities.os.indexOf('Linux')==-1) { return; }

	// s = focused textField, t = its text,
	// d = index of the character just typed, i = scan position
	var s=eval(Selection.getFocus()); var t=s.text;
	var i=Selection.getCaretIndex()-1; var d=i;

	// Walk left over UTF-8 continuation bytes (0x80-0xBF), turning any
	// CP1252-remapped characters back into their byte values
	while (((t.charCodeAt(i)>=0x80 && t.charCodeAt(i)<=0xBF) || (cp1252[t.charCodeAt(i)])) && i>0) {
		if (cp1252[t.charCodeAt(i)]) {
			t=t.substr(0,i)+String.fromCharCode(cp1252[t.charCodeAt(i)])+t.substr(i+1);
		}
		i--;
	}
	// Nothing to do if the last character wasn't a continuation byte
	if (i==d) { return; }

	// i should now point at the lead byte: reassemble the character code
	var u=0;
	if (t.charCodeAt(i)>=0xC2 && t.charCodeAt(i)<=0xDF && d-i==1) {
		// two-byte sequence
		u= (t.charCodeAt(i+1) & 0x3F)       +
		  ((t.charCodeAt(i  ) & 0x1F) << 6);
	} else if (t.charCodeAt(i)>=0xE0 && t.charCodeAt(i)<=0xEF && d-i==2) {
		// three-byte sequence
		// (Flash Player doesn't cope with any more obscure Unicode)
		u= (t.charCodeAt(i+2) & 0x3F)        +
		  ((t.charCodeAt(i+1) & 0x3F) << 6 ) +
		  ((t.charCodeAt(i  ) & 0x0F) << 12);
	}

	if (u!=0) {
		// Replace the byte sequence with the single decoded character,
		// then strip any 0x03 characters from the field
		s.text=t.slice(0,i)+String.fromCharCode(u)+t.slice(d+1);
		s.text=s.text.split(String.fromCharCode(0x03)).join('');
	}
}
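
Since I can't run Linux Flash Player myself, it can be handy to check the decoding logic away from the player. The sketch below is a hypothetical pure-function variant, `repairAt`, that follows the same steps as fixUTF8 but takes the text and the index of the last typed character as arguments in place of the Selection calls. It leaves out the OS check and the 0x03 stripping, and it is plain ECMAScript meant to be run in something like Node, not a tested replacement for the function above.

```javascript
// [Unicode character code, CP1252 byte] pairs: the same data as the cp1252 hash.
var pairs = [
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84], [0x2026, 0x85],
  [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88], [0x2030, 0x89], [0x0160, 0x8a],
  [0x2039, 0x8b], [0x0152, 0x8c], [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92],
  [0x201c, 0x93], [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b], [0x0153, 0x9c],
  [0x017e, 0x9e], [0x0178, 0x9f]
];
var cp1252 = {};
for (var k = 0; k < pairs.length; k++) { cp1252[pairs[k][0]] = pairs[k][1]; }

// t: field text; d: index of the last typed character (caret index - 1).
// Returns the repaired text, or t unchanged if no complete sequence ends at d.
function repairAt(t, d) {
  var original = t;
  var i = d;
  // Walk left over continuation bytes (0x80-0xBF), undoing CP1252 on the way.
  while (i > 0) {
    var c = t.charCodeAt(i);
    if (cp1252[c]) {
      c = cp1252[c];
      t = t.substr(0, i) + String.fromCharCode(c) + t.substr(i + 1);
    }
    if (c < 0x80 || c > 0xbf) { break; }
    i--;
  }
  if (i == d) { return original; }

  // i is now at the candidate lead byte: rebuild the character code.
  var lead = t.charCodeAt(i);
  var u = 0;
  if (lead >= 0xc2 && lead <= 0xdf && d - i == 1) {
    u = ((lead & 0x1f) << 6) | (t.charCodeAt(i + 1) & 0x3f);
  } else if (lead >= 0xe0 && lead <= 0xef && d - i == 2) {
    u = ((lead & 0x0f) << 12) |
        ((t.charCodeAt(i + 1) & 0x3f) << 6) |
        (t.charCodeAt(i + 2) & 0x3f);
  }
  if (u == 0) { return original; }
  return t.slice(0, i) + String.fromCharCode(u) + t.slice(d + 1);
}
```

For example, `repairAt("caf\u00c3\u00a9", 4)` rebuilds the é at the end of "café". With the same two characters swapped, as in `repairAt("\u00a9\u00c3", 1)`, no valid sequence ends at the given index and the text comes back unchanged.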

Does it work? Yes – mostly. The one report of failure that I’ve had is that if Flash Player is under high load (e.g. downloading a lot of data), or the user is typing particularly quickly, then it appears that the faulty double-encoded characters can get jumbled, and therefore fixUTF8 won’t be able to parse them in the expected order. It might be possible to address this by intercepting key events, rather than parsing the textField, but this is likely to be a big ball of hurt if the user is doing anything other than adding characters to the end of the field. I’m tempted to leave it as it is.


9 Responses to “Fixing FP-40 – Unicode/international textField input in Linux Flash Player”

  1. Elias Aarnio says:

    Thank you for this! Let’s see how we are able to use this in different services that use Flash. I guess Open Source people will be quite keen on adapting this as this has been a problem for a long time.

  2. hi,

    we tried to apply your patch but somehow it does not work.

    The problem is: the function does not find the special chars.
    The problem is in the first while-loop:

    ((t.charCodeAt(i)>=0×80 && t.charCodeAt(i) ...

    Compare those two x's: x and ×. Any reason why you use the second one? That is not even a valid character.

  3. hi,

    we tried to apply your patch but somehow it does not work.

    The problem is: the function does not find the special chars.
    The problem is in the first while-loop:

    ((t.charCodeAt(i)>=0×80 && t.charCodeAt(i) ...

    Check the second x … it's not even a valid character.

  4. Richard says:

    The weird “x” is a bug in WordPress. It should of course be a normal x.

  5. thanks, it seems to work now. If you try to migrate to OpenLaszlo one day you might be able to re-use this ^^
    http://openmeetings.googlecode.com/svn/trunk/openmeetings_lps411/base/components/text/test-customEdittext.lzx

  6. [...] There are a few relatively usable workarounds. But what Flex developers need is a working solution, not [...]

  7. Gunstick says:

    Interesting is that if you cut/paste the characters, it’s actually working. Just typing directly is not.

  8. Anderson Koester says:

    Thanks, now I need to convert it to AS2… New problem… ahaha!!

    Congratulations, workaround 100%!
