JSON + PHP + Form Arrays Gotcha

August 21st, 2008

Just a little note mostly to myself, but perhaps someone will find it and save themselves a couple of hours debugging:

On the search part of xemzi.com, the page loads first, and then the GET parameters are passed via Ajax to the search routine, which updates the central list area and the map.

Normally, when you search from the main search box on the site, you can only pick one area to search in, which I think is why it took me so long to notice this bug. Once you've reached the search results page, though, you can tick boxes in the left column to search different (and this time multiple) categories.

The bug: if you selected more than one category to search in, only the last one would register.

What had happened was that I had taken $_GET and JSON-encoded it, which produced this:

{"module":"entity","search":"1","lang":"","ssquery":"vietnamese","subents":["venue","event"]}

All great so far, until I passed that object to Ajax.Updater as the parameters for the new search request. At that point PHP only received the last value of subents ('event').

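The culprit was the '[]' suffix. PHP strips it from the field name when it builds $_GET, so the JSON key was plain "subents", and when Prototype serialized the array for Ajax.Updater it sent subents=venue&subents=event. Without the brackets, PHP doesn't recognize the field as an "array form field": each value overwrites the previous one, so only the last survives. To fix it, I put the brackets back on the key before handing the parameters to Ajax.Updater: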

       params["subents[]"] = params["subents"]; // restore the brackets PHP expects
       delete params["subents"];                 // drop the bracket-less key entirely

Problem solved.
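The sketch below uses PHP's parse_str() to show how PHP reads the query string with and without the brackets. parse_str() parses a query string the same way PHP fills $_GET. The query strings are hand-written stand-ins for what the browser sends, and the variable names are illustrative:

```php
<?php
// parse_str() parses a query string the same way PHP fills $_GET.

// With "[]": PHP collects every value into an array and drops the brackets from the key.
parse_str('subents[]=venue&subents[]=event', $withBrackets);
echo implode(',', $withBrackets['subents']), "\n"; // venue,event
echo json_encode($withBrackets), "\n";             // {"subents":["venue","event"]}

// Without "[]": each value overwrites the one before it, so only the last survives.
parse_str('subents=venue&subents=event', $withoutBrackets);
echo $withoutBrackets['subents'], "\n";            // event

// On the PHP side, http_build_query() puts (URL-encoded) brackets back when rebuilding a query string.
echo http_build_query($withBrackets), "\n";        // subents%5B0%5D=venue&subents%5B1%5D=event
```

The json_encode() line shows why the key in the JSON above is plain "subents": the brackets are already gone by the time $_GET exists.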


Latinizing Vietnamese Text

June 21st, 2007

This issue came up as I was adding SEF (search-engine friendly) words to internal URLs on www.newhanoian.com. Human-readable text in URLs is not only good for Google; I also find it helpful when I'm looking at URLs on mouseover or in the Location Bar dropdown.

If I put the Vietnamese words Bia Hơi - Bia Tươi into a SEF URL, however, I get this mess: bia-h%C6%A1i-bia-t%C6%B0%C6%A1i.

I don't know whether this is friendly to Google or not, but it's not easy on my eyes at all!
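The mess is just percent-encoded UTF-8. In UTF-8, ơ is the two bytes C6 A1 and ư is C6 B0, and each byte becomes a %XX escape. The sketch below uses PHP's rawurlencode() to reproduce it. It assumes the slug has already been lowercased and hyphenated; that input string is illustrative:

```php
<?php
// Slug from the post's example, already lowercased and hyphenated (assumed input).
$slug = 'bia-hơi-bia-tươi';

// rawurlencode() leaves a-z and '-' alone but escapes every byte of the multibyte letters.
$encoded = rawurlencode($slug);
echo $encoded, "\n";     // bia-h%C6%A1i-bia-t%C6%B0%C6%A1i

// The two UTF-8 bytes behind ơ:
echo bin2hex('ơ'), "\n"; // c6a1
```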

Luckily, Vietnamese is still quite comprehensible when it's reduced to plain Latin characters, so I decided to turn the accented characters into their recognizable Latin counterparts: ắ, ặ and friends become a, đ becomes d, è becomes e, and so on. I was surprised by how many character/diacritic combinations there are: the full character set below comes to 186 letters (26 plain Latin letters plus 67 accented ones, each in upper and lower case)!

My first stab at this used iconv, unsuccessfully. This call is supposed to transliterate automatically into Latin: $word = iconv('UTF-8', 'US-ASCII//TRANSLIT', $word); but it doesn't work well at all for Vietnamese text, and what //TRANSLIT produces depends on the system's iconv implementation. It chokes completely on the bia example above.

After a little poking around on Viet Unicode, and learning some interesting things about the Vietnamese alphabet and its collation, I managed to put together the complete set of Vietnamese characters:

aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ
... and from this I came up with a set of regular expressions for latinizing Vietnamese text, which I've wrapped in a static utility method:
	public static function romanize_vn($string) {
		// a
		$string = preg_replace('/[àảãáạăằẳẵắặâầẩẫấậ]/u', 'a', $string);
		$string = preg_replace('/[ÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ]/u', 'A', $string);
		// e
		$string = preg_replace('/[èẻẽéẹêềểễếệ]/u', 'e', $string);
		$string = preg_replace('/[ÈẺẼÉẸÊỀỂỄẾỆ]/u', 'E', $string);
		// i
		$string = preg_replace('/[ìỉĩíị]/u', 'i', $string);
		$string = preg_replace('/[ÌỈĨÍỊ]/u', 'I', $string);
		// o
		$string = preg_replace('/[òỏõóọôồổỗốộơờởỡớợ]/u', 'o', $string);
		$string = preg_replace('/[ÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ]/u', 'O', $string);
		// u
		$string = preg_replace('/[ùủũúụưừửữứự]/u', 'u', $string);
		$string = preg_replace('/[ÙỦŨÚỤƯỪỬỮỨỰ]/u', 'U', $string);
		// y
		$string = preg_replace('/[ỳỷỹýỵ]/u', 'y', $string);
		$string = preg_replace('/[ỲỶỸÝỴ]/u', 'Y', $string);
		// d
		$string = preg_replace('/[đ]/u', 'd', $string);
		$string = preg_replace('/[Đ]/u', 'D', $string);
		return $string;
	}
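... note the /u modifier on each pattern, which tells PCRE to treat both the pattern and the subject as UTF-8. Without it, each character class is just a set of bytes. You'd be matching (and mangling) the individual bytes of your multibyte characters instead of whole characters.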
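Here is a tiny sketch of the /u point. In UTF-8, à is two bytes (C3 A0), so without /u the class [à] means "byte C3 or byte A0". The variable names are only for illustration:

```php
<?php
// Without /u, both bytes of 'à' are matched separately, giving two replacements.
$bytes = preg_replace('/[à]/', 'a', 'à');   // "aa"

// With /u, 'à' is one character and is replaced once.
$chars = preg_replace('/[à]/u', 'a', 'à');  // "a"

// Worse: 'é' is C3 A9, so without /u only its lead byte is replaced,
// leaving a stray A9 byte, i.e. invalid UTF-8.
$broken = preg_replace('/[à]/', 'a', 'é');

echo $bytes, "\n", $chars, "\n", bin2hex($broken), "\n";
```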

So far it seems to work quite well. I'm sure all those preg_replace calls are not particularly fast, but they're not noticeably slow either.
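If speed ever matters, one option is to build a character-to-letter map once and let strtr() do all the replacements in a single call. That should be cheaper than fourteen regex passes, though I haven't benchmarked it. The sketch below is a hypothetical alternative, not the method above. It assumes precomposed (NFC) UTF-8 input, as the regexes do. The names vn_latin_map(), romanize_vn_strtr() and vn_slug() are invented for illustration. The slug helper shows how the result could feed a SEF URL.

```php
<?php
// Hypothetical single-pass alternative to romanize_vn(), assuming NFC UTF-8 input.

function vn_latin_map(): array
{
    static $map = null; // built on the first call, reused afterwards
    if ($map !== null) {
        return $map;
    }

    // Plain letter => all of its accented forms (the same sets as the regexes).
    $groups = [
        'a' => 'àảãáạăằẳẵắặâầẩẫấậ', 'A' => 'ÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬ',
        'e' => 'èẻẽéẹêềểễếệ',       'E' => 'ÈẺẼÉẸÊỀỂỄẾỆ',
        'i' => 'ìỉĩíị',             'I' => 'ÌỈĨÍỊ',
        'o' => 'òỏõóọôồổỗốộơờởỡớợ', 'O' => 'ÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢ',
        'u' => 'ùủũúụưừửữứự',       'U' => 'ÙỦŨÚỤƯỪỬỮỨỰ',
        'y' => 'ỳỷỹýỵ',             'Y' => 'ỲỶỸÝỴ',
        'd' => 'đ',                 'D' => 'Đ',
    ];

    $map = [];
    foreach ($groups as $plain => $accented) {
        // Split into whole UTF-8 characters (the /u flag matters here too).
        foreach (preg_split('//u', $accented, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $map[$char] = $plain;
        }
    }
    return $map;
}

function romanize_vn_strtr(string $string): string
{
    // strtr() with an array replaces every key in one pass over the string.
    return strtr($string, vn_latin_map());
}

// Hypothetical slug helper: latinize, lowercase, collapse everything else to '-'.
function vn_slug(string $title): string
{
    $ascii = strtolower(romanize_vn_strtr($title));
    return trim(preg_replace('/[^a-z0-9]+/', '-', $ascii), '-');
}

echo romanize_vn_strtr('Bia Hơi - Bia Tươi'), "\n"; // Bia Hoi - Bia Tuoi
echo vn_slug('Bia Hơi - Bia Tươi'), "\n";           // bia-hoi-bia-tuoi
```

Because UTF-8 sequences never overlap one another, the single-character keys can't interfere with each other inside strtr().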


PHP5 Static Property Inheritance / Overloading Issue Resolved

May 17th, 2007

Here is the issue:

I need a static property whose value changes with the subclass (a db table name, for example). It must be static because I need it in some static methods, which don't have $this (object context). The problem is that in PHP 5 the self keyword always refers to the class where it is written. So if an inherited method in the parent uses self::$something, you get the parent's value of $something. Boo.

Static properties are also not accessible through $this->foo notation.



       abstract class a {
               public static $lala = '1';
               function __construct () {
                       echo self::$lala;
               }
       }

       class b extends a {
               public static $lala = '2';
       }

       new b();

This outputs '1'. Argh.

There's a bug filed for this behavior, but the PHP developers have more or less passed on it.
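For what it's worth, the story changed later. PHP 5.3, released in 2009, added late static binding. With it, static:: resolves to the class a method was actually called on, while self:: still means the class where the code is written. Here is a sketch of the example above rewritten to use it; it won't run on PHP 5.2 or earlier. The which() method is added purely for illustration.

```php
<?php
// Requires PHP 5.3+ (late static binding).
abstract class a {
    public static $lala = '1';

    public function __construct() {
        echo self::$lala, "\n";   // always a::$lala
        echo static::$lala, "\n"; // the value from the class that was instantiated
    }

    // Hypothetical helper: static:: also works inside static methods, with no $this.
    public static function which() {
        return static::$lala;
    }
}

class b extends a {
    public static $lala = '2';
}

new b();               // prints 1, then 2
echo b::which(), "\n"; // prints 2
```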

The workaround, I discovered, is this:


       /* crazy workaround for the fact that 'self' does not follow inherited context */
       private function table() {
               $vars = get_class_vars(get_class($this));
               return $vars['table'];
       }

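This replaces the static property lookup with a non-static method, so it follows inheritance nicely and works around the brokenness of 'self'. The catch is that it relies on $this, so it only helps in object context, not inside static methods. Hmm. "Brokenness of self" sounds like a good title for some minimalist emo industrial track.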
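Here is a sketch of that workaround in a cut-down version of the a/b example. The property name $table matches the method above. The values 'a_table' and 'b_table' and the describe() method are placeholders added for illustration. get_class_vars() includes static properties, and get_class($this) gives the runtime class, so the lookup sees the subclass's value.

```php
<?php
abstract class a {
    public static $table = 'a_table';

    /* crazy workaround for the fact that 'self' does not follow inherited context */
    private function table() {
        // Look the property up on the object's actual class, not on "self".
        $vars = get_class_vars(get_class($this));
        return $vars['table'];
    }

    // Hypothetical method comparing the two lookups.
    public function describe() {
        return self::$table . ' vs ' . $this->table();
    }
}

class b extends a {
    public static $table = 'b_table';
}

echo (new b())->describe(), "\n"; // a_table vs b_table
```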

I hope this makes sense. It almost does to me. More to the point, my code is now working, which makes me happy.

It's late, though, and it could be delirium.


Here's where it starts

This is my first note using Mephisto. I have to say I'm very impressed with this system. It feels like a BMW: well-made, but quirky.

Tom By Tree