
Redeemer.Blog

Neoseeker goes UTF-8!

This post is mostly for historical purposes, and for those who are interested in this sort of tech babble.


As of this morning, Neoseeker and all its associated projects have been migrated to UTF-8. This was a long and arduous process that tomchu completed for us successfully. All told, more than 72 hours of work by various people went into this update, which will largely go unnoticed and has no immediate feature impact on the site.

Background

This entire process started with initial research and testing I did many years ago, when we determined that foreign characters, especially Japanese titles of games, were not stored properly on Neoseeker. At the time a conversion was deemed not feasible due to a lack of manpower and of understanding of character sets in general. The world itself was not quite ready either: MySQL in particular had only just started properly supporting character sets and collations, in MySQL 4.1.
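For readers curious what "not stored properly" can look like in practice, here is a minimal Python sketch of one common failure mode: UTF-8 bytes being read back as Latin-1. That this was the exact mechanism on Neoseeker is an assumption, and the title used is only an illustration.

```python
# Assumption: text is written as UTF-8 but later read as Latin-1.
title = "Pokémon"

utf8_bytes = title.encode("utf-8")       # "é" becomes two bytes: C3 A9
garbled = utf8_bytes.decode("latin-1")   # each byte is shown as its own character
print(garbled)                           # PokÃ©mon

# As long as no bytes were lost, the damage can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)                          # Pokémon
```

The round trip only works while every byte survives; once a system replaces unknown bytes with a placeholder, the original text is gone.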

In Feb 2008 key zer reported a bug relating to Russian text being counted improperly. I thought it was something we could fix easily, but after 20 hours of research, testing, and various other work I decided it was just too tough, and the risk of corrupting existing data too high.
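A likely reason for a miscount like this (an assumption, since the original bug report is not quoted here) is code that measures bytes where it should measure characters. A short Python sketch of the difference, using a sample word of my own choosing:

```python
# Assumption: the miscount came from measuring bytes instead of characters.
text = "Привет"  # "Hello" in Russian, six letters

char_count = len(text)                  # counts Unicode characters
byte_count = len(text.encode("utf-8"))  # counts bytes; Cyrillic letters take two each

print(char_count)  # 6
print(byte_count)  # 12
```

A length limit checked against the byte count would cut this word off at half the length a user expects.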

In Feb 2009 I decided we would take this project up once again, spurred by the need to properly support collation of accented European characters in search features we will be introducing on the site. This time around, tomchu, our Server Dude, was tasked with the majority of the R&D for the update.

It was decided that the entire site had to be disabled for this update to be done properly, with the data dumped completely into SQL dump files and reimported after some manipulation. This procedure is remarkably similar to the recommended procedure for upgrading between major versions of MySQL (our database software) itself, so we decided to kill two birds with one stone.
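To give a rough idea of what "some manipulation" of a dump file can involve, here is a minimal Python sketch that re-encodes a dump as UTF-8 and rewrites its table charset declarations. It is not the script we actually ran: the function name, the file names, the sample table, and the assumption that the old data was Latin-1 are all hypothetical.

```python
import os
import tempfile

def transcode_dump(src_path, dst_path, src_encoding="latin-1"):
    """Re-encode a SQL dump as UTF-8 and update its charset declarations."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            # Naive text replacement: it would also alter row data that
            # happens to contain this exact string.
            dst.write(line.replace("CHARSET=latin1", "CHARSET=utf8"))

# Tiny demonstration on a made-up two-line dump.
workdir = tempfile.mkdtemp()
old_dump = os.path.join(workdir, "old.sql")
new_dump = os.path.join(workdir, "new.sql")
with open(old_dump, "wb") as f:
    f.write(b"CREATE TABLE games (title VARCHAR(64)) DEFAULT CHARSET=latin1;\n")
    f.write(b"INSERT INTO games VALUES ('Pok\xe9mon');\n")

transcode_dump(old_dump, new_dump)
with open(new_dump, "rb") as f:
    converted = f.read()
print(converted.decode("utf-8"))
```

A real migration also has to deal with data that was already stored in mixed or wrong encodings, which is where most of the risk of corruption comes from.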

And now, in March 2009, we have accomplished 2 of the 3 main "core" updates to our site platform: support for UTF-8 and migration to MySQL 5.

What now?

This was not an easy journey and we're not done yet - a lot of code still has to be tweaked on TOP of what's already been done, and many UTF-8 quirks have surfaced. But now that we're up and running again, we can look forward to the minor but definitely tangible benefits of supporting non-ASCII characters directly on the site.

Comments

  • 1 thumbs!
    bobbonew since Dec 2002 | Mar 19, 09
    Very intriguing. I definitely expect to go through a similar situation in a couple months; character sets are so damn hard to understand. Good upgrade; sucks though when many many hours are spent and only 1% of the userbase will realize it and 1% of them will understand how complex it was.
    Last edited by bobbonew :: Mar 19, 09
  • 0 thumbs!
    kik36 since Apr 2007 | Mar 19, 09
    WOW!!! A lot of effort put forth. Well done all involved! Bobbonew is right, most of us would never understand. Thanks for the update regardless, I find it fascinating to read.
  • 0 thumbs!
    Sabre since Aug 2007 | Mar 19, 09
    I've been learning about Unicode types and character sets in my computing classes recently, and going deep into it just isn't really my cup of tea. Nice update regardless!
    Last edited by Sabre :: Mar 19, 09
  • 1 thumbs!
    tekmosis since Jul 2006 | Mar 20, 09
    私はこのアップデイトがすきです ^___________^
    • 0 thumbs!
      Xenctuary since May 2001 | Mar 24, 09
      Я люблю эта новая версия слишком!
      Last edited by Xenctuary :: Mar 24, 09
  • 0 thumbs!
    Redemption since Mar 2000 | Mar 20, 09
    It won't go as unnoticed as I might have claimed. The end result will be a subtle but consistent improvement in a lot of things that the average user will not immediately notice, but this update is quite necessary IMO.

    For instance, we will stop seeing Pokémon names being corrupted on the site. Before this update (and until we fix all associated bugs and quirks) Pokémon titles and any other products with non-ASCII characters in them were semi-supported. Sometimes they worked, and sometimes they didn't. Now, with the entire platform updated (minus areas we might have missed and bugs), these characters should work consistently. Which is kinda cool!
    • 0 thumbs!
      Dark Arcanine since Apr 2007 | Mar 20, 09
      I'm particularly happy about the bit involving Pokemon, a nice update overall.

      And woot for taking Japanese in the past so I can read the words of tekmosis. :3
      • 0 thumbs!
        Xenctuary since May 2001 | Mar 24, 09
        Haha, no more "PokÀ�man"!
        Last edited by Xenctuary :: Mar 24, 09
  • 0 thumbs!
    ShadowJ since Jan 2003 | Mar 21, 09
    So all in all, it will still look like a pile of gibberish then Red?
  • 0 thumbs!
    player300o since Nov 2005 | Mar 29, 09
    Hehehe, awesome. Actually being able to see everything correctly places this site higher up on my list.