This page (set up just so I could see how Blogger works) is a portal
to our adventures in food and drink, recipes, disasters, triumphs...

Please use the links below or go directly to the real blog from OUR kitchen

` ` `

22 September 2009

struggling with switching from latin1 to utf-8 in wordpress database tables


Sigh. Still fighting with trying to fix things on the etherwork blog (my real blog). So far I've had to restore the database twice. Thank goodness for backups!!

As a result of my experiments with sql, there are several stray Âs in the posts. After the initial upgrade, there were ûs instead of üs, és instead of és, âs instead of âs, etc. etc. too but I think (I hope) I dealt with them by using the wordpress "search and replace" plugin.

I have enabled comments for this post in the wild hope that someone knows the answer to how to change the wordpress database tables to be utf-8 - without causing explosions. Any realistic suggestions will be most welcome. (Please refrain from suggesting that I leave wordpress entirely; that would mean having to move the database to another program. EEEEEEeeeeeeeeeeeek!!)

I have two different character sets for the tables in the wordpress database: latin1_swedish_ci collation and utf-8 collation labels

Unfortunately, I am not comfortable to just go ahead and blindly change the tables to utf-8 (as if I know how!); I only know enough about SQL to get myself in trouble.

Alas, the wordpress "guide" is not written in layman's terms, except for this phrase in the opening paragraph: "character set conversion is not a simple process". And the official wordpress forums are pretty much useless because they are filled with people like me - the blind leading the blind. But luckily, googling revealed that someone has attempted a translation at

[I]n existing WordPress installations that are upgraded from earlier version of WordPress or not explicitly set a Unicode UTF-8 charset collation, the default database character set is normally set up as Latin1 (default on almost all MySQL installation) with latin1_swedish_ci collation. If you run a bilingual or multilingual blog with WordPress, you may face problem on character encoding when your blog posts are written in other foreign languages, or when you export and backup the database and later attempt to re-import the database dump in the event of database failure or server migration and moving. The symptom is obvious, your WordPress posts or pages will contains garbled, weird and funny characters, sometime just lots of ????? (question marks), rendering the WordPress database with your hard work useless and output unreadable. (May cause also by wrong charset collation)

This is EXACTLY what happened to me (thank goodness for copies of database backups kept in separate folders!)

In the Guide to Convert WordPress Database Character Set to UTF8, it says:

Ensure that your post_content and post title fields on wp_posts table DOES NOT belongs [sic] to any indexes or FULLTEXT indexes. Else the type of the fields may not be converted to BLOB with one of the errors list below. Some plugins, such as related posts tend to add indexes to these fields. In this case, temporarily drop the indexes.
I've asked the following question in forum, but the thread is dated 2007, so the question may be overlooked. I am hoping that someone here will help.

What exactly do I look for in phpMyAdmin to find out if fields belong to indexes or FULLTEXT indexes?

(I'm running WinXP, firefox, SQL 4.1.21-log, PHP 5.2.8, Apache 1.3.37 (Unix); screenshot of what I see in phpMyAdmin at (etherwork)

I am reluctant to simply dive in again and hope for the best, having already TWICE restored wordpress from a backup after the first disastrous result from running the wordpress plugin to Convert WordPress Database Character Set and the second most recent disaster.)