UTF8 all over the place

22 Jul 2011


For all web applications, I have to make sure I’m using UTF8. It’s not just for customers who want the occasional page in Japanese or Korean; it’s for perfectly standard English pages which use text such as Ætna or the non-ASCII pound sign £.

Now, to get this right, I have to make sure the database is set up to handle UTF8 AND the web server is set up to handle UTF8 AND the browser is set up to handle UTF8…

Browser / Application

Declare the charset parameter in the HTTP Content-Type header. In PHP:

header('Content-Type:text/html; charset=UTF-8');

We can also set the HTML meta tag, although this is really meant for cases where we don’t have any control over the HTTP headers.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Set the accept-charset attribute for forms.

<form accept-charset="utf-8" ...>

This makes it explicit to browsers that we expect UTF-8, and nothing else, and prevents guessing by the browser or relying on the document charset (which should be UTF-8 anyway). Belts and braces.
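Even with all of that in place, a client can still post bytes that aren’t UTF-8, so it may be worth checking on the way in. Here is a minimal sketch of how that could look. It assumes the mbstring extension is loaded; the helper name is_valid_utf8 and the form field 'comment' are made up for illustration.

```php
<?php
// Hypothetical helper: true only if $value is well-formed UTF-8.
// Assumes the mbstring extension is available.
function is_valid_utf8(string $value): bool
{
    return mb_check_encoding($value, 'UTF-8');
}

// Tell the browser what we are sending, before any output goes out.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// 'comment' is a placeholder field name.
$comment = isset($_POST['comment']) ? $_POST['comment'] : '';
if (!is_string($comment) || !is_valid_utf8($comment)) {
    $comment = ''; // reject rather than guess at the encoding
}

// Escape with an explicit charset so multi-byte characters survive intact.
echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
```

Rejecting bad input outright is only one option; whether to reject, log, or attempt a conversion depends on the application.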

Apache

If the default character set for the server is set to “iso-8859-1” (or whatever), Apache adds that charset to the Content-Type header of any page that doesn’t declare its own, and the header beats the meta tag (ouch!). That causes much confusion and head-scratching, so we need to make sure that that doesn’t go wooey on us.

Stick this in the .htaccess file.

AddDefaultCharset utf-8

MySQL Database

All the tables need to be set up to use utf8_general_ci collation, and the character fields in the tables should also use utf8_general_ci collation.
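For tables that already exist, one way to get there is MySQL’s ALTER TABLE ... CONVERT TO CHARACTER SET, which changes the table default and its character columns in one go. The sketch below only builds the statement; the helper name utf8_convert_sql and the table name 'customers' are hypothetical. The conversion rewrites stored data, so I’d try it on a copy first.

```php
<?php
// Hypothetical helper: builds the statement that converts one table
// (and its character columns) to utf8 with utf8_general_ci collation.
function utf8_convert_sql(string $table): string
{
    // Accept only simple identifiers so the name can be quoted safely.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException("Unexpected table name: $table");
    }
    return "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
}

// 'customers' is a placeholder table name.
echo utf8_convert_sql('customers'), ";\n";
```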

You also need to make sure that the client connection is using the UTF8 character set when it sends SQL statements to the server. One way is to set it in the MySQL configuration:

default-character-set = utf8

Or, within the code, run this SQL query straight after connection:

SET NAMES UTF8;
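If the code uses the mysqli extension (an assumption on my part, nothing above depends on it), the same thing can be done through mysqli’s set_charset() method, which also tells the client library about the charset so that escaping functions behave correctly. A minimal sketch, with a made-up helper name and placeholder credentials:

```php
<?php
// Hypothetical helper: switch an open mysqli connection to utf8.
function use_utf8(mysqli $db): void
{
    // set_charset() returns false on failure.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }
}

// Usage (host, credentials and database name are placeholders):
// $db = new mysqli('localhost', 'app_user', 'secret', 'app_db');
// use_utf8($db);
```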

And yes, most of this comes from Rob Allen’s site at http://akrabat.com/2009/03/18/utf8-php-and-mysql/. I’m just getting it straight in my own mind.