Yesterday, after uploading a refreshed www.sicirec.org, some character encoding issues popped up because I had converted the website’s content from ISO-8859-1 (Latin 1) to UTF-8. (I wanted to be able to type and paste special characters from PuTTY into VIM without worrying about the particular encoding of each file.)

The Apache HTTPD at InitFour, our webhosting provider, is configured to send ISO-8859-1 by default, while the one on our test server is configured for UTF-8. This caused a little bit of a surprise when I uploaded the refreshed website and saw all characters outside the ASCII range mangled on the life website!

I quickly dug into my .htaccess file to add the AddCharset utf-8 .xhtml directive. To my surprise, this didn’t do squat. A lot of fiddling, reloading and researching later, I realized that the following section in my .htaccess file rendered the AddCharset directive irrelevant:

<Files *.xhtml>
ForceType text/html
</Files>

I had to change the ForceType directive to include the charset as a MIME parameter:

<Files *.xhtml>
ForceType 'text/html; charset=UTF-8'
</Files>

Now, it all seemed to work. (Except that it didn’t really because I do some ridiculously complex content negotiation stuff involving a 406 handler in PHP that virtuals the most appropriate variant when no match is found. This script didn’t send a useful Content-Type header. After first adding it to the script, I noticed that the AddDefaultCharset is actually allowed in .htaccess context—a discovery which luckily rendered the other hacks useless.)