
Changing character encoding in MySQL, PHP scripts, HTML

So, I have been building on this system for quite some time, and it currently outputs Latin1 (ISO-8859-1) to the web browser. These are the components:

MySQL - all data is stored with the Latin1 character set

PHP - All PHP text files are stored on disk with Latin1 encoding

HTML - The output has the http-equiv="content-type" content="text/html; charset=iso-8859-1" meta tag

So, I'm trying to understand how the encoding of the different parts comes into play in my workflow. If I open a PHP script, change its encoding to UTF-8 in the text editor, save it back to disk and reload the web browser, the text is all messed up, unless the text comes from the DB. If I change the encoding of the DB to UTF-8 and keep the PHP files in Latin1, I have to use utf8_decode() for the data to display correctly. And if I change only the charset in the HTML, the browser reads the page incorrectly.
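One way to see why each of these mismatches garbles the text is to look at the raw bytes. This is only a minimal sketch, assuming a POSIX shell with `iconv` and `od` available; the helper name `hex_bytes` is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the bytes on stdin as space-separated hex.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# "é" stored as Latin1 is one byte; stored as UTF-8 it is two bytes.
latin1_e=$(printf '\351' | hex_bytes)        # e9
utf8_e=$(printf '\303\251' | hex_bytes)      # c3 a9

# A reader that assumes Latin1 but receives UTF-8 bytes treats each byte
# as its own character ("Ã" and "©"). Re-encoding that misreading as
# UTF-8 makes the damage visible: two characters instead of one.
misread=$(printf '\303\251' | iconv -f ISO-8859-1 -t UTF-8 | hex_bytes)

echo "latin1:              $latin1_e"
echo "utf8:                $utf8_e"
echo "utf8 read as latin1: $misread"         # c3 83 c2 a9
```

The same thing happens at every boundary (file on disk, DB connection, browser): whenever one side writes bytes in one encoding and the other side reads them as another, every non-ASCII character breaks, while plain ASCII survives because it is identical in both.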

So yeah, I realise that if I want to "upgrade" to UTF-8, I have to update all three parts of this setup for it to work correctly. But since it's a huge system with some 180k lines of PHP code and millions of posts in a lot of databases/tables, I don't want to start something like this without understanding everything correctly.

What haven't I thought about? What could mess this up beyond fixing? What are the procedures for changing the encoding of an entire MySQL installation and what's the easiest way to change the encoding of hundreds or thousands of PHP files on disk?

The META tag is luckily added dynamically, so I'll change that in one place only :)

Let me hear about your experiences with this.

Sandman asked May 05 '26 02:05



1 Answer

It's tricky.

You have to:

  • change the DB and every table character set/encoding – I don't know much about MySQL, but see here
  • set the client encoding to UTF-8 in PHP (SET NAMES UTF8) before the first query
  • change the meta tag and possibly the Content-type header (note that the Content-type header takes precedence)
  • convert all the PHP files to UTF-8 without BOM – you can easily do that with a loop and iconv.
  • the trickiest of all: you have to change most of your string function calls. That means mb_strlen instead of strlen, mb_substr instead of substr and $str[index], etc.
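The "loop and iconv" step could look roughly like the sketch below. It assumes a POSIX shell with `iconv`, `find` and `mktemp`, that every non-UTF-8 file really is ISO-8859-1, and that no path contains a newline. The function names `latin1_to_utf8` and `convert_php_tree` are made up for illustration. Try it on a copy or under version control first.

```sh
#!/bin/sh
# Hypothetical helper: convert one file from Latin1 to UTF-8 in place.
latin1_to_utf8() {
    # Already valid UTF-8 (plain ASCII included)? Leave it alone, so that
    # running the script twice never double-encodes a file.
    if iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Convert into a temp file first; the original is only touched on success.
    if iconv -f ISO-8859-1 -t UTF-8 "$1" >"$tmp"; then
        cat "$tmp" >"$1"    # overwrite contents, keep owner and permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "conversion failed: $1" >&2
        return 1
    fi
}

# The loop: every *.php file below the given directory.
convert_php_tree() {
    find "$1" -type f -name '*.php' | while IFS= read -r file; do
        latin1_to_utf8 "$file"
    done
}
```

Usage would be something like `convert_php_tree ./src`. The validity check is a heuristic: a Latin1 file whose bytes happen to form valid UTF-8 would be skipped, which is rare for real text but worth spot-checking. As far as I know, iconv does not write a BOM when the target is UTF-8, but it is worth checking one converted file with `od` to be sure.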
Artefacto answered May 07 '26 18:05
