I am planning to use MySQL version 4.1.22 on a Windows NT server. Most of my tables are InnoDB with numerical field. There are also some varchar (Variable Character Field) and longtext fields where English alpha-numerical text and most of the special characters on keyboard can be entered. During installation of MySQL, I decided on UTF-8 (8-bit UCS/Unicode Transformation Format) as my character set. But is this the ideal charset for my database? What should be the connection collation? the collation for varchar fields in the table? Is it utf8_general_ci? Also, I was wondering if I will be able to use tables with latin1_swedish_ci in the same database. Will using UTF-8 make the index larger in physical size (kBs)?
UTF-8 is ideal for most applications where any special characters must be supported. Set utf8_general_ci for both the connection and the field collations, unless you have a compelling reason to do otherwise.
MySQL will allow a user to choose the character set and collation down to the field level. You could have UTF-8 and latin fields in the same table if you wanted.
Since UTF-8 is a variable length character set, your index will only be larger than a similar Latin index if there are multi-byte characters used. If only single byte characters are stored in the table, than the index will be the same size.
The MySQL manual contains lots of information on the different character sets and collations available and how to convert between them.
This was first published in October 2007