Advice on MySQL character sets and collations

Advice on MySQL character sets and collations

I am planning to use MySQL version 4.1.22 on a Windows NT server. Most of my tables are InnoDB with numerical field. There are also some varchar (Variable Character Field) and longtext fields where English alpha-numerical text and most of the special characters on keyboard can be entered. During installation of MySQL, I decided on UTF-8 (8-bit UCS/Unicode Transformation Format) as my character set. But is this the ideal charset for my database? What should be the connection collation? the collation for varchar fields in the table? Is it utf8_general_ci? Also, I was wondering if I will be able to use tables with latin1_swedish_ci in the same database. Will using UTF-8 make the index larger in physical size (kBs)?

    Requires Free Membership to View

    When you register, my team of editors will also send you resources covering Linux administration and management; integration and interoperability between Linux, Windows and Unix; securing Linux and mixed-platform environments; and migrating to Linux.

    Cathleen A. Gagne, Senior Editorial Director

    By submitting your registration information to SearchEnterpriseLinux.com you agree to receive email communications from TechTarget and TechTarget partners. We encourage you to read our Privacy Policy which contains important disclosures about how we collect and use your registration and other information. If you reside outside of the United States, by submitting this registration information you consent to having your personal data transferred to and processed in the United States. Your use of SearchEnterpriseLinux.com is governed by our Terms of Use. You may contact us at webmaster@TechTarget.com.

UTF-8 is ideal for most applications where any special characters must be supported. Set utf8_general_ci for both the connection and the field collations, unless you have a compelling reason to do otherwise.

MySQL will allow a user to choose the character set and collation down to the field level. You could have UTF-8 and latin fields in the same table if you wanted.

Since UTF-8 is a variable length character set, your index will only be larger than a similar Latin index if there are multi-byte characters used. If only single byte characters are stored in the table, than the index will be the same size.

The MySQL manual contains lots of information on the different character sets and collations available and how to convert between them.

This was first published in October 2007