9.1 What languages are supported by MySQL?

mysqld can issue error messages in the following languages: Czech, Dutch, English (the default), Estonian, French, German, Hungarian, Italian, Norwegian, Norwegian-ny, Polish, Portuguese, Spanish and Swedish.

To start mysqld with a particular language, use either the --language=lang or the -L lang option. For example:

shell> mysqld --language=swedish

or, giving the full path to the language directory:

shell> mysqld --language=/usr/local/share/swedish

Note that all language names are specified in lowercase.

The language files are located (by default) in `mysql_base_dir/share/LANGUAGE/'.

To update the error message file, you should edit the `errmsg.txt' file and execute the following command to generate the `errmsg.sys' file:

shell> comp_err errmsg.txt errmsg.sys

If you upgrade to a newer version of MySQL, remember to repeat your changes with the new `errmsg.txt' file.

9.1.1 The character set used for data and sorting

By default, MySQL uses the ISO-8859-1 (Latin1) character set. This is the character set used in the USA and western Europe.

The character set determines what characters are allowed in names and how things are sorted by the ORDER BY and GROUP BY clauses of the SELECT statement.

You can change the character set at compile time by using the --with-charset=charset option to configure. See section 4.7.1 Quick installation overview.

To add another character set to MySQL, use the following procedure:

9.1.2 Adding a new character set

  1. Choose a name for the character set, denoted MYSET below.
  2. Create the file `strings/ctype-MYSET.c' in the MySQL source distribution.
  3. Look at one of the existing `ctype-*.c' files to see what needs to be defined. Note that the arrays in your file must have names like ctype_MYSET, to_lower_MYSET and so on.
    
    to_lower[] and to_upper[] are simple arrays that hold the lowercase and uppercase characters corresponding to each member of the character set. For example:
    to_lower['A'] should contain 'a'
    to_upper['a'] should contain 'A'
    
    sort_order[] is a map indicating how characters should be ordered for comparison and sorting purposes. For many character sets, this is the same as to_upper[] (which means sorting will be case insensitive). MySQL will sort characters based on the value of sort_order[character].
    
    ctype[] is an array of bit values, with one element for each character. (Note that to_lower[], to_upper[] and sort_order[] are indexed by character value, but ctype[] is indexed by character value + 1. This is a legacy convention that makes it possible to handle EOF.)
    
    You can find the following bitmask definitions in `m_ctype.h':
    #define _U      01      /* Upper case */
    #define _L      02      /* Lower case */
    #define _N      04      /* Numeral (digit) */
    #define _S      010     /* Spacing character */
    #define _P      020     /* Punctuation */
    #define _C      040     /* Control character */
    #define _B      0100    /* Blank */
    #define _X      0200    /* heXadecimal digit */
    
    The ctype[] entry for each character should be the union (bitwise OR) of the applicable bitmask values that describe the character. For example, 'A' is an uppercase character (_U) as well as a hexadecimal digit (_X), so ctype['A'+1] should contain the following value (in octal):
    _U + _X = 01 + 0200 = 0201
    
  4. Add a unique number for your character set to `include/m_ctype.h.in'.
  5. Add the character set name to the CHARSETS_AVAILABLE list in configure.in.
  6. Reconfigure, recompile and test.
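
The table layout described in step 3 can be sketched in C as follows. This is only an illustration, not the contents of any real `ctype-*.c' file: it fills the tables at run time for the 7-bit ASCII range (real files define them as static initialized arrays), it assumes an ASCII execution character set, and the names init_MYSET(), compare_MYSET() and the MY_-prefixed masks are hypothetical. The masks are renamed because identifiers such as _U are reserved for the C implementation and may clash with system headers.

```c
/* Hypothetical stand-ins for the _U, _L, ... bitmasks listed above. */
#define MY_U  01      /* Upper case */
#define MY_L  02      /* Lower case */
#define MY_N  04      /* Numeral (digit) */
#define MY_S  010     /* Spacing character */
#define MY_P  020     /* Punctuation */
#define MY_C  040     /* Control character */
#define MY_B  0100    /* Blank */
#define MY_X  0200    /* heXadecimal digit */

unsigned char ctype_MYSET[257];       /* indexed by character value + 1 */
unsigned char to_lower_MYSET[256];    /* indexed by character value */
unsigned char to_upper_MYSET[256];
unsigned char sort_order_MYSET[256];

/* Fill the four tables; bytes 128..255 get no ctype bits and map to themselves. */
void init_MYSET(void)
{
    int c;

    ctype_MYSET[0] = 0;               /* slot used for EOF (-1 + 1) */
    for (c = 0; c < 256; c++) {
        unsigned char bits = 0;
        unsigned char lower = (unsigned char) c;
        unsigned char upper = (unsigned char) c;

        if (c >= 'A' && c <= 'Z') { bits |= MY_U; lower = (unsigned char) (c - 'A' + 'a'); }
        if (c >= 'a' && c <= 'z') { bits |= MY_L; upper = (unsigned char) (c - 'a' + 'A'); }
        if (c >= '0' && c <= '9') bits |= MY_N | MY_X;
        if ((c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f')) bits |= MY_X;
        if (c == ' ' || (c >= '\t' && c <= '\r')) bits |= MY_S;
        if (c == ' ') bits |= MY_B;
        if (c < 32 || c == 127) bits |= MY_C;
        if (c > 32 && c < 127 && !(bits & (MY_U | MY_L | MY_N))) bits |= MY_P;

        ctype_MYSET[c + 1] = bits;
        to_lower_MYSET[c] = lower;
        to_upper_MYSET[c] = upper;
        sort_order_MYSET[c] = upper;  /* same as to_upper[]: case-insensitive order */
    }
}

/* Compare two NUL-terminated strings by sort_order[] value. */
int compare_MYSET(const unsigned char *a, const unsigned char *b)
{
    while (*a && sort_order_MYSET[*a] == sort_order_MYSET[*b]) {
        a++;
        b++;
    }
    return (int) sort_order_MYSET[*a] - (int) sort_order_MYSET[*b];
}
```

After init_MYSET() has run, ctype_MYSET['A'+1] holds 0201 as in the example above, and because sort_order_MYSET[] equals to_upper_MYSET[] here, compare_MYSET() treats "Hello" and "hELLO" as equal.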

9.1.3 Multi-byte character support

If you are creating a multi-byte character set, you can use the _MB macros. In `include/m_ctype.h.in', add:

#define MY_CHARSET_MYSET  X
#if MY_CHARSET_CURRENT == MY_CHARSET_MYSET
#define USE_MB
#define USE_MB_IDENT
#define ismbchar(p, end)  (...)
#define ismbhead(c)       (...)
#define mbcharlen(c)      (...)
#define MBMAXLEN          N
#endif

Where:

MY_CHARSET_MYSET
    A unique character set value.
USE_MB
    This character set has multi-byte characters, handled by ismbhead() and mbcharlen().
USE_MB_IDENT
    (Optional) If defined, you can use table and column names that contain multi-byte characters.
ismbchar(p, end)
    Returns 0 if p is not a multi-byte character string, or the size of the character (in bytes) if it is. p and end point to the beginning and end of the string. Check from (char*)p to (char*)end-1.
ismbhead(c)
    True if c is the first character of a multi-byte character string.
mbcharlen(c)
    Size of a multi-byte character string if c is the first character of such a string.
MBMAXLEN
    Size in bytes of the largest character in the set.
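
As an illustration of how the (...) placeholders might be filled in, the sketch below defines the macros for a made-up two-byte encoding in which a lead byte lies in 0x81..0xFE and a trail byte in 0x40..0xFE. The byte ranges, the helper macro ismbtail(), the function count_chars_MYSET() and the choice to have mbcharlen() return 0 for a non-lead byte are all assumptions made for this example; they do not describe any character set shipped with MySQL.

```c
#include <stddef.h>

/* Hypothetical two-byte encoding: lead byte 0x81..0xFE, trail byte 0x40..0xFE. */
#define MBMAXLEN 2

#define ismbhead(c) \
    (0x81 <= (unsigned char) (c) && (unsigned char) (c) <= 0xfe)

/* Helper for this sketch only; not one of the macros named in the text. */
#define ismbtail(c) \
    (0x40 <= (unsigned char) (c) && (unsigned char) (c) <= 0xfe)

/* Assumption: 0 is returned when c is not a lead byte. */
#define mbcharlen(c) (ismbhead(c) ? 2 : 0)

/* Checks bytes from p to end-1 only; returns 2 for a complete character, else 0. */
#define ismbchar(p, end) \
    ((ismbhead(*(p)) && (end) - (p) > 1 && ismbtail(*((p) + 1))) ? 2 : 0)

/* Count characters (not bytes) in the buffer [p, end). */
size_t count_chars_MYSET(const char *p, const char *end)
{
    size_t n = 0;

    while (p < end) {
        int len = ismbchar(p, end);
        p += len ? len : 1;   /* skip a whole multi-byte character, or one byte */
        n++;
    }
    return n;
}
```

With these definitions, a four-byte buffer holding 'a', 0x81, 0x40, 'b' counts as three characters, and a lead byte at the very end of a buffer is treated as a single byte because ismbchar() never reads past end-1.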