10.17 What are the different row formats? Or, when should VARCHAR/CHAR be used?

MySQL dosen't have true SQL VARCHAR types.

Instead, MySQL has three different ways to store records and uses these to emulate VARCHAR.

If a table doesn't have any VARCHAR, BLOB or TEXT columns, a fixed row size is used. Otherwise a dynamic row size is used. CHAR and VARCHAR columns are treated identically from the application's point of view; both have trailing spaces removed when the columns are retrieved.

You can check the format used in a table with isamchk -d (-d means ``describe the table'').

MySQL has three different table formats: fixed-length, dynamic and compressed. These are compared below.

Fixed-length tables

  • This is the default format. It's used when the table contains no VARCHAR, BLOB or TEXT columns.
  • All CHAR, NUMERIC and DECIMAL columns are space-padded to the column width.
  • Very quick.
  • Easy to cache.
  • Easy to reconstruct after a crash, because records are located in fixed positions.
  • Doesn't have to be reorganized (with isamchk) unless a huge number of records are deleted and you want to return free disk space to the operating system.
  • Usually requires more disk space than dynamic tables.

Dynamic tables

  • This format is used if the table contains any VARCHAR, BLOB or TEXT columns.
  • All string columns are dynamic (except those with a length less than 4).
  • Each record is preceded by a bitmap indicating which columns are empty ('') for string columns, or zero for numeric columns (this isn't the same as columns containing NULL values). If a string column has a length of zero after removal of trailing spaces, or a numeric column has a value of zero, it is marked in the bit map and not saved to disk. Non-empty strings are saved as a length byte plus the string contents.
  • Usually takes much less disk space than fixed-length tables.
  • Each record uses only as much space as is required. If a record becomes larger, it is split into as many pieces as required. This results in record fragmentation.
  • If you update a row with information that extends the row length, the row will be fragmented. In this case, you may have to run isamchk -r from time to time to get better performance. Use isamchk -ei tbl_name for some statistics.
  • Not as easy to reconstruct after a crash, because a record may be fragmented into many pieces and a link (fragment) may be missing.
  • The expected row length for dynamic sized records is:
    3
    + (number of columns + 7) / 8
    + (number of char columns)
    + packed size of numeric columns
    + length of strings
    + (number of NULL columns + 7) / 8
    
    There is a penalty of 6 bytes for each link. A dynamic record is linked whenever an update causes an enlargement of the record. Each new link will be at least 20 bytes, so the next enlargement will probably go in the same link. If not, there will be another link. You may check how many links there are with isamchk -ed. All links may be removed with isamchk -r.

Compressed tables

  • A read-only table made with the pack_isam utility. All customers with extended MySQL email support are entitled to a copy of pack_isam for their internal usage.
  • The uncompress code exists in all MySQL distributions so that even customers who don't have pack_isam can read tables that were compressed with pack_isam (as long as the table was compressed on the same platform).
  • Takes very little disk space. Minimises disk usage.
  • Each record is compressed separately (very little access overhead). The header for a record is fixed (1-3 bytes) depending on the biggest record in the table. Each column is compressed differently. Some of the compression types are:
    • There is usually a different Huffman table for each column.
    • Suffix space compression.
    • Prefix space compression.
    • Numbers with value 0 are stored using 1 bit.
    • If values in an integer column have a small range, the column is stored using the smallest possible type. For example, a BIGINT column (8 bytes) may be stored as a TINYINT column (1 byte) if all values are in the range 0 to 255.
    • If a column has only a small set of possible values, the column type is converted to ENUM.
    • A column may use a combination of the above compressions.
  • Can handle fixed or dynamic length records, but not BLOB or TEXT columns.
  • Can be uncompressed with isamchk.

MySQL can support different index types, but the normal type is NISAM. This is a B-tree index and you can roughly calculate the size for the index file as (key_length+4)*0.67, summed over all keys. (This is for the worst case when all keys are inserted in sorted order.)

String indexes are space compressed. If the first index part is a string, it will also be prefix compressed. Space compression makes the index file smaller if the string column has a lot of trailing space or is a VARCHAR column that is not always used to the full length. Prefix compression helps if there are many strings with an identical prefix.