12.5 Le générateur de tables compressée

pack_isam is an extra utility that you get when you order more than 10 licenses or extended support. Since pack_isam is distributed only in binary form, pack_isam is available only on some platforms.

Of course, all future updates to pack_isam are included in the price. pack_isam may at some time be included as standard when we get some kind of turnover for MySQL.

pack_isam works by compressing each column in the table separately. The information needed to decompress columns is read into memory when the table is opened. This results in much better performance when accessing individual records, since you only have to uncompress exactly one record, not a much larger disk block like when using Stacker on MS-DOS. Usually, pack_isam packs the data file 40%-70%.

MySQL uses memory mapping (mmap()) on compressed tables and falls back to normal read/write file usage if mmap() doesn't work.

There are currently two limitations with pack_isam:

  • After packing, the table is read only.
  • It can't pack BLOB columns, yet.

Fixing these limitations is on our TODO list but with low priority.

pack_isam is invoked like this:

shell> pack_isam [options] filename ...

Each filename should be the name of an index (`.ISM') file. If you are not in the database directory, you should specify the pathname to the file. It is permissible to omit the `.ISM' extension.

pack_isam supports the following options:

-b, --backup
Make a backup of the table as nom_table.OLD.
-#, --debug=debug_options
Output debug log. The debug_options string often is 'd:t:o,filename'.
-f, --force
Force packing of the table even if it becomes bigger or if the temporary file exists. (pack_isam creates a temporary file named `nom_table.TMD' while it compresses the table. If you kill pack_isam, the `.TMD' file may not be deleted. Normally, pack_isam exits with an error if it finds that `nom_table.TMD' exists. With --force, pack_isam packs the table anyway.
-?, --help
Display a help message and exit.
-j big_nom_table, --join=big_nom_table
Join all tables named on the command line into a single table big_nom_table. All tables that are to be combined MUST be identical (same column names and types, same indexes, etc.)
-p #, --packlength=#
Specify the record length storage size, in bytes. The value should be 1, 2 or 3. (pack_isam stores all rows with length pointers of 1, 2 or 3 bytes. In most normal cases, pack_isam can determine the right length value before it begins packing the file, but it may notice during the packing process that it could have used a shorter length. In this case, pack_isam will print a note that the next time you pack the same file, you could use a shorter record length.)
-s, --silent
Silent mode. Write output only when errors occur.
-t, --test
Don't pack table, only test packing it.
-T dir_name, --tmp_dir=dir_name
Use the named directory as the location in which to write the temporary table.
-v, --verbose
Verbose mode. Write info about progress and packing result.
-V, --version
Display version information and exit.
-w, --wait
Wait and retry if table is in use. If the mysqld server was invoked with the --skip-locking option, it is not a good idea to invoke pack_isam if the table might be updated during the packing process.

The sequence of commands shown below illustrates a typical table compression session:

shell> ls -l station.*
-rw-rw-r--   1 monty    my         994128 Apr 17 19:00 station.ISD
-rw-rw-r--   1 monty    my          53248 Apr 17 19:00 station.ISM
-rw-rw-r--   1 monty    my           5767 Apr 17 19:00 station.frm

shell> isamchk -dvv station

ISAM file:     station
Isam-version:  2
Creation time: 1996-03-13 10:08:58
Recover time:  1997-02-02  3:06:43
Data records:              1192  Deleted blocks:              0
Datafile: Parts:           1192  Deleted data:                0
Datafile pointer (bytes):     2  Keyfile pointer (bytes):     2
Max datafile length:   54657023  Max keyfile length:   33554431
Recordlength:               834
Record format: Fixed length

table description:
Key Start Len Index   Type                       Root  Blocksize    Rec/key
1   2     4   unique  unsigned long              1024       1024          1
2   32    30  multip. text                      10240       1024          1

Field Start Length Type
1     1     1
2     2     4
3     6     4
4     10    1
5     11    20
6     31    1
7     32    30
8     62    35
9     97    35
10    132   35
11    167   4
12    171   16
13    187   35
14    222   4
15    226   16
16    242   20
17    262   20
18    282   20
19    302   30
20    332   4
21    336   4
22    340   1
23    341   8
24    349   8
25    357   8
26    365   2
27    367   2
28    369   4
29    373   4
30    377   1
31    378   2
32    380   8
33    388   4
34    392   4
35    396   4
36    400   4
37    404   1
38    405   4
39    409   4
40    413   4
41    417   4
42    421   4
43    425   4
44    429   20
45    449   30
46    479   1
47    480   1
48    481   79
49    560   79
50    639   79
51    718   79
52    797   8
53    805   1
54    806   1
55    807   20
56    827   4
57    831   4

shell> pack_isam station.ISM
Compressing station.ISM: (1192 records)
- Calculating statistics

normal:     20  empty-space:      16  empty-zero:        12  empty-fill:  11
pre-space:   0  end-space:        12  table-lookups:      5  zero:         7
Original trees:  57  After join: 17
- Compressing file
87.14%

shell> ls -l station.*
-rw-rw-r--   1 monty    my         127874 Apr 17 19:00 station.ISD
-rw-rw-r--   1 monty    my          55296 Apr 17 19:04 station.ISM
-rw-rw-r--   1 monty    my           5767 Apr 17 19:00 station.frm

shell> isamchk -dvv station

ISAM file:     station
Isam-version:  2
Creation time: 1996-03-13 10:08:58
Recover time:  1997-04-17 19:04:26
Data records:              1192  Deleted blocks:              0
Datafile: Parts:           1192  Deleted data:                0
Datafilepointer (bytes):      3  Keyfile pointer (bytes):     1
Max datafile length:   16777215  Max keyfile length:     131071
Recordlength:               834
Record format: Compressed

table description:
Key Start Len Index   Type                       Root  Blocksize    Rec/key
1   2     4   unique  unsigned long             10240       1024          1
2   32    30  multip. text                      54272       1024          1

Field Start Length Type                         Huff tree  Bits
1     1     1      constant                             1     0
2     2     4      zerofill(1)                          2     9
3     6     4      no zeros, zerofill(1)                2     9
4     10    1                                           3     9
5     11    20     table-lookup                         4     0
6     31    1                                           3     9
7     32    30     no endspace, not_always              5     9
8     62    35     no endspace, not_always, no empty    6     9
9     97    35     no empty                             7     9
10    132   35     no endspace, not_always, no empty    6     9
11    167   4      zerofill(1)                          2     9
12    171   16     no endspace, not_always, no empty    5     9
13    187   35     no endspace, not_always, no empty    6     9
14    222   4      zerofill(1)                          2     9
15    226   16     no endspace, not_always, no empty    5     9
16    242   20     no endspace, not_always              8     9
17    262   20     no endspace, no empty                8     9
18    282   20     no endspace, no empty                5     9
19    302   30     no endspace, no empty                6     9
20    332   4      always zero                          2     9
21    336   4      always zero                          2     9
22    340   1                                           3     9
23    341   8      table-lookup                         9     0
24    349   8      table-lookup                        10     0
25    357   8      always zero                          2     9
26    365   2                                           2     9
27    367   2      no zeros, zerofill(1)                2     9
28    369   4      no zeros, zerofill(1)                2     9
29    373   4      table-lookup                        11     0
30    377   1                                           3     9
31    378   2      no zeros, zerofill(1)                2     9
32    380   8      no zeros                             2     9
33    388   4      always zero                          2     9
34    392   4      table-lookup                        12     0
35    396   4      no zeros, zerofill(1)               13     9
36    400   4      no zeros, zerofill(1)                2     9
37    404   1                                           2     9
38    405   4      no zeros                             2     9
39    409   4      always zero                          2     9
40    413   4      no zeros                             2     9
41    417   4      always zero                          2     9
42    421   4      no zeros                             2     9
43    425   4      always zero                          2     9
44    429   20     no empty                             3     9
45    449   30     no empty                             3     9
46    479   1                                          14     4
47    480   1                                          14     4
48    481   79     no endspace, no empty               15     9
49    560   79     no empty                             2     9
50    639   79     no empty                             2     9
51    718   79     no endspace                         16     9
52    797   8      no empty                             2     9
53    805   1                                          17     1
54    806   1                                           3     9
55    807   20     no empty                             3     9
56    827   4      no zeros, zerofill(2)                2     9
57    831   4      no zeros, zerofill(1)                2     9

The information printed by pack_isam is described below:

normal
The number of columns for which no extra packing is used.
empty-space
The number of columns containing values that are only spaces; these will occupy 1 bit.
empty-zero
The number of columns containing values that are only binary 0's; these will occupy 1 bit.
empty-fill
The number of integer columns that don't occupy the full byte range of their type; these are changed to a smaller type (for example, an INTEGER column may be changed to MEDIUMINT).
pre-space
The number of decimal columns that are stored with leading space. In this case, each value will contain a count for the number of leading spaces.
end-space
The number of columns that have a lot of trailing space. In this case, each value will contain a count for the number of trailing spaces.
table-lookup
The column had only a small number of different values, and that were converted to an ENUM before Huffman compression.
zero
The number of columns for which all values are zero.
Original trees
The initial number of Huffman trees.
After join
The number of distinct Huffman trees left after joining trees to save some header space.

After a table has been compressed, isamchk -dvv prints additional information about each field:

Type
The field type may contain the following descriptors:
constant
All rows have the same value.
no endspace
Don't store endspace.
no endspace, not_always
Don't store endspace and don't do end space compression for all values.
no endspace, no empty
Don't store endspace. Don't store empty values.
table-lookup
The column was converted to an ENUM.
zerofill(n)
The most significant n bytes in the value are always 0 and are not stored.
no zeros
Don't store zeros.
always zero
0 values are stored in 1 bit.
Huff tree
The Huffman tree associated with the field
Bits
The number of bits used in the Huffman tree.