statist(1)
NAME
statist - calculate Huffman distribution for freeze(1)
SYNOPSIS
statist [ -gx... ]
DESCRIPTION
The default table is tuned for both C texts and executable files (as in
LHARC). If you intend to freeze other kinds of files (natural language texts,
databases, images, fonts, etc.), you can calculate the matching-position
distribution with the `statist' program, which calculates and displays
that distribution for a given file. This is useful for large (100K or
more) files.
Although the built-in position table is general-purpose, tuning can
improve the compression rate by up to one additional percent (observed
mainly on text files).
USAGE
- statist [-g...] < sample_file
- or
- gensample | statist [-g...]
- where `gensample' is a program that generates a sample stream of bytes similar to the files to be frozen.
- The -g and -x switches have the same meaning as for freeze(1) and may be repeated.
- You can also see the intermediate values and watch them change by pressing the INTR key at any time.
- Note: If you use gensample | statist, remember that INTR affects
BOTH processes!
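A `gensample' program can be as simple as something that concatenates representative files onto standard output. The following is a minimal sketch in Python and is not part of the freeze package; the script name gensample.py and the function name write_sample are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical gensample: copy the named files to standard output."""
import shutil
import sys


def write_sample(paths, out):
    """Copy the raw bytes of each file in paths to the binary stream out."""
    for path in paths:
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed invocation: python3 gensample.py file1 file2 ... | statist
    write_sample(sys.argv[1:], sys.stdout.buffer)
```

The files are copied in binary mode so that the byte stream reaching statist is identical to the file contents.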
The results have the following format:
n1 n2 n3 n4 n5 n6 n7 n8 (uncertainty = x)
Average match length: xx.yy
Percentile 99.9: p999
Percentile 99.5: p995
Percentile 99.0: p990
Percentile 97.0: p970
Percentile 95.0: p950
Percentile 90.0: p900
Percentile 80.0: p800
Percentile 70.0: p700
Percentile 50.0: p500
Sigma: xx.yy
- Here n1 through n8 are the values of the calculated position table elements, and uncertainty is a number that indicates the validity of the results (a nonzero uncertainty indicates that the results may be unusable). The other values (average match length, percentiles and sigma) are for information only.
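If the results are to be post-processed by a script, the table and the uncertainty can be picked out of the first result line. The following Python sketch assumes that the line begins with eight integers followed by "(uncertainty = x)", as in the format shown above; the name parse_result and the sample values are illustrative and were not produced by statist.

```python
import re

# Assumed shape of the first result line:
#   n1 n2 n3 n4 n5 n6 n7 n8 (uncertainty = x)
RESULT_RE = re.compile(r"^\s*((?:\d+\s+){8})\(uncertainty\s*=\s*(-?\d+)\)")


def parse_result(line):
    """Return (table, uncertainty), or None if line is not a result line."""
    m = RESULT_RE.match(line)
    if m is None:
        return None
    table = [int(v) for v in m.group(1).split()]
    return table, int(m.group(2))


# Made-up sample line, for illustration only.
sample = "0 0 1 2 7 16 36 0 (uncertainty = 0)"
table, uncertainty = parse_result(sample)
if uncertainty != 0:
    print("warning: results may be unusable")
print(table)
```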
- You may create the file /etc/default/freeze (if you don't like the /etc/default/ directory, choose another; under MS-DOS the file is FREEZE.CNF in the directory containing FREEZE.EXE). It has the following format:
- name = n1 n2 n3 n4 n5 n6 n7 n8
- (name must start in column 1). For example:
---------- cut here ----------
# This is freeze's defaults file
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
# End of file
---------- cut here ----------
- If you find values that are better THAN THE DEFAULT for both text (C programs) and binary (executable) files, please send them to me.
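To check a defaults file before using it, the format described above can be read with a few lines of Python. This is a sketch only and not how freeze itself reads the file; it assumes that `#' begins a comment (as the example suggests), and the name parse_defaults is hypothetical.

```python
def parse_defaults(text):
    """Map each name to its list of eight position table values."""
    tables = {}
    for line in text.splitlines():
        body = line.split("#", 1)[0]  # assumption: '#' starts a comment
        if not body.strip() or body[0].isspace():
            continue  # blank, comment-only, or name not in column 1
        name, sep, values = body.partition("=")
        if not sep:
            continue  # no '=' on this line
        try:
            numbers = [int(v) for v in values.split()]
        except ValueError:
            continue  # non-numeric table entry
        if len(numbers) == 8:
            tables[name.strip()] = numbers
    return tables


example = """# This is freeze's defaults file
russian=0 0 1 2 6 20 31 2 # The sample was mailx.lp (Russian)
english=0 0 1 2 7 16 36 0 # The sample was gcc.lp (English)
# End of file
"""
for name, table in parse_defaults(example).items():
    print(name, table)
```

Lines whose name does not start in column 1, or that do not carry exactly eight values, are skipped rather than reported.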
- Important note: statist.c is NOT part of the freeze package; it is an additional feature.
SEE ALSO
DIAGNOSTICS
- Huffman tree has more than 8 levels, reducing...
- Self-explanatory, but the reduction sometimes falls into an infinite loop.
- xxxK
- A progress indicator, written after each 4K of the file has been processed.
BUGS
Sometimes results with uncertainty = 1 (for one file) give a compression
rate worse than the default, while results with uncertainty = 13 (for
another file) work quite well.
- Please send descriptions of any bugs found, incompatibilities, etc. to
leo@s514.ipmce.su.