NCD(1)
NAME
ncd - compute the Normalized Compression Distance
SYNOPSIS
ncd [ -c compressor ] [ -o filestem ] [ -bhLnqsv ] [ -d|f|l|p|t string ] ... [arg1] [arg2]
DESCRIPTION
- The Normalized Compression Distance between two objects is defined as
- NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))
- where
- C(a,b) means "the compressed size of the concatenation of a and b"
- C(a) means "the compressed size of a"
- C(b) means "the compressed size of b"
- ncd prints a non-negative number (typically, but not always, 0 <= x < 1.1) representing how different the two objects are. Smaller numbers indicate more similar objects. The largest values are near 1; they are not exactly 1 because of imperfections in compression techniques or other irregularities in the underlying compressor, but for most standard compression algorithms you are unlikely to see a number above 1.1.
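- The definition above can be illustrated with a minimal Python sketch. It is not the ncd implementation: it assumes the standard-library zlib and bz2 modules as stand-ins for the zlib and bzlib compressors, and the helper names compressed_size and ncd are hypothetical.

```python
import bz2
import zlib


def compressed_size(data: bytes, compress=zlib.compress) -> int:
    """C(x): size in bytes of the compressed form of data."""
    return len(compress(data))


def ncd(a: bytes, b: bytes, compress=zlib.compress) -> float:
    """NCD(a,b) = (C(a,b) - min(C(a),C(b))) / max(C(a),C(b))."""
    ca = compressed_size(a, compress)
    cb = compressed_size(b, compress)
    cab = compressed_size(a + b, compress)  # C(a,b): concatenation of a and b
    return (cab - min(ca, cb)) / max(ca, cb)


if __name__ == "__main__":
    text = b"the quick brown fox jumps over the lazy dog " * 20
    other = bytes(range(256)) * 4
    print(ncd(text, text))                 # identical inputs: small value
    print(ncd(text, other))                # unrelated inputs: larger value
    print(ncd(text, other, bz2.compress))  # same pair, different compressor
```

- The value depends on the compressor chosen, which is why the same pair of objects can score differently under zlib and bzlib.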
- Three compressors are available by default: bzlib, zlib and blocksort. The compressor may be selected with an option in complearn.conf; see complearn(5) for more details.
ENUMERATION MODES
- -f, --file-mode=FILE
- select file mode
- -l, --literal-mode=STRING
- select string literal mode; this is the default. The next argument is a string which, if it contains white space, should be enclosed in double quotes (")
- -p, --plainlist-mode=FILE
- select plain list mode; argument is a file which contains a list of files to be individually evaluated
- -t, --termlist-mode=FILE
- select term list mode; argument is a file which contains string literals to be individually evaluated
- -d, --directory-mode=DIR
- select directory mode; argument is a path which contains files to be individually evaluated
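- The list-style modes above each yield several objects to be evaluated individually. As a rough sketch of what a pairwise evaluation over such a list could look like, the following Python assumes that every item is compared with every other item to form a distance matrix. That behaviour, the use of zlib, and the names csize, ncd and distance_matrix are assumptions for illustration, not a description of ncd itself.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size of data under zlib (assumed compressor)."""
    return len(zlib.compress(data))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    ca, cb = csize(a), csize(b)
    return (csize(a + b) - min(ca, cb)) / max(ca, cb)


def distance_matrix(items):
    """Return a square list of lists; entry [i][j] is ncd(items[i], items[j])."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Hypothetical terms, as they might appear one per line in a term list file.
    terms = [b"a" * 200, b"abcd" * 50, bytes(range(256))]
    for row in distance_matrix(terms):
        print(" ".join(f"{d:.3f}" for d in row))
```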
OPTIONS
- -c, --compressor=compressor
- set the compressor to use
- -L, --list
- list the available built-in compressors as well as any available compression modules. Modules are loaded from the modules subdirectory of /usr/lib/complearn.
- -s, --size
- get, in place of NCD, the compressed size of a single FILE, STRING, or DIR
- -n, --nexus
- Nexus output format for distance matrix
- -o, --output=FILE
- specify the output filestem, if different from the default, distmatrix. An extension (.clb, .nex, or .txt) is added as appropriate to the output file type.
- -b, --binary
- output results to binary file; the default name is distmatrix.clb
- -q, --quiet
- suppress ASCII output and messages
- -v, --verbose
- activate verbose mode
- -h, --help
- show help options and exit
FILES
- $HOME/.complearn/complearn.conf
- /usr/share/complearn/complearn.conf
- /usr/local/share/complearn/complearn.conf
- per-user and system configuration files; see complearn(5) for further details.
- $HOME/.complearn/modules
- /usr/lib/complearn/modules
- standard module automatic loading area. Any shared-object compressor modules found here will be loaded on startup.
ENVIRONMENT
- COMPLEARNMODPATH
- If this environment variable is set, CompLearn will search the given directory and load any CompLearn compression modules it finds there (such as the libart.so example included with the CompLearn source distribution).
DIAGNOSTICS
- none