hocr2djvused(1)

NAME

hocr2djvused - hOCR to djvused script converter

SYNOPSIS

hocr2djvused [option...]

DESCRIPTION

hocr2djvused reads an hOCR[1] file (as produced by OCRopus[2] or Cuneiform[3]) from standard input and converts it to a djvused script.
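
The filter behaviour described above can be driven from a script. The following Python sketch is only an illustration: it assumes that the generated djvused script is written to standard output (this page does not say so explicitly), and the helper name hocr_to_djvused_script is hypothetical.

```python
import subprocess


def hocr_to_djvused_script(hocr: bytes, command=("hocr2djvused",)) -> bytes:
    """Pipe hOCR bytes through a filter command and return what it prints.

    Assumption: the converter writes the djvused script to standard output.
    `command` is a parameter so that options can be appended to it.
    """
    result = subprocess.run(
        list(command),
        input=hocr,              # hOCR goes to the child's standard input
        stdout=subprocess.PIPE,  # capture whatever the child prints
        check=True,              # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

With the tool installed, a call such as hocr_to_djvused_script(data), where data holds the bytes of an hOCR file, would return the script as bytes, ready to be saved to a file.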

OPTIONS

Text segmentation options:

-t lines, --details=lines
Record location of every line. Don't record locations of particular words or characters.

-t words, --details=words
Record location of every line and every word. Don't record locations of particular characters. This is the default.

-t chars, --details=chars
Record location of every line, every word and every character.

--word-segmentation=simple
Consider each non-empty sequence of non-whitespace characters a single word. This is the default, despite being linguistically incorrect.

--word-segmentation=uax29
Use the Unicode Text Segmentation[4] algorithm to break lines into words. This option breaks the assumption of some DjVu tools that words are separated by spaces, and is therefore not recommended.
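
The rule behind simple word segmentation can be sketched in a few lines of Python. This is not the tool's own code: the function name simple_words is hypothetical, and Python's notion of whitespace is assumed to match the one hocr2djvused uses.

```python
def simple_words(line: str) -> list[str]:
    """Split a line into maximal runs of non-whitespace characters."""
    # str.split() with no arguments splits on runs of whitespace and
    # never returns empty strings, which matches the "non-empty
    # sequence of non-whitespace characters" rule.
    return line.split()


print(simple_words("Don't panic, it's  fine."))
```

Punctuation stays attached to its neighbour here ("panic," and "fine." each count as one word), which is why the rule is called linguistically incorrect. It does, however, keep every word separated by spaces.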

Other options:

--rotation=n
Assume that DjVu pages are rotated by n degrees.

--page-size=widthxheight
Specify that the page size is width pixels x height pixels. This option is required for hOCR generated by Cuneiform and superfluous otherwise.

--version
Output version information and exit.

-h, --help
Display help and exit.
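
When the command line is assembled programmatically, the options above can be combined as in the following Python sketch. The function build_command and its parameter names are hypothetical; only option spellings listed in this page are used.

```python
def build_command(details="words", word_segmentation="simple",
                  rotation=None, page_size=None):
    """Return an argument list for hocr2djvused from documented options."""
    if details not in ("lines", "words", "chars"):
        raise ValueError(f"unknown details level: {details!r}")
    if word_segmentation not in ("simple", "uax29"):
        raise ValueError(f"unknown word segmentation: {word_segmentation!r}")

    argv = [
        "hocr2djvused",
        f"--details={details}",
        f"--word-segmentation={word_segmentation}",
    ]
    if rotation is not None:
        argv.append(f"--rotation={rotation}")
    if page_size is not None:
        # page_size is a (width, height) pair in pixels
        width, height = page_size
        argv.append(f"--page-size={width}x{height}")
    return argv
```

For hOCR produced by Cuneiform, a call such as build_command(page_size=(2480, 3508)) adds the required --page-size option. The dimensions are illustrative only.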

AUTHOR

Jakub Wilk <ubanus@users.sf.net>: Author.

COPYRIGHT

NOTES

1. hOCR
http://docs.google.com/View?docid=dfxcv4vc_67g844kf

2. OCRopus
http://ocropus.googlecode.com/

3. Cuneiform
http://launchpad.net/cuneiform-linux

4. Unicode Text Segmentation
http://unicode.org/reports/tr29/
