djvu2hocr(1)
NAME
djvu2hocr - DjVu to hOCR converter
SYNOPSIS
djvu2hocr [option...] djvu-file
djvu2hocr {--version | --help | -h}
DESCRIPTION
djvu2hocr converts hidden text from a DjVu file to the hOCR[1] format.
OPTIONS
- Text segmentation options
- --word-segmentation=simple
Use the same word segmentation as found in the DjVu file. This is the default.
- --word-segmentation=uax29
Use the Unicode Text Segmentation[2] algorithm to break lines into words, possibly fixing word segmentation found in the DjVu file.
- Other options
- --version
Output version information and exit.
- -h, --help
Display help and exit.
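EXAMPLES
The sketch below shows one way the segmentation options above might be used from a shell script. The helper name djvu_to_hocr and the file names are hypothetical and not part of djvu2hocr. It also assumes that djvu2hocr writes the hOCR document to standard output, which this page does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of djvu2hocr): convert one DjVu file to hOCR.
# Assumption: djvu2hocr writes the hOCR document to standard output.
#   $1 = input DjVu file
#   $2 = output hOCR file
#   $3 = word segmentation mode: "simple" (the default) or "uax29"
djvu_to_hocr() {
    input=$1
    output=$2
    mode=${3:-simple}
    # Accept only the two modes documented under OPTIONS.
    case $mode in
        simple|uax29) ;;
        *) echo "unknown word segmentation: $mode" >&2; return 2 ;;
    esac
    djvu2hocr --word-segmentation="$mode" "$input" > "$output"
}

# Example calls (file names are placeholders):
#   djvu_to_hocr book.djvu book-simple.html
#   djvu_to_hocr book.djvu book-uax29.html uax29
```

Running both modes on the same file and comparing the two outputs is a simple way to check whether uax29 changes the word boundaries for a given document.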
SEE ALSO
AUTHOR
- Jakub Wilk <ubanus@users.sf.net> (author)
COPYRIGHT
Copyright (C) 2009, 2010 Jakub Wilk
NOTES
- 1. hOCR
http://docs.google.com/View?docid=dfxcv4vc_67g844kf
- 2. Unicode Text Segmentation
http://unicode.org/reports/tr29/