PDF2DJVU(1)
NAME
pdf2djvu - creates DjVu files from PDF files
SYNOPSIS
pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file
pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file
pdf2djvu {--version | --help | -h}
DESCRIPTION
This program creates a DjVu file from the Portable Document Format file
pdf-file.
OPTIONS
pdf2djvu accepts the following options:
- Document type, file names
- -o, --output=output-djvu-file
Generate a bundled multi-page document. Write the file into
output-djvu-file instead of standard output.
- -i, --indirect=index-djvu-file
Generate an indirect multi-page document. Use index-djvu-file as the
index file name; put the component files into the same directory. The
directory must exist and be writable.
- --pageid-template=template
Specifies the naming scheme for page identifiers. Consult the
"TEMPLATE LANGUAGE" section for the template language description.
The default template is "p{page:04*}.djvu".
For portability reasons, page identifiers:
o must consist only of lowercase ASCII letters, digits, _, +, and dot,
o cannot start with a dot,
o cannot contain two consecutive dots,
o must end with the .djvu or the .djv extension.
- --pageid-prefix=prefix
Equivalent to "--pageid-template=prefix{page:04*}.djvu".
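The portability rules for page identifiers listed under --pageid-template can be checked mechanically. The following Python sketch is only an illustration and is not part of pdf2djvu: the function name is_portable_pageid is hypothetical, and the set of allowed characters is taken from the list as it is worded in this page.

```python
import re

# Hypothetical helper, not part of pdf2djvu.
# Allowed characters as listed in this page: lowercase ASCII letters,
# digits, underscore, plus sign and dot.
_ALLOWED = re.compile(r"[a-z0-9_+.]+")


def is_portable_pageid(pageid):
    """Return True if pageid satisfies the portability rules listed above."""
    if not _ALLOWED.fullmatch(pageid):
        return False  # empty, or contains a character outside the allowed set
    if pageid.startswith("."):
        return False  # cannot start with a dot
    if ".." in pageid:
        return False  # cannot contain two consecutive dots
    # must end with the .djvu or the .djv extension
    return pageid.endswith((".djvu", ".djv"))


if __name__ == "__main__":
    for candidate in ("p0001.djvu", "Cover.djvu", ".hidden.djvu", "a..b.djvu", "page.pdf"):
        print(candidate, is_portable_pageid(candidate))
```

Such a check could be run on identifiers produced by a custom template before starting a long conversion.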
- --page-title-template=template
Specifies the template for page titles. Consult the "TEMPLATE
LANGUAGE" section for the template language description.
The default is to set no page titles.
- Resolution, page size
- -d, --dpi=resolution
Sets the desired resolution to resolution dots per inch. The default is 300 dpi. The allowed range is 72 <= resolution <= 6000.
- --media-box
Use MediaBox to determine page size. CropBox is used by default.
- --page-size=widthxheight
Sets the preferred page size to width x height pixels. The actual page
size may be altered in order to respect aspect ratio and DjVu
limitations on resolution. (This option takes precedence over
-d/--dpi.)
- --guess-dpi
Try to guess native resolution by inspecting embedded images. Use
with care.
- Image quality
- --bg-slices=n+...+n, --bg-slices=n,...,n
Specifies the encoding quality of the IW44 background layer. This
option is similar to the -slice option of c44. Consult the c44(1)
manual page for details. The default is 72+11+10+10.
- --bg-subsample=n
Specifies the background subsampling ratio. The default is 3. Valid values are integers between 1 and 12, inclusive.
- --fg-colors=default
Try to preserve all the foreground layer colors. This is the
default.
- --fg-colors=web
Reduce foreground layer colors to the web palette (216 colors).
This option is not recommended.
- --fg-colors=n
Use GraphicsMagick to reduce the number of distinct colors in the
foreground layer to n. Valid values are integers between 1 and
4080. This option is not recommended.
- --fg-colors=black
Discard any color information from the foreground layer.
- --monochrome
Render pages as monochrome bitmaps. With this option, --bg-... and
--fg-... options are not respected.
- --loss-level=n
Specifies the aggressiveness of the lossy compression. The default is
0 (lossless). Valid values are integers between 0 and 200, inclusive.
This option is similar to the -losslevel option of cjb2; consult the
cjb2(1) manual page for details. This option is respected only along
with the --monochrome option.
- --lossy
Synonym for --loss-level=100.
- --anti-alias
Enable font and vector anti-aliasing. This option is not
recommended.
- Extraction
- --no-metadata
Don't extract the metadata.
By default:
o The following entries of the document information dictionary
are extracted: Title, Author, Subject, Creator, Producer,
CreationDate, ModDate. Timestamps are formatted according to
RFC 3339[1], with date and time components separated by a single
space.
o The XMP metadata is extracted (or created) and updated
accordingly.
- --verbatim-metadata
Keep the original metadata intact.
- --no-outline
Don't extract the document outline.
- --hyperlinks=border-avis
Make hyperlink borders always visible.
By default, a hyperlink border is visible only when the mouse is
over the hyperlink.
- --hyperlinks=#RRGGBB
Force the specified border color for hyperlinks.
- --no-hyperlinks, --hyperlinks=none
Don't extract hyperlinks.
- --no-text
Don't extract the text.
- --words
Extract the text. Record the location of every word. This is the
default.
- --lines
Extract the text. Record the location of every line, rather than
every word.
- --crop-text
Extract no text outside the page boundary.
- --no-nfkc
Don't NFKC[2]-normalize the text.
- --filter-text=command-line
Filter the text through the command-line. The provided filter must
preserve whitespace, control characters and decimal digits.
This option implies --no-nfkc.
- -p, --pages=page-range
Specifies pages to convert. page-range is a comma-separated list of
sub-ranges. Each sub-range is either a single page (e.g. 17) or a
contiguous range of pages (e.g. 37-42). Pages are numbered from 1.
The default is to convert all pages.
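To make the page-range syntax concrete, the following Python sketch expands a page-range string into the list of page numbers it denotes. It is an illustration only and not the parser used by pdf2djvu: the function name parse_page_range is hypothetical, and rejecting reversed sub-ranges (such as 42-37) is an assumption, since this page does not say how they are treated.

```python
def parse_page_range(spec):
    """Expand a page-range such as "17,37-42" into a list of page numbers.

    Hypothetical helper, not part of pdf2djvu.
    """
    pages = []
    for sub_range in spec.split(","):
        first, sep, last = sub_range.partition("-")
        start = int(first)                # raises ValueError on non-numeric input
        end = int(last) if sep else start
        # Pages are numbered from 1; reversed ranges are rejected (assumption).
        if start < 1 or end < start:
            raise ValueError("invalid sub-range: %r" % sub_range)
        pages.extend(range(start, end + 1))
    return pages


if __name__ == "__main__":
    print(parse_page_range("17,37-42"))
```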
- Performance
- -j, --jobs=n
Use n threads to perform conversion. The default is to use one
thread.
- -j0, --jobs=0
Determine automatically how many threads to use to perform
conversion.
- Verbosity, help
- -v, --verbose
Display more informational messages while converting the file.
- -q, --quiet
Don't display informational messages while converting the file.
- --version
Output version information and exit.
- -h, --help
Display help and exit.
ENVIRONMENT
- OMP_*
Details of runtime behaviour with respect to parallelism can be
controlled by several environment variables. Please refer to the
OpenMP API specification[3] for details.
TEMPLATE LANGUAGE
- Template syntax
The template language is roughly modelled on the Python string
formatting syntax[4].
A template is a piece of text which contains fields, surrounded by
curly braces {}. Fields are replaced with appropriately formatted
values when the template is evaluated. Moreover, {{ is replaced with a
single { and }} is replaced with a single }.
- Field syntax
Each field consists of a variable name, optionally followed by a
shift, optionally followed by a format specification.
The shift is a signed (i.e. starting with a + or - character) integer.
The format specification consists of a colon, followed by a width
specification.
The width specification is a decimal integer defining the minimum field
width. If not specified, then the field width will be determined by the
content. Preceding the width specification with a zero (0) character
enables zero-padding.
The width specification is optionally followed by an asterisk (*)
character, which increases the minimum field width to the width of the
longest possible content of the variable.
- Available variables
- page, spage
Page number in the PDF document.
- dpage
Page number in the DjVu document.
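The field syntax described above can be illustrated with a small Python evaluator. This is a sketch under stated assumptions, not the implementation used by pdf2djvu: the function name evaluate and its parameters are hypothetical, the shift is assumed to be added to the variable's value, and "the longest possible content" is assumed to be the variable's largest value (for example the page count) after the same shift. Stray single braces are passed through unchanged here, which the real tool may well reject.

```python
import re

# One token is either an escaped brace or a field:
#   {name[+shift|-shift][:[0]width[*]]}
_TOKEN = re.compile(
    r"\{\{|\}\}|"
    r"\{(?P<name>[a-z]+)(?P<shift>[+-]\d+)?"
    r"(?::(?P<zero>0?)(?P<width>\d+)(?P<star>\*?))?\}"
)


def evaluate(template, values, max_values):
    """Evaluate a template (hypothetical helper, not part of pdf2djvu).

    values maps variable names to their current values; max_values maps
    them to their largest possible values (assumption: used for '*').
    """
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name = match.group("name")
        shift = int(match.group("shift") or 0)
        text = str(values[name] + shift)
        width = int(match.group("width") or 0)
        if match.group("star"):
            # Widen the field to fit the longest possible content.
            width = max(width, len(str(max_values[name] + shift)))
        # A leading zero in the width specification enables zero-padding.
        return text.zfill(width) if match.group("zero") else text.rjust(width)

    return _TOKEN.sub(replace, template)


if __name__ == "__main__":
    print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 120}))
```

With the default template and a 120-page document, page 7 would be named p0007.djvu under these assumptions; for a document with more than 9999 pages the asterisk widens the field so that all identifiers keep the same length.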
IMPLEMENTATION DETAILS
- Layer separation algorithm
Unless the --monochrome option is on, pdf2djvu uses the following naive
layer separation algorithm:
1. For each page, do the following:
   1. Rasterize the page into a pixmap, in the usual manner.
   2. Rasterize the page into another pixmap, omitting the following
   page elements:
      o text,
      o 1 bit-per-pixel raster images,
      o vector elements (except fills of large areas).
   3. Compare both pixmaps, pixel by pixel:
      1. If their colors match, classify the pixel as a part of the
      background layer.
      2. Otherwise, classify the pixel as a part of the foreground
      layer.
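The comparison step of the algorithm above can be sketched in a few lines of Python. This is an illustration only, not the pdf2djvu source: the function name separate_layers is hypothetical, the two pixmaps are assumed to be already rendered and are represented as nested lists of RGB tuples, and the rendering steps themselves are not shown.

```python
def separate_layers(full, reduced):
    """Return a foreground mask for two same-sized pixmaps.

    full:    the page rendered in the usual manner
    reduced: the page rendered without text, 1 bit-per-pixel images and
             most vector elements
    Each pixmap is a list of rows; each row is a list of (r, g, b) tuples.
    The result holds True where a pixel belongs to the foreground layer.
    Hypothetical helper, not part of pdf2djvu.
    """
    if len(full) != len(reduced) or any(
        len(a) != len(b) for a, b in zip(full, reduced)
    ):
        raise ValueError("pixmaps must have the same dimensions")
    return [
        # Matching colors: background. Differing colors: foreground.
        [full_px != reduced_px for full_px, reduced_px in zip(full_row, reduced_row)]
        for full_row, reduced_row in zip(full, reduced)
    ]


if __name__ == "__main__":
    white, black = (255, 255, 255), (0, 0, 0)
    full = [[white, black], [white, white]]      # one black text pixel
    reduced = [[white, white], [white, white]]   # same page without text
    print(separate_layers(full, reduced))
```

In the toy example only the pixel that differs between the two renderings is classified as foreground; every other pixel goes to the background layer.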
BUG REPORTS
If you find a bug in pdf2djvu, please report it at the issue
tracker[5].
SEE ALSO
djvu(1), djvudigital(1), csepdjvu(1)
AUTHOR
Jakub Wilk <jwilk@jwilk.net>
Author.
COPYRIGHT
Copyright (C) 2007, 2008, 2009, 2010 Jakub Wilk
NOTES
1. RFC 3339
http://www.ietf.org/rfc/rfc3339
2. NFKC
http://unicode.org/reports/tr15/
3. OpenMP API specification
http://openmp.org/wp/openmp-specifications/
4. Python string formatting syntax
http://docs.python.org/library/string.html#format-string-syntax
5. the issue tracker
http://code.google.com/p/pdf2djvu/issues/