DJVUSED
Section: DjVuLibre-3.5 (1)
Updated: 5/22/2005
NAME
djvused - Multi-purpose DjVu document editor.
SYNOPSIS
djvused [options] djvufile
DESCRIPTION
The program djvused is a powerful command line tool for manipulating multi-page documents, creating or editing annotation chunks, creating or editing hidden text layers, pre-computing thumbnail images, and more.
The program first reads the DjVu document djvufile and then executes a number of djvused commands. Djvused commands can be read from a specific file (when option -f is specified), read from the command line (when option -e is specified), or read from the standard input (the default).
OPTIONS
- -v
Cause djvused to print a command line prompt before reading commands and a brief message describing how each command was executed. This option is very useful for debugging djvused scripts and also for interactively entering djvused commands on the standard input.
- -f scriptfile
Cause djvused to read commands from file scriptfile.
- -e command
Cause djvused to execute the commands specified by the option argument command. It is advisable to surround the djvused commands with single quotes in order to prevent unwanted shell expansion.
- -s
Cause djvused to save the file djvufile after executing the specified commands. This is similar to executing command save immediately before terminating the program.
- -u
Cause djvused to print hidden text and annotations as UTF-8 instead of encoding non-ASCII characters with octal escape sequences for maximal portability. This option is convenient for manually editing or viewing the djvused output. Under Windows, this option also causes the emission of a UTF-8 BOM.
- -n
Cause djvused to disregard save commands. This is useful for debugging djvused scripts without overwriting files on your disk.
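When djvused is driven from another program, passing the arguments as a list bypasses the shell entirely, so the single quotes recommended for -e become unnecessary. The following Python sketch assembles such an argument list from the options above. The helper name djvused_argv and its keyword parameters are hypothetical, and treating -e and -f as mutually exclusive is an assumption of this sketch, not documented djvused behavior.

```python
import shlex


def djvused_argv(djvufile, commands=None, script=None, save=False,
                 dry_run=False, utf8=False, verbose=False):
    """Build an argument vector for djvused (hypothetical helper).

    commands: list of djvused commands, joined with ';' for option -e.
    script:   path of a command file for option -f.
    Assumption: -e and -f are treated as mutually exclusive here.
    """
    if commands and script:
        raise ValueError("use either commands (-e) or script (-f), not both")
    argv = ["djvused", djvufile]        # same order as the examples below
    if verbose:
        argv.append("-v")               # print prompts and command results
    if utf8:
        argv.append("-u")               # emit UTF-8 instead of octal escapes
    if dry_run:
        argv.append("-n")               # disregard save commands
    if script:
        argv += ["-f", script]
    elif commands:
        argv += ["-e", "; ".join(commands)]
    if save:
        argv.append("-s")               # save djvufile after the commands
    return argv


# The equivalent command line, quoted for a POSIX shell:
print(shlex.join(djvused_argv("myfile.djvu", ["select 3", "size"])))
```

To actually run djvused, the list can be passed to subprocess.run(argv, check=True, capture_output=True, text=True); since no shell is involved, no quoting is needed.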
DJVUSED EXAMPLES
There are many ways to use program djvused. The following examples illustrate some common uses of this program.
Obtaining the size of a page
Command size outputs the width and height of the selected pages using an HTML-friendly syntax. For instance, the following command prints the size of page 3 of document myfile.djvu.
    djvused myfile.djvu -e 'select 3; size'
Extracting the hidden text
Command print-pure-txt outputs the text associated with a page or a document. For instance, the following shell command outputs the text for the entire document. Pages are delimited by form feed characters and lines by newline characters.
    djvused myfile.djvu -e 'print-pure-txt'
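A program consuming this output can recover the page and line structure from the control characters listed under command print-pure-txt below: form feed between pages, newline between lines, and vertical tab, group separator, or unit separator around columns, regions, and paragraphs. This Python sketch assumes the output has already been captured as a str; split_pure_text is a hypothetical name, and the sketch simply discards the column, region, and paragraph markers at line ends as well as empty lines.

```python
SEPARATORS = "\013\035\037"  # VT (column), GS (region), US (paragraph)


def split_pure_text(text: str) -> list[list[str]]:
    """Split print-pure-txt output into pages, each a list of lines."""
    pages = text.split("\f")                 # form feed separates pages
    if pages and pages[-1].strip() == "":
        pages.pop()                          # ignore the chunk after a trailing form feed
    result = []
    for page in pages:
        lines = []
        for raw in page.split("\n"):         # newline separates lines
            line = raw.strip(SEPARATORS)     # drop structural markers at line ends
            if line:
                lines.append(line)
        result.append(lines)
    return result


print(split_pure_text("Hello\nworld\n\fPage two\n\f"))
```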
Command print-txt produces a more extensive output describing the structure and the location of the text components. The syntax of this output is described later in this man page. For instance, the following shell command outputs extended text information for page 3 of document myfile.djvu.
    djvused myfile.djvu -e 'select 3; print-txt'
Extracting the annotations
Annotation data can be extracted using command print-ant. The syntax of the annotation data is described later in this man page. For instance, the following shell command outputs the annotation data for the first page of document myfile.djvu.
    djvused myfile.djvu -e 'select 1; print-ant'
Command print-ant only prints the annotations stored in the selected component file. Command print-merged-ant also retrieves annotations from all the component files referenced by the current page (using INCL chunks) and prints the merged information.
Dumping/restoring annotations and text
Three commands, output-txt, output-ant, and output-all, produce djvused scripts. For instance, the following shell command produces a djvused script, myfile.dsed, that recreates all the text and annotation data in document myfile.djvu.
    djvused myfile.djvu -e 'output-all' > myfile.dsed
Script myfile.dsed is a text file that can be easily edited. The following shell command then recreates the text and annotation information in file myfile.djvu.
    djvused myfile.djvu -f myfile.dsed -s
Extracting a page
Both commands save-page and save-page-with create a DjVu file representing the selected component file of a document. For instance, the following shell command creates a file p05.djvu containing page 5 of document myfile.djvu.
    djvused myfile.djvu -e 'select 5; save-page p05.djvu'
Each page of a document might import data from another component file using so-called inclusion (INCL) chunks. Command save-page then produces a file with unresolved references to imported data. Such a file should then be made part of a multi-page document containing the required data in other component files. On the other hand, command save-page-with copies all the imported data into the output file. This file is directly usable, yet collecting several such files into a multi-page document might lead to useless data replication.
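To extract every page, one could generate a djvused script with one select/save command pair per page, taking the page count from command n (described below). The Python sketch below does that. The function name, the p01.djvu naming pattern, and the zero padding are illustrative choices, not djvused conventions.

```python
def extract_pages_script(page_count: int, self_contained: bool = True,
                         prefix: str = "p") -> str:
    """Return a djvused script that writes each page to its own file.

    self_contained=True uses save-page-with (imported data copied in);
    False uses save-page (INCL references left unresolved).
    """
    if page_count < 1:
        raise ValueError("page_count must be positive")
    command = "save-page-with" if self_contained else "save-page"
    width = max(2, len(str(page_count)))  # p01, p02, ... or p001 for >= 100 pages
    lines = [f"select {n}; {command} {prefix}{n:0{width}d}.djvu"
             for n in range(1, page_count + 1)]
    return "\n".join(lines) + "\n"


print(extract_pages_script(3), end="")
```

With the page count obtained from djvused myfile.djvu -e n and the returned text saved as, say, split.dsed, the script would run with djvused myfile.djvu -f split.dsed. Option -s is not needed here because the input document itself is not modified.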
Pre-computing thumbnails
Command set-thumbnails constructs thumbnails that can later be displayed by DjVu viewers. For instance, the following shell command computes thumbnails of size 64x64 pixels for all pages of file myfile.djvu.
    djvused myfile.djvu -e 'set-thumbnails 64' -s
DJVUSED COMMANDS
Command lines might contain zero, one, or more djvused commands and an
optional comment. Multiple djvused commands must be separated by a
semicolon character ';'. Comments are introduced by the '#' character
and extend until the end of the command line.
Selection commands
Multi-page DjVu documents are composed of a number of component files.
Most component files describe a specific page of a document. Some
component files contain information shared by several pages such as
shared image data, shared annotations or thumbnails. Many djvused
commands operate on selected component files. All component files are
initially selected. The following commands are useful for changing
the selection.
- n
Print the total number of pages in the document.
- ls
List all component files in the document. Each line contains an optional page number, a letter describing the component file type, the size of the component file, and the identifier of the component file. Component file type letters P, I, A, and T respectively stand for page data, shared image data, shared annotation data, and thumbnail data. Page numbers are only listed for component files containing page data. When it is set, the optional page title (see command set-page-title below) is displayed after the component file identifier.
- select [fileid]
Select the component file identified by argument fileid. Argument fileid must be either a page number or a component file identifier. When argument fileid is omitted, the select command selects all component files.
- select-shared-ant
Select the component file containing shared annotations. Only one such component file is supported by the current DjVu software. This component file usually contains annotations pertaining to the whole document as opposed to specific pages. An error message is displayed if there is no such component file.
- create-shared-ant
Create and select a component file containing shared annotations. If such a component file already exists, this command only selects it. Otherwise it creates a new shared annotation component file and makes sure that it is imported by all pages in the document.
- showsel
Show the currently selected component files using the same format as command ls.
Text and annotation commands
- print-pure-txt
Print the text stored in the hidden text layer of the selected pages. A similar capability is offered by program djvutxt. Structural information is sometimes represented by control characters. Text from different pages is delimited by form feed characters ("\f"). Lines are delimited by newline characters ("\n"). Columns, regions, and paragraphs are sometimes delimited by vertical tab ("\013"), group separator ("\035"), and unit separator ("\037") characters, respectively.
- print-txt
Print extensive hidden text information for the selected pages. This information describes the structure of the text on the document page and locates the structural elements in the page image. The syntax of this output is described later in this man page.
- remove-txt
Remove the hidden text information from the selected component files. For instance, executing commands select and remove-txt removes all hidden text information from the DjVu document.
- set-txt [djvusedtxtfile]
Insert hidden text information into the selected pages. The optional argument djvusedtxtfile names a file containing the hidden text information. This file must contain data similar to what is produced by command print-txt. When the optional argument is omitted, the program reads the hidden text information from the djvused script itself until reaching an end-of-file or a line containing a single period.
- output-txt
Print a djvused script that reconstructs the hidden text information for the selected pages. This script can later be edited and executed by invoking program djvused with option -f.
- print-ant
Print the annotations of the selected component file. The annotation data is represented using a simple syntax described later in this document.
- print-merged-ant
Merge the annotations stored in the selected component files with the annotations imported from other component files, such as the shared annotation component file, and print the result. The annotation data is represented using a simple syntax described later in this document.
- remove-ant
Remove the annotation information from the selected component files. For instance, executing commands select and remove-ant removes all annotation information from the DjVu document.
- set-ant [djvusedantfile]
Insert annotations into the selected component file. The optional argument djvusedantfile names a file containing the annotation data. This file must contain data similar to what is produced by command print-ant. When the optional argument is omitted, the program reads the annotation data from the djvused script itself until reaching an end-of-file or a line containing a single period.
- output-ant
Print a djvused script that reconstructs the annotation information for the selected pages. This script can later be edited and executed by invoking program djvused with option -f.
- print-meta
Print the metadata part of the annotations for the selected component file. This command displays a subset of the information printed by command print-ant using a different syntax. Metadata are organized as key-value pairs. Each printed line contains the key name (such as author, title, etc.), followed by a tab character ("\t") and a double-quoted string representing the UTF-8 encoded metadata value.
- remove-meta
Remove the metadata part of the annotations of the selected component files.
- set-meta [djvusedmetafile]
Set the metadata part of the annotations of the selected component file. The remaining part of the annotations is left unchanged. The optional argument djvusedmetafile names a file containing the metadata. This file must contain data similar to what is produced by command print-meta. When the optional argument is omitted, the program reads the metadata from the djvused script itself until reaching an end-of-file or a line containing a single period.
- print-xmp
Print the XMP metadata string contained in the annotation chunk of the selected component file. This command in fact displays a subset of the information printed by command print-ant.
- remove-xmp
Remove the XMP tag from the annotation chunk of the selected component file.
- set-xmp [xmpfile]
Set the XMP metadata part of the annotations of the selected component file. The remaining part of the annotations is left unchanged. The optional argument xmpfile names a file containing the XMP metadata in a format similar to that produced by command print-xmp. When the optional argument is omitted, the program reads the XMP metadata from the djvused script itself until reaching an end-of-file or a line containing a single period.
- output-all
Print a djvused script that reconstructs both the hidden text and the annotation information for the selected pages. This script can later be edited and executed by invoking program djvused with option -f.
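Commands such as set-txt, set-ant, set-meta, set-xmp, and set-outline can read their data inline from the script, terminated by a line containing a single period. The Python sketch below builds such a script for set-meta, writing each entry as a key, a tab, and a value escaped as a djvused string (see the string syntax described later). The helper names are hypothetical. Choosing create-shared-ant to hold document-wide metadata is an assumption based on the description of that command, not a documented requirement.

```python
def quote(text: str) -> str:
    """Encode text as a djvused string literal: UTF-8 bytes, with quotes,
    backslashes, and non-printable or non-ASCII bytes escaped in octal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '"\\':
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def inline_block(command: str, data_lines: list[str]) -> str:
    """Command followed by its inline data and the terminating '.' line."""
    for line in data_lines:
        if line.strip() == ".":
            raise ValueError("a lone '.' would end the data early")
    return "\n".join([command, *data_lines, "."]) + "\n"


def set_meta_script(meta: dict[str, str]) -> str:
    # print-meta style lines: key, a tab character, then a quoted value.
    lines = [f"{key}\t{quote(value)}" for key, value in meta.items()]
    return "create-shared-ant\n" + inline_block("set-meta", lines)


print(set_meta_script({"Title": "Notes", "Author": "Léon"}), end="")
```

Saved as meta.dsed, the generated text could be applied with djvused myfile.djvu -f meta.dsed -s.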
Outline/bookmarks commands
- print-outline
Print the outline of the document. Nothing is printed if the document contains no outline.
- remove-outline
Remove the outline from the document.
- set-outline [djvusedoutlinefile]
Insert outline information into the document. The optional argument djvusedoutlinefile names a file containing the outline information. This file must contain data similar to what is produced by command print-outline. When the optional argument is omitted, the program reads the outline information from the djvused script itself until reaching an end-of-file or a line containing a single period.
Thumbnail commands
- set-thumbnails sz
Compute thumbnails of size sz x sz pixels and insert them into the document. DjVu viewers can later display these thumbnails very efficiently without needing to download the data for each page. Typical thumbnail sizes range from 48 to 128 pixels.
- remove-thumbnails
Remove the pre-computed thumbnails from the DjVu document. New thumbnails can then be computed using command set-thumbnails.
Save commands
The above commands only modify the memory image of the DjVu document.
The following commands provide means to save the modified data
into the file system.
- save
Save the modified DjVu document back into the input file djvufile specified by the arguments of the program djvused. Nothing is done if the DjVu file was not modified. Passing option -s to program djvused is equivalent to executing command save before exiting the program.
- save-bundled filename
Save the current DjVu document as a bundled multi-page DjVu document named filename. A similar capability is offered by program djvmcvt.
- save-indirect filename
Save the current DjVu document as an indirect multi-page DjVu document. The index file of the indirect document will be named filename. All other files composing the indirect document will be saved into the same directory as the index file. A similar capability is offered by program djvmcvt.
- save-page filename
Save the selected component file into DjVu file filename. The selected component file might import data from another component file using so-called inclusion (INCL) chunks. This command then produces a file with unresolved references to imported data. Such a file should then be made part of a multi-page document containing the required data in other component files.
- save-page-with filename
Save the selected component file into DjVu file filename. All data imported from other component files is copied into the output file as well. This command always produces a usable DjVu file. On the other hand, collecting several such files into a multi-page document might lead to useless data replication.
Miscellaneous commands
- help
Display a help message listing all commands supported by djvused.
- dump
Display the EA IFF 85 structure of the document or of the selected component file. A similar capability is offered by program djvudump.
- size
Display the width and the height of the selected pages. The dimensions of each page are displayed using a syntax suitable for direct insertion into the <EMBED...></EMBED> tags. This command also displays the default page orientation when it is different from zero.
- set-rotation [+-]rot
Change the default orientation of the selected pages. The orientation is expressed as an integer in range 0..3 representing a number of 90 degree counter-clockwise rotations. When the argument is preceded by a sign + or -, argument rot counts how many additional 90 degree counter-clockwise rotations should be applied to the page. Otherwise, argument rot represents the desired absolute page orientation. Only DjVu pages can be rotated. Pages represented as a raw IW44 image cannot be rotated.
- set-dpi dpi
Set the resolution of the page image in dots per inch. Argument dpi should be in range 25..6000.
- set-page-title title
Set a page title for the selected page. When page titles are available, recent versions of the DjVuLibre viewers display these page titles instead of page numbers and also accept them in page selection options. Command ls can be used to see both the page titles and page identifiers. To unset a page title, simply make it equal to the page identifier.
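The orientation arithmetic behind set-rotation fits in a few lines. This Python sketch (the helper name rotation_after is hypothetical) computes the orientation a page would end up with. It assumes that relative rotations wrap around modulo 4, which the description above implies but does not state explicitly.

```python
def rotation_after(current: int, arg: str) -> int:
    """Orientation (0..3, counter-clockwise quarter turns) after
    'set-rotation arg', starting from orientation `current`."""
    if not 0 <= current <= 3:
        raise ValueError("current orientation must be in 0..3")
    if arg.startswith(("+", "-")):
        # Relative: add quarter turns; Python's % keeps the result in 0..3.
        return (current + int(arg)) % 4
    absolute = int(arg)
    if not 0 <= absolute <= 3:
        raise ValueError("absolute orientation must be in 0..3")
    return absolute


for current, arg in [(0, "+1"), (0, "-1"), (3, "+2"), (2, "0")]:
    result = rotation_after(current, arg)
    print(f"{current} -> set-rotation {arg} -> {result} ({90 * result} degrees ccw)")
```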
DJVUSED FILE FORMATS
Djvused uses a simple parenthesized syntax to represent both annotations and hidden text.
- This syntax is the native syntax used by DjVu for storing annotations. Program djvused simply compresses the annotation data using the bzz(1) algorithm.
- This syntax differs from the native syntax used by DjVu for storing the hidden text. Program djvused performs the translations between the compact binary representation used by DjVu and the easily modifiable parenthesized syntax.
General syntax
Djvused files are ASCII text files. The legal characters in djvused files are the printable ASCII characters and the space, tab, cr, and nl characters. Using other characters has undefined results. Djvused files are composed of a sequence of expressions separated by blank characters (space, tab, cr, or nl). There are four kinds of expressions, namely integers, symbols, strings, and lists.
- Integers:
Integer numbers are represented by one or more digits, with the usual interpretation.
- Symbols:
Symbols, or identifiers, are sequences of printable ASCII characters representing a name or a keyword. Acceptable characters are the alphanumeric characters, the underscore "_", the minus character "-", and the hash character "#". Names should not begin with a digit or a minus character.
- Strings:
Strings denote an arbitrary sequence of bytes, usually interpreted as a sequence of UTF-8 encoded characters. Strings in djvused files are similar to strings in the C language. They are surrounded by double quote characters. Certain sequences of characters starting with a backslash ("\") have a special meaning. A backslash followed by the letter "a", "b", "t", "n", "v", "f", or "r", by a backslash, or by a double quote character stands for the ASCII character BEL (007), BS (010), HT (011), LF (012), VT (013), FF (014), CR (015), BACKSLASH (134), or DOUBLEQUOTE (042), respectively, where the character codes are given in octal. A backslash followed by one to three digits stands for the byte whose octal code is expressed by the digits. All other backslash sequences are illegal. All non-printable ASCII characters must be escaped.
- Lists:
Lists are sequences of expressions separated by blanks and surrounded by parentheses. All expression types are acceptable within a list, including sub-lists.
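To make these rules concrete, here is a minimal reader for this syntax in Python. It is a sketch, not the parser used by DjVuLibre. It returns int for integers, a hypothetical Symbol subclass of str for symbols, decoded str for strings, and Python lists for lists, and it rejects illegal escapes and symbols that begin with a digit or a minus sign. Byte sequences that are not valid UTF-8 are decoded with replacement characters, an assumption made here for simplicity.

```python
import re

_ESCAPES = {"a": "\a", "b": "\b", "t": "\t", "n": "\n", "v": "\v",
            "f": "\f", "r": "\r", "\\": "\\", '"': '"'}
_OCTAL = re.compile(r"[0-7]{1,3}")
_ATOM = re.compile(r"[A-Za-z0-9_#-]+")


class Symbol(str):
    """A djvused symbol, kept distinct from a (quoted) string."""


def _read_string(src: str, i: int) -> tuple[str, int]:
    """Decode the string whose opening double quote is at src[i]."""
    out = bytearray()
    i += 1  # skip the opening quote
    while i < len(src):
        c = src[i]
        if c == '"':
            return out.decode("utf-8", errors="replace"), i + 1
        if c != "\\":
            out += c.encode("utf-8")
            i += 1
            continue
        octal = _OCTAL.match(src, i + 1)
        nxt = src[i + 1:i + 2]
        if octal:                            # \ooo: byte given in octal
            code = int(octal.group(), 8)
            if code > 0xFF:
                raise ValueError(f"octal escape out of range at {i}")
            out.append(code)
            i = octal.end()
        elif nxt in _ESCAPES:                # \n, \t, \\, \" and friends
            out += _ESCAPES[nxt].encode("ascii")
            i += 2
        else:
            raise ValueError(f"illegal escape sequence at {i}")
    raise ValueError("unterminated string")


def parse(src: str) -> list:
    """Parse a sequence of djvused expressions into Python values."""
    top: list = []
    stack = [top]
    i = 0
    while i < len(src):
        c = src[i]
        if c in " \t\r\n":
            i += 1
        elif c == "(":
            sub: list = []
            stack[-1].append(sub)
            stack.append(sub)
            i += 1
        elif c == ")":
            if len(stack) == 1:
                raise ValueError(f"unbalanced ')' at {i}")
            stack.pop()
            i += 1
        elif c == '"':
            value, i = _read_string(src, i)
            stack[-1].append(value)
        else:
            m = _ATOM.match(src, i)
            if not m:
                raise ValueError(f"unexpected character {c!r} at {i}")
            token = m.group()
            if token.isdigit():
                stack[-1].append(int(token))
            elif token[0].isdigit() or token[0] == "-":
                raise ValueError(f"invalid symbol {token!r} at {i}")
            else:
                stack[-1].append(Symbol(token))
            i = m.end()
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return top


print(parse('(page 0 0 2550 3300 (line 10 20 300 60 "Hello\\041"))'))
```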
Hidden text syntax
The building blocks of the hidden text syntax are lists representing each structural component of the hidden text. Structural components have the following form:
    (type xmin ymin xmax ymax ... )
The symbol type must be one of page, column, region, para, line, word, or char, listed here in decreasing order of importance. The integers xmin, ymin, xmax, and ymax represent the coordinates of a rectangle indicating the position of the structural component in the page. Coordinates are measured in pixels and have their origin at the bottom left corner of the page. The remaining expressions in the list are either a single string representing the encoded text associated with this structural component, or a sequence of structural components with a lesser type.
The hidden text for each page is simply represented by a single structural element of type page. Various levels of structural information are acceptable. For instance, the page level component might only specify a page level string, or might only provide a list of lines, or might provide a full hierarchy down to the individual characters.
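The nesting rule (each component holds either one string or components of a lesser type) is easy to enforce when generating hidden text programmatically. This Python sketch serializes nested (type, (xmin, ymin, xmax, ymax), content) tuples. That tuple layout and the helper names are hypothetical, and the check that xmin <= xmax and ymin <= ymax is an added sanity check rather than a documented rule.

```python
ORDER = ["page", "column", "region", "para", "line", "word", "char"]


def quote(text: str) -> str:
    """Encode text as a djvused string: UTF-8 bytes, with quotes,
    backslashes, and non-printable or non-ASCII bytes escaped in octal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '"\\':
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def hidden_text(node, depth: int = 0) -> str:
    """Serialize (type, (xmin, ymin, xmax, ymax), content) to djvused syntax.

    content is either a str (the text of this component) or a list of
    child nodes whose type must come later in ORDER (a lesser type).
    """
    kind, (xmin, ymin, xmax, ymax), content = node
    if kind not in ORDER:
        raise ValueError(f"unknown component type {kind!r}")
    if xmin > xmax or ymin > ymax:
        raise ValueError(f"inverted rectangle in {kind}")
    head = f"({kind} {xmin} {ymin} {xmax} {ymax}"
    if isinstance(content, str):
        return f"{head} {quote(content)})"
    pad = "\n" + "  " * (depth + 1)
    parts = []
    for child in content:
        if child[0] in ORDER and ORDER.index(child[0]) <= ORDER.index(kind):
            raise ValueError(f"{child[0]} cannot appear inside {kind}")
        parts.append(pad + hidden_text(child, depth + 1))
    return head + "".join(parts) + ")"


page = ("page", (0, 0, 2550, 3300), [
    ("line", (100, 3000, 900, 3060), [
        ("word", (100, 3000, 400, 3060), "Hello"),
        ("word", (450, 3000, 900, 3060), "world"),
    ]),
])
print(hidden_text(page))
```

In a script, the printed expression could follow a select 1 command and a set-txt command without argument, with a line containing a single period after it.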
Outline/Bookmark syntax
The outline syntax is a single list of the form
    (bookmarks ...)
The first element of the list is symbol bookmarks. The subsequent elements are lists representing the top-level outline entries. Each outline entry is represented by a list with the following form:
    (title url ... )
The string title is the title of the outline entry. The destination string url can be either an arbitrary percent-encoded URL, or composed of the hash character ("#") followed by a page name or number, or composed of the question mark character ("?") followed by CGI-style arguments interpreted by the DjVu viewer. The remaining expressions in the list describe subentries of this outline entry.
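An outline can also be generated from a program. The Python sketch below turns nested (title, url, children) tuples, a hypothetical representation, into a bookmarks list. It escapes titles and destinations as djvused strings but does not validate the destination strings.

```python
def quote(text: str) -> str:
    """Encode text as a djvused string: UTF-8 bytes, with quotes,
    backslashes, and non-printable or non-ASCII bytes escaped in octal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '"\\':
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def _entry(title: str, url: str, children, depth: int) -> str:
    """One outline entry (title url subentries...), indented by depth."""
    pad = "\n" + "  " * (depth + 1)
    inner = "".join(pad + _entry(*child, depth + 1) for child in children)
    return f"({quote(title)} {quote(url)}{inner})"


def bookmarks(entries) -> str:
    """Serialize top-level (title, url, children) entries as (bookmarks ...)."""
    return "(bookmarks" + "".join("\n  " + _entry(*e, 1) for e in entries) + ")"


outline = [
    ("Chapter 1", "#1", [("Section 1.1", "#3", [])]),
    ("Index", "#42", []),
]
print(bookmarks(outline))
```

Written to a file such as outline.txt, the result could be installed with djvused myfile.djvu -e 'set-outline outline.txt' -s.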
Annotation syntax
Annotations are represented by a sequence of annotation expressions.
The following annotation expressions are recognized:
- (background color)
Specify the color of the viewer area surrounding the DjVu image. Colors are represented with the X11 hexadecimal syntax #RRGGBB. For instance, #000000 is black and #FFFFFF is white.
- (zoom zoomvalue)
Specify the initial zoom factor of the image. Argument zoomvalue can be one of stretch, one2one, width, page, or composed of the letter d followed by a number in range 1 to 999 representing a zoom factor (for instance, d300 or d150).
- (mode modevalue)
Specify the initial display mode of the image. Argument modevalue is one of color, bw, fore, or back.
- (align horzalign vertalign)
Specify how the image should be aligned on the viewer surface. By default the image is located in the center. Argument horzalign can be one of left, center, or right. Argument vertalign can be one of top, center, or bottom.
- (maparea url comment area ...)
Define a hyperlink for the specified destination.
Argument url can have one of the following forms:
    href
    (url href target)
where href is a string representing the destination and target is a string representing the target frame for the hyperlink, as defined by the HTML anchor tag <A>. The destination string href can be either an arbitrary percent-encoded URL, or composed of the hash character ("#") followed by a page name or number, or composed of the question mark character ("?") followed by CGI-style arguments interpreted by the DjVu viewer. Page numbers may be prefixed with an optional sign to represent a page displacement. For instance, the strings #-1 and #+1 can be used to access the previous page and the next page.
Argument comment is a string that might be displayed by the viewer when the user moves the mouse over the hyperlink.
Argument area defines the shape and the location of the hyperlink. The following forms are recognized:
    (rect xmin ymin width height)
    (oval xmin ymin width height)
    (poly x0 y0 x1 y1 ... )
    (text xmin ymin width height)
    (line x0 y0 x1 y1)
All parameters are numbers representing coordinates. Coordinates are measured in pixels and have their origin at the bottom left corner of the page.
The remaining expressions in the maparea list represent the visual effect associated with the hyperlink. A first set of options defines how borders are drawn for rect, oval, poly, or text hyperlink areas.
    (none)
    (xor)
    (border color)
    (shadow_in [thickness])
    (shadow_out [thickness])
    (shadow_ein [thickness])
    (shadow_eout [thickness])
where parameter color has syntax #RRGGBB as described above, and parameter thickness is an integer in range 1 to 32. The last four border options are only supported for rect hyperlink areas. Although the border mode defaults to (xor), it is wise to always specify the border mode. Border options do not apply to line areas.
When a border option is specified, the border becomes visible when the user moves the mouse over the hyperlink. The border may be made always visible by using the following option:
    (border_avis)
The following two options may be used with rect hyperlink areas. The complete area will be highlighted using the specified color at the specified opacity (0-100, default 50).
    (hilite color)
    (opacity op)
This is often used with an empty URL for simply emphasizing a specific segment of an image.
The following three options may be used with line areas to specify an optional ending arrow, the line width, and the line color. The default is a black line with width 1 and without arrow.
    (arrow)
    (width w)
    (lineclr color)
Finally, the following three options can be used with text areas. The default background color is transparent. The default text color is black. The pushpin option indicates that the text is symbolized by a small pushpin icon. Clicking the icon reveals the text.
    (backclr bkcolor)
    (textclr txtcolor)
    (pushpin)
- (metadata ... (key value) ... )
Define metadata entries. Each entry is identified by a symbol key representing the nature of the metadata entry. The string value represents the value associated with the corresponding key. Two sets of keys are noteworthy: keys borrowed from the BibTeX bibliography system, and keys borrowed from the PDF DocInfo metadata. BibTeX keys are always expressed in lowercase, such as year, booktitle, editor, author, etc. DocInfo keys start with an uppercase letter, such as Title, Author, Subject, Creator, Producer, Trapped, CreationDate, and ModDate. The values associated with the last two keys should be dates expressed according to RFC 3339.
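Annotation expressions can be produced programmatically as well. This Python sketch (hypothetical helper names) builds a rect hyperlink to a page, using (xor) unless a border color is given, and a metadata expression. Note that rect takes a width and a height, whereas hidden text rectangles take xmax and ymax. The sketch checks colors against the #RRGGBB form and keys against the symbol rules, and nothing else.

```python
import re

COLOR = re.compile(r"#[0-9A-Fa-f]{6}")
SYMBOL = re.compile(r"[A-Za-z_#][A-Za-z0-9_#-]*")  # no leading digit or minus


def quote(text: str) -> str:
    """Encode text as a djvused string: UTF-8 bytes, with quotes,
    backslashes, and non-printable or non-ASCII bytes escaped in octal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '"\\':
            out.append("\\" + ch)
        elif 0x20 <= byte < 0x7F:
            out.append(ch)
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def page_link(target, comment, xmin, ymin, width, height, border=None):
    """(maparea "#<target>" comment (rect ...) effect); target may be a
    page number or a signed displacement such as "+1"."""
    if border is None:
        effect = "(xor)"
    elif COLOR.fullmatch(border):
        effect = f"(border {border})"
    else:
        raise ValueError(f"color must look like #RRGGBB, got {border!r}")
    area = f"(rect {xmin} {ymin} {width} {height})"
    return f"(maparea {quote('#' + str(target))} {quote(comment)} {area} {effect})"


def metadata(pairs) -> str:
    """(metadata (key "value") ...) from (key, value) pairs."""
    body = []
    for key, value in pairs:
        if not SYMBOL.fullmatch(key):
            raise ValueError(f"invalid metadata key {key!r}")
        body.append(f"({key} {quote(value)})")
    return "(metadata " + " ".join(body) + ")"


print(page_link("+1", "Next page", 100, 50, 200, 80))
print(metadata([("Title", "Notes"), ("year", "2005")]))
```

Such expressions could be given to set-ant, inline or through a file. Since set-ant takes the complete annotation data of the selected component file, it seems safer to start from the print-ant output and append to it than to supply only the new expressions.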
LIMITATIONS
The current version of program djvused only supports selecting one component file or all component files. There is no way to select only a few component files.
CREDITS
This program was initially written by Léon Bottou <leonb@users.sourceforge.net> and was improved by Yann Le Cun <profshadoko@users.sourceforge.net>, Florin Nicsa, Bill Riemers <docbill@sourceforge.net>, and many others.
SEE ALSO
djvu(1), djvutxt(1), djvmcvt(1), djvudump(1), bzz(1), and the Emacs djvused front end djvu.el in the GNU ELPA repository.
This document was created by man2html, using the manual pages.
Time: 00:05:51 GMT, November 14, 2015