DjVu: a Short Technical Introduction


DjVu: a Compression Technique and Software Platform for Publishing Scanned Documents, Digital Documents and High-Resolution Images on the Web.

by Yann LeCun, Léon Bottou, and Patrick Haffner

Despite the growing importance of the Internet, much of the knowledge, culture, and educational material in existence today is still available only in paper form. Bringing this wealth of information into the digital realm in a form that is faithful to the original, easily accessible, and searchable, is an essential step towards making the Internet the World's Universal Library.

DjVu (pronounced "deja vu") is a compression technique, a file format, and a delivery platform that is specifically designed to enable the creation of digital libraries of printed material, either scanned from paper or digitally produced. For scanned documents, DjVu file sizes are typically 3 to 10 times smaller than TIFF or PDF in black and white, and 5 to 10 times smaller than JPEG in color.

A typical page from a book, magazine, or ancient document scanned in color at 300dpi contains on the order of 8 million pixels, and occupies 24MB uncompressed. Traditional compression techniques such as JPEG are notoriously inefficient on several counts:

  • typical JPEG file sizes for a page are between 400KB and 2MB at best, which is totally impractical for remote access.
  • sharp edges (such as character outlines) are the cause of numerous wasted bits and/or unpleasant ringing artifacts.
  • such large images are very slow to render, require a very large memory buffer for the decompressed image in the client, and are not easily zoomable or pannable with current web browsers.
  • the text is not normally separated from the image, and therefore cannot be OCRed, indexed, or searched.
  • no provision is made for multipage documents, unless one encapsulates the images into a container format such as PDF, thereby adding additional layers of inefficiency.
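
The page size quoted above can be checked with a few lines of arithmetic. This is only a sketch: the 8.5 x 11 inch page is an assumption (the text says only "a typical page"), as is 24-bit RGB storage.

```python
# Assumption: a US Letter page; the text does not name a page size.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
DPI = 300
BYTES_PER_PIXEL = 3  # assumed 24-bit RGB, one byte per channel


def raw_page_size(width_in, height_in, dpi, bytes_per_pixel=BYTES_PER_PIXEL):
    """Return (pixel_count, byte_count) for an uncompressed scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    return pixels, pixels * bytes_per_pixel


pixels, size_bytes = raw_page_size(WIDTH_IN, HEIGHT_IN, DPI)
print(f"{pixels:,} pixels, {size_bytes / 2**20:.1f} MiB uncompressed")
# prints: 8,415,000 pixels, 24.1 MiB uncompressed
```

Under these assumptions the result agrees with the "8 million pixels" and "24MB" figures.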

The DjVu system alleviates these problems and can handle bitonal documents, low-color (palettized) images, continuous-tone images (such as photos), scanned documents in color or grayscale, as well as digitally produced documents (e.g. in PostScript or PDF).

Bitonal documents are encoded with a technique dubbed JB2, which builds a compressed library of repeating shapes in the document (such as characters), and codes the locations where they appear on each page. Low-color images are compressed much the same way, with the addition of a color palette, and a color index for each shape. Continuous-tone images are compressed with a progressive wavelet-based method dubbed IW44 that is on par with JPEG-2000 in terms of signal-to-noise ratio, but whose decoder/renderer is very memory efficient and optimized for speed (3 times faster than the fastest JPEG-2000 mode). The coder back-ends make extensive use of a new binary adaptive arithmetic coder called the Z-coder.
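
The shape-library idea can be illustrated with a toy sketch. It is not the JB2 codec: the function names and data layout are hypothetical, shapes are matched only when their bitmaps are exactly identical (a simplification), and no arithmetic coding is applied to the result.

```python
def encode_shapes(pages):
    """Toy shape-library coder (illustrative only, not JB2).

    pages: list of pages; each page is a list of (bitmap, x, y) marks,
    where bitmap is a tuple of strings such as ("#.", "##").
    Returns (library, placements).
    """
    library = []      # unique shapes, in order of first appearance
    index_of = {}     # bitmap -> position in the library
    placements = []   # per page: list of (shape_index, x, y)
    for page in pages:
        coded = []
        for bitmap, x, y in page:
            if bitmap not in index_of:      # first time this shape is seen
                index_of[bitmap] = len(library)
                library.append(bitmap)
            coded.append((index_of[bitmap], x, y))
        placements.append(coded)
    return library, placements


def decode_shapes(library, placements):
    """Rebuild the list of marks for every page from the library."""
    return [[(library[i], x, y) for i, x, y in page] for page in placements]


# Two hypothetical glyph bitmaps; "A" occurs three times across two pages.
GLYPH_A = (".#.", "###", "#.#")
GLYPH_B = ("##.", "###", "##.")
pages = [
    [(GLYPH_A, 10, 20), (GLYPH_B, 14, 20), (GLYPH_A, 18, 20)],
    [(GLYPH_A, 10, 20)],
]
library, placements = encode_shapes(pages)
print(len(library), placements)
```

Each bitmap is stored once, and every further occurrence costs only an index and a position, which is where the saving on text-heavy pages comes from.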

Scanned color documents are decomposed into a foreground plane and a background plane. The foreground plane contains the text and the line drawings compressed as a bitonal or low-color image at maximum resolution (using JB2), thereby preserving the sharpness and readability of the text. The background plane contains the pictures and paper textures compressed at reduced resolution with IW44. Areas of the background covered by foreground components are smoothly interpolated so as to minimize their coding cost. The foreground/background segmenter first detects objects that are sharply contrasted with their surroundings, and then classifies them into the foreground or the background planes using several criteria, such as their color uniformity, their geometry, and an estimate of their coding cost.
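
To show how the two planes combine at display time, here is a minimal sketch of layer composition. It assumes a single foreground color, a grayscale background, an integer subsampling factor, and nearest-neighbor upsampling; none of these details come from the text, and the function name is hypothetical.

```python
def compose(mask, fg_color, background, scale):
    """Combine a full-resolution foreground mask with a reduced-resolution
    background (illustrative only).

    mask:       rows of 0/1 at full resolution (1 = foreground pixel)
    fg_color:   value painted wherever the mask is set
    background: rows of gray values at 1/scale of the full resolution
    scale:      integer subsampling factor of the background
    """
    out = []
    for y, row in enumerate(mask):
        out_row = []
        for x, bit in enumerate(row):
            if bit:
                out_row.append(fg_color)                            # sharp text pixel
            else:
                out_row.append(background[y // scale][x // scale])  # upsampled background
        out.append(out_row)
    return out


mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
background = [
    [200, 210],
    [220, 230],
]
page = compose(mask, fg_color=0, background=background, scale=2)
for row in page:
    print(row)
```

The mask stays at full resolution, so edges remain sharp even though the background holds a quarter as many samples in this example.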

Digitally produced PDF or PostScript documents are turned into a list of low-level drawing commands using the popular tool GhostScript. This list is then translated into a list of non-overlapping shapes which are subsequently classified into the foreground or the background layer using a number of shape-based heuristics. The layers are then compressed as with scanned documents.

Bitonal documents in DjVu typically occupy 5 to 30KB per page at 300dpi, which is 3 to 8 times smaller than Group 4 (used in fax machines, in TIFF files, and in PDF files). Low-color images such as icons are typically 2 times smaller than with GIF, but can be up to 10 times smaller if they contain lots of text. Photos are about 2 times smaller than JPEG, and about the same size as fast modes of JPEG-2000 for the same SNR. An interesting aspect of the IW44 wavelet codec is that it is optimized to allow on-the-fly decompression and rendering of only the area visible in the display window as the user zooms and pans around. This makes it possible to keep the images in compressed form in the RAM of the client machine, and to display very large images without excessive memory requirements. Scanned color and grayscale documents in DjVu are typically 30 to 100KB per page at 300dpi, which is 5 to 10 times smaller than JPEG, and about 2 to 3 times smaller than MRC/T.44 or TIFF/FX. Digitally produced documents with mostly text are typically 1 to 3 times smaller than PDF or gzipped PostScript originals at 300dpi, but can be considerably smaller if the documents contain many pictures.
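
As a rough worked example, the per-page sizes above can be turned into compression ratios relative to the uncompressed page described earlier (about 8 million pixels, 24MB in color). The sketch assumes decimal units (1MB = 1000KB) and 1 bit per pixel for an uncompressed bitonal page; the helper name is hypothetical.

```python
RAW_COLOR_KB = 24_000                    # the 24MB color page, taking 1MB = 1000KB
RAW_BITONAL_KB = 8_000_000 / 8 / 1000    # 8 million pixels at 1 bit per pixel


def ratio_range(low_kb, high_kb, raw_kb):
    """Return (smallest, largest) ratio of raw size to compressed size."""
    return raw_kb / high_kb, raw_kb / low_kb


color_lo, color_hi = ratio_range(30, 100, RAW_COLOR_KB)      # 30 to 100KB per page
bitonal_lo, bitonal_hi = ratio_range(5, 30, RAW_BITONAL_KB)  # 5 to 30KB per page

print(f"scanned color: {color_lo:.0f}:1 to {color_hi:.0f}:1")
print(f"bitonal:       {bitonal_lo:.0f}:1 to {bitonal_hi:.0f}:1")
# prints:
# scanned color: 240:1 to 800:1
# bitonal:       33:1 to 200:1
```

These ratios are measured against raw pixels, not against other formats, so they are not comparable to the "times smaller than JPEG" figures.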

DjVu documents are displayed within web browsers through a very light-weight plug-in (available for all major platforms). Everything in the design of DjVu was optimized to reduce the delay between the user's decision to view a page, and the display of that page on the screen. A multithreaded software architecture with smart caching allows individual document components to be loaded and pre-decoded on demand. Pages are loaded on demand, allowing random access without prior download of the entire document, and without the help of a byte server. Page components (foreground layer, background chunks, and so on) are downloaded in sequence and rendered by a separate thread as soon as they are complete. This allows progressive rendering and refinement of the images. The page that follows the page currently being displayed is pre-loaded, pre-decoded, and cached automatically, thereby reducing the page-flipping delay. The DjVu viewer has a "modeless" graphical user interface that allows fast zooming, panning, and page flipping with a single mouse operation or keystroke.
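
The look-ahead behavior can be sketched as a small cache that fetches the next page whenever a page is shown. This is a single-threaded illustration with hypothetical names (`PageCache`, `fetch_page`); the viewer described above does this work in separate threads, and its actual caching policy is not specified here.

```python
class PageCache:
    """Toy page cache with one-page look-ahead (illustrative only)."""

    def __init__(self, fetch_page, page_count):
        self.fetch_page = fetch_page   # hypothetical callable: page number -> decoded page
        self.page_count = page_count
        self.cache = {}

    def _load(self, n):
        # Fetch and decode a page only if it is not cached yet.
        if n not in self.cache:
            self.cache[n] = self.fetch_page(n)
        return self.cache[n]

    def show(self, n):
        page = self._load(n)
        if n + 1 < self.page_count:
            self._load(n + 1)          # pre-load the page that follows
        return page


fetched = []                           # records which pages were actually fetched


def fake_fetch(n):
    fetched.append(n)
    return f"decoded page {n}"


viewer = PageCache(fake_fetch, page_count=3)
viewer.show(0)    # fetches pages 0 and 1
viewer.show(1)    # page 1 is already cached; only page 2 is fetched
print(fetched)
```

By the time the reader flips forward, the next page is already in the cache, so only the first page of a session pays the full loading delay.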

The foreground layer can be OCRed and the result embedded back into the DjVu file as a searchable "hidden text" layer. Tools are available to extract that text and translate it into a variety of formats that include each word annotated with the coordinates of its bounding box on the page. The formats also include the document structure (pages, columns, paragraphs, lines, words). Hyperlinks, annotations, page thumbnails, and other metadata can also be embedded into DjVu documents.
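
One way to picture the hidden text layer is as a tree of pages, lines, and words, each word carrying its bounding box. The sketch below is a simplified, hypothetical in-memory model, not the DjVu file representation or the output of any actual tool; columns and paragraphs are omitted for brevity, and the coordinate convention is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple          # assumed (xmin, ymin, xmax, ymax) in page pixels


@dataclass
class Line:
    words: list


@dataclass
class Page:
    number: int
    lines: list


def find_boxes(pages, query):
    """Return (page_number, box) for every word equal to query, ignoring case."""
    query = query.lower()
    return [
        (page.number, word.box)
        for page in pages
        for line in page.lines
        for word in line.words
        if word.text.lower() == query
    ]


doc = [
    Page(1, [Line([Word("DjVu", (100, 200, 180, 230)),
                   Word("format", (190, 200, 290, 230))])]),
    Page(2, [Line([Word("The", (100, 300, 150, 330)),
                   Word("djvu", (160, 300, 240, 330))])]),
]
print(find_boxes(doc, "djvu"))
```

With the boxes available, a viewer can highlight each hit directly on the page image.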

Server-side full-text search can easily be provided using free indexing tools and a few Perl scripts. Large collections have been (and are being) put on the Web in DjVu with full-text search capabilities, including the NIPS Proceedings (http://nips.djvuzone.org, 13 volumes, 14,000 pages at 400dpi, 191MB), the Century Dictionary (http://www.century-dictionary.com, 12 volumes, over 10,000 pages, 500,000 definitions, 22 million searchable words, 850MB), along with several national library collections and content from commercial providers around the world. DjVu is currently used by thousands of users to publish and exchange scanned documents on the Web. A list of selected web sites that use DjVu is available here.
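
The text mentions indexing tools and Perl scripts; as a stand-in, the same idea can be sketched in Python with a small inverted index over text already extracted from each page. The function names and the (document, page) key layout are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(page_texts):
    """page_texts: dict mapping (document, page) -> extracted hidden text.
    Returns a dict mapping each lowercased word to the set of pages containing it."""
    index = defaultdict(set)
    for key, text in page_texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(key)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical extracted text for three pages of two documents.
texts = {
    ("vol1", 1): "Wavelet compression of scanned documents",
    ("vol1", 2): "Arithmetic coding and wavelet transforms",
    ("vol2", 1): "Scanned documents on the Web",
}
index = build_index(texts)
print(sorted(search(index, "scanned documents")))
print(sorted(search(index, "wavelet")))
```

A search result is then a list of (document, page) pairs, and since pages load on demand, the server can link straight to the matching page.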

DjVu can be seen as a general open platform for document delivery. The DjVu Reference Library, which includes the full multithreaded decoder/renderer, the IW44 encoder, the palettized image encoder, and basic bitonal and color document encoders is available as Free Software under the GNU GPL and can be used as a platform for research on new codecs, segmentation schemes, delivery mechanisms, viewing interfaces, and content analysis systems.

Papers, examples, benchmarks, and pointers are available at http://www.djvuzone.org.

Source code is available at http://djvu.sourceforge.net.

Plug-ins, compressors, SDKs, and commercial software can be obtained at http://www.djvu.com.

Servers that can convert documents in any format to DjVu are available at http://openlib.djvuzone.org, http://bib2web.djvuzone.org, and http://any2djvu.djvuzone.org.
