Statistical models, such as the Huffman
and Shannon-Fano models illustrated above, usually encode a single symbol at a
time by generating a one-to-one symbol-to-code map. The basic idea behind a
dictionary-based compressor is to replace an occurrence of a particular phrase or group of
bytes in a piece of data with a reference to a previous occurrence of that phrase.
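To make this idea concrete, the following Python sketch encodes data as a stream of literal bytes and (offset, length) back-references to earlier occurrences. The token format, window size, and minimum match length are illustrative assumptions for this sketch, not part of any particular standard or of the schemes discussed below.

    def compress(data: bytes, window: int = 4096, min_len: int = 3):
        # Greedy encoder: at each position, emit either a literal byte or a
        # reference (offset, length) to an earlier occurrence of the same phrase.
        tokens, i = [], 0
        while i < len(data):
            best_off = best_len = 0
            for j in range(max(0, i - window), i):   # scan the recent window
                k = 0
                while i + k < len(data) and data[j + k] == data[i + k]:
                    k += 1
                if k > best_len:
                    best_off, best_len = i - j, k
            if best_len >= min_len:
                tokens.append(("ref", best_off, best_len))
                i += best_len
            else:
                tokens.append(("lit", data[i]))
                i += 1
        return tokens

    def decompress(tokens) -> bytes:
        out = bytearray()
        for t in tokens:
            if t[0] == "lit":
                out.append(t[1])
            else:
                _, off, length = t
                for _ in range(length):   # byte-by-byte copy permits overlapping matches
                    out.append(out[-off])
        return bytes(out)

For example, compress(b"abcabcabcabc") emits three literals (a, b, c) followed by a single reference of length nine pointing three bytes back; decompress then reproduces the original input exactly, because the byte-by-byte copy lets a reference overlap the bytes it is still producing.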
Like the Huffman algorithm, dictionary-based compression schemes also have a historical
basis. Where Morse code uses the frequency of occurrence of single characters, a widely
used form of Braille code, also developed in the mid-19th century, uses the frequency of
occurrence of words to provide compression. In Braille coding, 2 × 3 arrays of dots are
used to represent text. Different letters are represented by different combinations of
raised and flat dots. In Grade 1 Braille, each array of six dots represents a single
character. However, given six dots with two positions for each dot, we can obtain 2⁶,
or 64, different combinations. If we use 26 of these for the different letters, we have 38
combinations left over. In Grade 2 Braille, some of these leftover combinations are used
to represent words that occur frequently, such as "and" and "for". One
of the combinations is used as a special symbol indicating that the symbol following is a
word and not a character, thus allowing a large number of words to be represented by two
arrays of dots. These modifications, along with contractions of some words, result
in a significant reduction in the average amount of space used.
In modern data compression, there are two main classes of dictionary-based schemes,
named after Jacob Ziv and Abraham Lempel, who first proposed them in 1977 and 1978.
These are called LZ77 and LZ78, respectively.