Huffman Coding Tutorial and Survey
Huffman Coding Tutorial and Survey
Huffman Coding
1 INTRODUCTION
No introductory computer science algorithms course would be complete without consideration
of certain pervasive problems, and discussion and analysis of the algorithms that solve them.
That short-list of important techniques includes heapsort and quicksort, dictionary structures
using balanced search trees, Knuth-Morris-Pratt pattern search, Dijkstra’s algorithm for single-
source shortest paths, and, of course, David Huffman’s iconic 1951 algorithm for determining
minimum-cost prefix-free codes [33] – the technique known as Huffman coding.
The problem tackled by Huffman was – and still is – an important one. Data representation
techniques are at the heart of much of computer science, and Huffman’s work marked a critical
milestone in the development of efficient ways for representing information. Since its publication
in 1952, Huffman’s seminal paper has received more the 7,500 citations1 , and has influenced many
of the compression and coding regimes that are in widespread use today in devices such as digital
cameras, music players, software distribution tools, and document archiving systems.
Figure 1 shows a screen-shot illustrating a small subset of the many hundreds of images related
to Huffman coding that can be found on the world-wide web, and demonstrates the ubiquity
of Huffman’s tree-based approach. While an important underlying motivation for Huffman’s
algorithm, the prevalence of trees as a way of explaining the encoding and decoding processes
is, for the most part, a distraction; and much of the focus in this article is on implementations
that avoid explicit trees. It is also important to distinguish between a set of codewords and a set
of codeword lengths. Defining the code as a set of codeword lengths allows a choice to be made
1 Google Scholar, accessed 2 August 2018.
Author’s address: Alistair Moffat, The University of Melbourne, Victoria 3010, Australia, ammoffat@[Link].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the
full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@[Link].
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
0360-0300/2019/6-ART1 $15.00
[Link]
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
Fig. 1. A small section of the search result page returned for the query “Huffman code” at Google Image
Search, captured as a screen-shot on 5 August 2018. The full result page contains approximately one thousand
such images.
between many different sets of complying codewords, a flexibility that is – as is explained shortly –
very important as far as implementation efficiency is concerned.
The first three sections of this article provide a tutorial describing Huffman coding and its
implementation. Section 4 then surveys some of the many refinements and variants that have been
proposed through the six decades since Huffman’s discovery. Next, Section 5 summarizes two
complementary techniques for entropy-coding, the general task of representing messages using as
few symbols as possible. Finally, Section 6 evaluates the relative strengths and drawbacks of those
two newer techniques against the key properties of Huffman’s famous method.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
an inequality that was first noted by Kraft [40] and elaborated on by McMillan [48]. If a code
T = ⟨ℓi ⟩ is feasible, then it is possible to assign to each symbol 0 ≤ i < n a codeword of length ℓi in
a manner such that no codeword is a prefix of any other codeword, the latter being the standard
definition of “prefix free code”. A set of codewords complies with code T if the i th codeword is
of length ℓi , and none of the codewords is a prefix of any other codeword. For example, suppose
that n = 4, that T = ⟨2, 3, 2, 3⟩ and hence that K(T ) = 3/4. Then T is feasible, and (amongst many
other options) the codewords 01, 000, 10, and 001 comply with ⟨ℓi ⟩. On the other hand, the set of
codewords 00, 111, 10, and 001 have the right lengths, but nevertheless are not compliant with
T = ⟨2, 3, 2, 3⟩, because 00 is a prefix of 001. Based on these definitions, it is useful to regard a prefix
free code as having been determined once a feasible code has been identified, without requiring
that a complying codeword assignment also be specified.
The cost of a feasible code, denoted C(·, ·), factors in the weight associated with each symbol,
n−1
Õ
C(W ,T ) = C(⟨w i ⟩, ⟨ℓi ⟩) = w i · ℓi . (2)
i=0
If the w i ’s are integers, and w i reflects the frequency of symbol i in a message of total length
m = i w i , then C(·, ·) is the total number of channel symbols required to transmit the message
Í
using the code. Alternatively, if the w i ’s are symbol occurrence probabilities that sum to one, then
C(·, ·) is the expected per-input-symbol cost of employing the code to represent messages consisting
of independent drawings from 0 . . . n − 1 according to the probabilities W = ⟨w i ⟩.
Let T = ⟨ℓi | 0 ≤ i < n⟩ be a feasible n-symbol code. Then T is a minimum-redundancy code for
W if, for every other n-symbol code T ′ that is feasible, C(W ,T ) ≤ C(W ,T ′). Note that for any
sequence W there may be multiple different minimum-redundancy codes with the same least cost.
Continuing the previous example, T = ⟨2, 3, 3, 2⟩ cannot be a minimum-redundancy code for any
sequence of weights ⟨w i ⟩, since the feasible code T ′ = ⟨2, 2, 2, 2⟩ will always have a strictly smaller
cost. More generally, when considering binary channel alphabets, a Kraft sum K(·) that is strictly
less than one always indicates a code that cannot be minimum-redundancy for any set of weights
W , since at least one codeword can be shortened, thereby reducing C(·, ·). Conversely, a Kraft sum
that is greater than one indicates a code that is not feasible – there is no possible set of complying
codewords. The code T = ⟨1, 2, 2, 2⟩ is not feasible, because K(T ) = 5/4, and hence T cannot be a
minimum-redundancy code for any input distribution W = ⟨w 0, w 1, w 2, w 3 ⟩.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
2 HUFFMAN’S ALGORITHM
This section describes the principle underlying Huffman code construction; describes in increasing
levels of detail how Huffman coding is implemented; demonstrates that the codes so generated are
indeed minimum-redundancy; and, to conclude, considers non-binary output alphabets.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
and the code T ′ = ⟨1, 2, 3, 4, 5, 5⟩ would have emerged. This code has different maximum codeword
length to the first one, but the same cost, since 10 × 1 + 6 × 2 + 2 × 3 + 1 × 4 + (1 + 1) × 5 is also
equal to 42. No symbol-by-symbol code can represent W = ⟨10, 6, 2, 1, 1, 1⟩ in fewer than 42 bits.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
0
21 1 0
21 1
a, 10 11 a, 10 11
0 1 0 1
5 b, 6 b, 6 5
0 1 0 1
2 3 3 2
0 1 0 1 0 1 0 1
f, 1 e, 1 d, 1 c, 2 c, 2 d, 1 e, 1 f, 1
Fig. 2. Two alternative code trees for W = ⟨10, 6, 2, 1, 1, 1⟩ and code T = ⟨1, 2, 4, 4, 4, 4⟩. Leaves are labeled
with letters corresponding to their symbol numbers, and with their weights; internal nodes with their weight
only. The tree on the left is generated by a strict application of Algorithm 1; the one on the right exchanges
left and right children at some of the internal nodes, so that the leaves are sorted. The two trees represent
different sets of complying codewords.
– are in lexicographic order. This important “ordered leaves” property will be exploited in the
implementations described in Section 3.
Other trees emerge if ties are broken in different ways, but all resulting codes have the same cost.
Any particular tie-breaking rule simply selects one amongst those equal-cost choices. The rule
given above – preferring symbols with high indexes to symbols with lower indexes, and preferring
leaves to internal nodes – has the useful side effect of generating a code with the smallest maximum
codeword length, L = max0≤i <n ℓi .
The edge rearrangements employed in Figure 2 mean that multiple complying codeword sets
can be generated for every sequence of weights. Indeed, since there are n − 1 internal nodes in
every binary tree with n leaves, and each internal node has two alternative orientations, at least
2n−1 different arrangements of a Huffman tree can be achieved via edge swaps alone. Even more
codes are possible if non-sibling leaves at the same depth are swapped with each other, which can
be done once the code has been derived, even if those nodes have different weights.
Algorithm 1 makes use of a priority queue data structure, denoted Q in the pseudo-code, and
standard priority queue operations to insert a new object, and to identify and delete the item of
smallest weight. A total of 2n − 1 queue Insert operations, and 2n − 1 queue ExtractMin operations
are required in order to construct the final tree of n leaves and n − 1 internal nodes. This formulation
suggests that the binary heap is a suitable queue structure, supporting Insert and ExtractMin
operations in O(log n) time each, and hence allowing Huffman codes (via a traversal of the Huffman
tree) to be computed in O(n log n) time.
At this point most algorithms textbooks move on to their next topic, leaving the impression that
Huffman codes are constructed using an O(n log n)-time heap-based algorithm; that encoding is
carried out by tracing the path in a tree from the root through to a specified leaf; and that decoding
is also performed by following edges in the Huffman tree, this time as bits are fetched one-by-one
from the compressed data stream. The same is true for the Wikipedia article on Huffman coding.2
The remainder of this section addresses the first of these misconceptions; and then Section 3
examines the mechanics of encoding and decoding.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
Figure 3 gives an example of code construction, and shows several snapshots of the array W as
Algorithm 2 is applied to the 10-sequence W = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1⟩. By the end of phase 1, for
example, W [6] represents an internal node whose parent is represented in W [4]. The internal node
at W [6] has one internal node child represented at W [9], and hence also has one leaf node as a
child, which is implicit and not actually recorded anywhere. In total, W [6] is the root node of a
subtree that spans three of the original symbols.
The second phase, at steps 14 to 16, traverses that tree from the root down, converting the array
of parent pointers into an array of internal node depths. Element W [0] is again unused; by the end
of this phase, the other n − 1 elements reflect the depths of the n − 1 internal nodes, with the root,
represented by W [1], having a depth of zero. Node depths are propagated downward through the
tree via the somewhat impenetrable statement at step 16: “W [next] ← W [W [next]] + 1”.
Steps 18 to 27 then process the array a third time, converting internal node depths into leaf
depths. Quantity avail records the number of unused slots at level depth of the tree, starting with
values of one and zero respectively. As each level of the tree is processed, some of the avail slots
are required to accommodate the required number of internal nodes; the rest must be leaves, and
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
0 2 4 6 8 10
Initial arrangement, W [i] = w i 20 17 6 3 2 2 2 1 1 1
finished, root = 1 55 1 2 3 3 4 5 5 6
0 2 4 6 8 10
Phase 2, next = 2 0 1 2 3 3 4 5 5 6
next = 8 0 1 2 3 3 4 4 5 6
finished 0 1 2 3 3 4 4 4 5
0 2 4 6 8 10
Phase 3, next = 1, avail = 2 1 0 1 2 3 3 4 4 4 5
next = 3, avail = 6 1 2 4 2 3 3 4 4 4 5
finished, W [i] = ℓi 1 2 4 5 5 5 5 5 6 6
Fig. 3. Tracing Algorithm 2 for the input W = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1⟩. The first row shows the initial state
of the array, with brown elements indicating W [i] = w i . During phase 1, the light blue values indicate internal
node weights before being merged; and yellow values indicate parent pointers of internal nodes after they
have been merged. Pink values generated during phase 2 indicate depths of internal nodes; and the purple
values generated during phase 3 indicate depths of leaves. Grey is used to indicate elements that are unused.
The final set of codeword lengths is T = ⟨1, 2, 4, 5, 5, 5, 5, 5, 6, 6⟩.
so can be assigned to the next slots in W as leaf depths. The number of available slots at the next
depth is then twice the number of internal nodes at the current depth.
At the conclusion of the third phase, each original symbol weight w i in array element W [i] has
been over-written by the corresponding codeword length ℓi of a Huffman code. What is particularly
notable is that a complete working implementation of Algorithm 2 is only a little longer than the
pseudo-code that is shown here. There is no need for trees, pointers, heaps, or dynamic memory,
and it computes quickly in O(n) time.
The presentation in this subsection is derived from the description of Moffat and Katajainen [56];
Section 4.3 briefly summarizes some other techniques for computing minimum-redundancy codes.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
0
33 1 0
33 1
14 19 15 18
0 1 0 1 0 1 0 1
b, 7 7 a, 8 11 a, 8 b, 7 11 7
0 1 0 1 0 1 0 1
f, 3 e, 4 d, 5 c, 6 c, 6 d, 5 e, 4 f, 3
Fig. 4. Two alternative code trees for W = ⟨8, 7, 6, 5, 4, 3⟩ and minimum-redundancy code T = ⟨2, 2, 3, 3, 3, 3⟩.
Leaves are again labeled with an alphabetic symbol identifier and numeric weight; internal nodes with their
weight only. The right-hand tree cannot be generated by Algorithm 1.
Section 3.2, the tightly-structured arrangement of codewords shown in Table 1 is desirable, and if
the definition of a code is as a set of codeword lengths, then the somewhat arbitrary distinction
between “minimum-redundancy” and “Huffman” codes becomes irrelevant. That reasoning is why
we deliberately blend the two concepts here.
2.6 Optimality
To demonstrate that Huffman’s algorithm does indeed result in a minimum-redundancy code, two
steps are required [33]. The first step is to confirm the sibling property [25], which asserts that a
minimum-redundancy code exists in which the two least-weight symbols are siblings, and share a
common parent in the corresponding binary code tree. The second step is to verify that joining
those two symbols into a combined node with weight given by their sum, then constructing a
minimum-redundancy code for the reduced-by-one symbol set, then expanding that symbol again
into its two components, yields a minimum-redundancy code for the original set of symbols. The
inductive principle takes care of the remainder of the proof, because a minimum-redundancy code
for the case n = 2 cannot be anything other than ⟨ℓi ⟩ = ⟨1, 1⟩, that is, the two codewords 0 and 1.
Consider the sibling property. Suppose that T = ⟨ℓi ⟩ is known to be a minimum-redundancy code
for the n-sequence W = ⟨w i ⟩. Since every internal node in the code tree must have two children
(because if it did not, a whole subtree could be promoted to make a cheaper code), there must be at
least two nodes at the greatest depth, L = maxi ℓi . Now suppose, without loss of generality, that
w n−1 and w n−2 are the two lowest weights, possibly equal. If the leaves for both of these symbols
are at depth L, that is, ℓn−2 = ℓn−1 = L, then they can be moved in the code tree via a leaf relabeling
process to make them both children of the same internal node in a way that does not alter the cost
C(·, ·) of the code. This is sufficient to satisfy the sibling property.
On the other hand, if (say) ℓn−1 = L and ℓn−2 < L then there must be a different symbol a such
that ℓa = L, that is, there must be a symbol a at the deepest level of the code tree that is neither
symbol n − 2 nor symbol n − 1. Now consider the code T ′ formed by exchanging the lengths of the
codewords assigned to symbols a and n − 2. The new code may have an altered cost compared to
the old code; if so, the difference is given by
C(W ,T ′) − C(W ,T )
= (w a · ℓn−2 + w n−2 · ℓa ) − (w a · ℓa + w n−2 · ℓn−2 )
= (w a − w n−2 ) · (ℓn−2 − ℓa )
≥ 0,
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
with the final inequality holding because the original code T is minimum-redundancy for W ,
meaning that no other code can have a smaller cost. But w a ≥ w n−2 and ℓn−2 < ℓa , in both cases as
a result of the assumptions made, and hence it can be concluded that w a = w n−2 . That is, symbol
a and symbol n − 2 must have the same weight, and can have their codeword lengths exchanged
without altering the cost of the code. With that established, the sibling property can be confirmed.
Consider the second inductive part of the argument, and suppose that
T1 = ⟨ℓ0, . . . , ℓn−2, ℓn−1 ⟩
is an n-element minimum-redundancy code of cost C(W1,T1 ) for the n weights
W1 = ⟨w 0, . . . , w n−3, w n−2, w n−1 + x⟩ ,
for some set of weights such that w 0 ≥ w 1 ≥ · · · ≥ w n−1 ≥ x > 0. Starting with the n-symbol
feasible code T1 , now form the (n + 1)-symbol feasible code
T2 = ⟨ℓ0, . . . , ℓn−3, ℓn−2, ℓn−1 + 1, ℓn−1 + 1⟩ .
By construction, the extended code T2 has cost C(W2,T2 ) = C(W1,T1 ) + w n−1 + x for the (n + 1)-
sequence
W2 = ⟨w 0, . . . , w n−3, w n−2, w n−1, x⟩ .
Suppose next that T3 is a minimum-redundancy code for W2 . If T2 is not also a minimum-redundancy
code, then
C(W2,T3 ) < C(W2,T2 ) = C(W1,T1 ) + w n−1 + x .
But this leads to a contradiction, because the sibling property requires that there be an internal node
of weight w n−1 + x in the code tree defined by T3 , since w n−1 and x are the two smallest weights in
W2 . And once identified, that internal node could be replaced by a leaf of weight w n−1 + x without
altering any other part of the tree, and hence would give rise to an n-element code T4 of cost
C(W1,T4 ) = C(W2,T2 ) − w n−1 − x < C(W1,T1 ) ,
and that would mean in turn that T1 could not be a minimum-redundancy code for W1 .
In combination, these two arguments demonstrate that the codes developed by Huffman are
indeed minimum-redundancy – and that he fully deserved his subject pass in 1951.
where m = n−1 i=0 w i is the sum of the weights, and where − log2 (w i /m) = log2 (m/w i ) is the entropic
Í
cost (in bits) of one instance of a symbol that is expected to occur with probability given by w i /m.
If m = 1, and the w i ’s are interpreted as symbol probabilities, then the quantity H (W ) has units
of bits per symbol, and represents the expected number of output symbols generated per source
symbol. If the w i ’s are integral occurrence counts, and m is the length of the message that is to be
coded, then H (W ) has units of bits, and represents the minimum possible length of the compressed
message when coded relative to its own statistics.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
Given such a sequence W of n weights, the relative effectiveness loss E(W ,T ) of a feasible code
T = ⟨ℓi ⟩ is the fractional difference between the cost C(W ,T ) of that code and the entropy of W :
C(W ,T ) − H (W )
E(W ,T ) = . (4)
H (W )
A relative effectiveness loss of zero indicates that symbols are being coded in their entropic costs;
values larger than zero indicate that the coder is admitting some degree of compression leakage.
Table 2 shows the result of these calculations for some of the sets of weights that have been used
as examples elsewhere in this article. In many situations, including three of the four illustrated
examples, Huffman codes provide compression effectiveness that is within a few percent of the
entropy-based lower bound. The most egregious exceptions occur when the weight of the most
frequent symbol, w 0 , is large relative to the sum of the remaining weights, that is, when w 0 /m
becomes large. Indeed, it is possible to make the relative effectiveness loss arbitrarily high by having
w 0 /m → 1. For example, the 5-sequence W = ⟨96, 1, 1, 1, 1⟩ has an entropy of H (W ) = 32.2 bits,
a minimum redundancy code T = ⟨1, 3, 3, 3, 3⟩ with cost C(W ,T ) = 108 bits, and hence a relative
effectiveness loss of E(W ,T ) = 235%. While T is certainly “minimum-redundancy”, it is a long
way from being good. Even the W = ⟨99, 99, 99, 1, 1, 1⟩ example suffers from non-trivial loss of
effectiveness. Other coding approaches that have smaller relative effectiveness loss in this kind of
highly skewed situation are discussed in Section 5.
A number of bounds on the effectiveness of Huffman codes have been developed. For example, in
an important followup to Huffman’s work, Gallager [25] shows that for a set of n weights W = ⟨w i ⟩
summing to m = n−1 i=0 w i , and with corresponding minimum-redundancy code T = ⟨ℓi ⟩:
Í
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
55
0 1 2 3 4
a, 20 b,17 c, 6 d, 3 9
0 1 2 3 4
e, 2 f, 2 g, 2 h, 1 2
0 1 2 3 4
i, 1 j, 1
Fig. 5. Radix-5 minimum-redundancy canonical code tree for the weights W = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1⟩.
Three dummy nodes are required as children of the rightmost leaf, to create an extended sequence W ′
containing n ′ = 13 leaves. Leaves are labeled with an alphabetic letter corresponding to their integer source
symbol identifier.
least-cost groups of r nodes at a time, rather than groups of two. That is, between 0 and r − 2
dummy symbols are appended, each of weight zero, before starting the code construction process.
For example, consider the sequence W = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1⟩ already used as an example
in Figure 3. It has n = 10 weights, so if an r = 5 code is to be generated, then the augmented
sequence W ′ = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1, 0, 0, 0⟩ is formed, with three additional symbols to make
a total size of n ′ = 13. Three combining steps are then sufficient to create the set of codeword
lengths T ′ = ⟨1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3⟩. From these, a canonical code can be constructed over
the channel alphabet {0, 1, 2, 3, 4}, yielding the ten codewords ⟨0, 1, 2, 3, 40, 41, 42, 43, 440, 441⟩,
with three further codewords (442, 443 and 444) nominally assigned to the three added “dummy”
symbols, and hence unused. Figure 5 shows the resultant code tree.
The equivalent of the Kraft inequality (Equation 1) is now given by
n−1
Õ
K(T ) = r −ℓi ≤ 1 , (5)
i=0
with equality only possible if n = n ′, and n is already one greater than a multiple of r − 1. Similarly,
in the definition of entropy in Equation 3, the base of the logarithm changes from 2 to r when
calculating the minimum possible cost in terms of expected r -ary output symbols per input symbol.
One interesting option is to take r = 28 = 256, in which case what is generated is a Huffman
code in which each output symbol is a byte. For large alphabet applications in which even the most
frequent symbol is relatively rare – for example, when the input tokens are indices into a dictionary
of natural language words – the relative effectiveness loss of such a code might be small, and the
ability to focus on whole bytes at decode-time can lead to a distinct throughput advantage [21].
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
the third considers the question of how the code T = ⟨ℓi ⟩ can be economically communicated from
encoder to decoder. Throughout this section it is assumed that the code being applied is a canonical
one in which the set of codewords is lexicographically sorted.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
indicated by the “*” annotations. With a t-bit partial decode table search_start[] available, and again
presuming a suitable sentinel value in first_code_l[L + 1], the first “identify ℓ” step then becomes:
set ℓ ← search_start[buffer >> (L − t)];
while buffer ≥ first_code_l[ℓ + 1] do
set ℓ ← ℓ + 1.
Because the short codewords are by definition the frequently-used ones, a surprisingly high fraction
of the coded symbols can be handled “exactly” using a truncated lookup table. In the example, even
when a t = 2 table of just four values is used, (20 + 17 + 6)/55 = 78% of the symbol occurrences
get their correct length assigned via the table, and the average number of times the loop guard
is tested when t = 2 is just 1.25. That is, even a quite small truncated decoding table can allow
canonical Huffman codes to be decoded using near-constant time per output symbol. Note also
that neither putbits() nor getbits() should be implemented using loops over bits. Both operations
can be achieved through the use of constant-time masks and shifts guarded by if-statements, with
the underlying writing and reading (respectively) steps based on 32- or 64-bit integers.
If relatively large tracts of memory space can be employed, or if L and n can both be restricted to
relatively small values, decoding can be completely-table based, an observation made by a number
of authors [2, 10–13, 28, 29, 34, 37, 50, 64, 71, 75, 77, 83]. The idea common to all of these approaches
is that each of the n − 1 internal nodes of an explicit code tree can be regarded as being a state in
a decoding automaton, with each such state corresponding to a recent history of unresolved bits.
For example, the right-most internal node in the right-hand tree in Figure 4 (labeled with a weight
of “7”) represents the condition in which “11” has been observed, but not yet “consumed”. If the
next k bits from the input stream are then processed as a single entity – where k = 8 might be a
convenient choice, for example – they drive the automaton to a new state, and might also give rise
to the output of source symbols. Starting at that internal node labeled “7” in the right-hand tree
in Figure 4, the k = 8 input block “01010101” would lead to the output of four symbols, “e, d, b, b”
(that is, original symbols “4, 3, 1, 1”), and would leave the automaton at the internal node labeled
with a weight of “33”, the root of the tree. Each other k-bit input block would give rise to a different
set of transitions and/or outputs.
Across the n − 1 internal nodes the complete set of (n − 1)2k transitions can be computed in
advance, and stored in a two dimensional array, with (at most) k symbols to be emitted as each k-bit
input block is processed. Decoding is then simply a matter of starting in the state corresponding
to the code tree root, and then repeatedly taking k-bit units from the input, accessing the table,
writing the corresponding list of source symbols (possibly none) associated with the transition,
and then shifting to the next state indicated in the table.
Each element in the decoding table consists of a destination state, a count of output symbols (an
integer between 0 and k), and a list of up to k output symbols; and the table requires 2k (n − 1) such
elements. Hence, even quite moderate values of n and k require non-trivial amounts of memory.
For example, n = 212 and k = 8 gives rise to a table containing (8 + 2) × 256 × (212 − 1) values, and
if each is stored as (say) a 16-bit integer, will consume in total around 20 MiB. While not a huge
amount of memory, it is certainly enough that cache misses might have a marked impact on actual
execution throughput. To reduce the memory cost, yet still capture some of the benefits of working
with k-bit units, hybrid methods that blend the “search for ℓ” technique described earlier in this
section with the automaton-based approach are also possible [44].
3.3 Housekeeping
Unless the minimum-redundancy code that will be employed in some application is developed in
advance and then compiled into the encoder and decoder programs (as was the case, for example,
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
for some facsimile coding standards in the early 1980s), a prelude component must also be associated
with each message transmitted, describing the details of the code that will be used for the body
of the message. Elements required in the prelude include the length m of the message, so that the
decoder knows when to stop decoding; a description of the size n and composition of the source
alphabet, if it cannot be assumed by default or is only a subset of some larger universe; and the
length ℓi of the codeword associated with each of the n source symbols that appears in the message.
The two scalars, m and n, have little cost. But if the source alphabet is a subset of a larger universe,
then a subalphabet selection vector is required, and might be a rather larger overhead. For example,
if the source universe is regarded as being the set of all 32-bit integers, then any particular message
will contain a much smaller number of distinct symbols, with, typically, n < m ≪ 232 . The simplest
selection approach is to provide a list of n four-byte integers, listing the symbols that appear in the
current message. But it is also possible to regard the subalphabet as being defined by a bitvector of
length 232 , where a “1” bit at position u indicates that symbol u is part of the subalphabet and has a
code assigned to it. Standard representations for sparse bitvectors (including for cases where it is
dense in localized zones, which is also typical) can then be used, reducing the storage cost. Moffat
and Turpin [63, Chapter 3] describe several suitable methods.
The last component of the prelude is a set of n codeword lengths, T = ⟨ℓi ⟩. These are more
economical to transmit than the weights W = ⟨w i ⟩ from which they are derived, and also more
economical than the actual codewords that will be assigned, since the codeword lengths are integers
over a relatively compact range, ℓmin . . . L. The codeword lengths should be provided in subalphabet
order as a sequence of integer values, perhaps using ⌈log2 (L − ℓmin + 1)⌉ bits each, or perhaps using
– wait for it – a secondary minimum-redundancy code in which n ′ = L − ℓmin + 1.
Once the decoder knows the subalphabet and the length of each of the codewords, it generates
the canonical symbol ordering, sorting by non-decreasing codeword length, and breaking ties
by symbol identifier, so that its canonical code assignment matches the one that gets formed by
the encoder. That is, the symbol weights must be ignored during the canonical reordering, since
they are never known to the decoder. Encoding (and later on, when required, decoding) using the
techniques described earlier in this section can then be commenced.
If the message is very long, or is of unknown length, it can be broken into large fixed-length
blocks, and a prelude constructed for each block. The cost of multiple preludes must then be
accepted, but the use of locally-fitted codes and possibly different subalphabets within each block,
means that overall compression might actually be better than if a single global whole-of-message
code was developed, and a single prelude constructed.
Turpin and Moffat [78] provide a detailed description of preludes, and the processes employed
in encoder and decoder that allow them to remain synchronized.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
Hu and Tan [32] and Van Voorhis [80] provided early algorithms that solved this problem; the
one we focus on here is the more efficient package-merge paradigm proposed in 1990 by Larmore
and Hirschberg [41]. The key idea is that of building packages of symbols, in much the same way
as Huffman’s algorithm does, but taking care that no item can take part in more than L combining
steps. That restriction is enforced by creating all least-cost subtrees of different maximum depths,
and retaining information for each of the possible node depths in a separate structure. Algorithm 3
provides pseudo-code for this approach.
The first list of packages – referred to as packages[1] in Algorithm 3 – is taken directly from the
set of original symbols and their corresponding weights, W []. These are the only subtrees that are
possible if their depth is limited to one. Another way of interpreting this list is that it represents
all of the different ways in which a reduction in the Kraft sum K(·) of 2−L can be achieved, by
“demoting” a subtree (node) from having its root at depth L − 1 to having its root at depth L.
A list of subtrees of depth up to two is then generated, representing ways in which the Kraft
sum might be decreased by 2−L+1 , corresponding to moving a subtree or element from depth L − 2
to depth L − 1. To do this, the items in packages[1] are formed into pairs, starting with the two
smallest ones, and working up to the largest ones. That process generates ⌊n/2⌋ packages, each
with a weight computed as the sum of the two item weights, representing a possible internal node
in a code tree. That list of packages is extended by merging it with another copy of W [], generating
the full list packages[2] of n + ⌊n/2⌋ subtrees of depths up to two, again ordered by cost.
Figure 6 illustrates the computation involved. Items in brown are drawn from W [], and all n = 10
of them appear in every one of the packages[] rows. Interspersed are the composite nodes, marked
in blue, each constructed by pairing two elements from the row above. In the diagram the pairs
are indicated by underbars, and some (not all) are also traced by arrows to show their locations in
the next row. Because an L = 5 code is being sought, the fifth row marks the end of the packaging
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
packages[1] 20 17 6 3 2 2 2 1 1 1
packages[2] 37 20 17 9 6 4 3 3 2 2 2 2 1 1 1
packages[3] 37 20 17 15 7 6 5 4 3 3 2 2 2 2 1 1 1
packages[4] 37 22 20 17 11 7 6 5 4 3 3 2 2 2 2 1 1 1
packages[5] 59 37 20 18 17 11 7 6 5 4 3 3 2 2 2 2 1 1 1
Fig. 6. Package-merge algorithm applied to the weights W = ⟨20, 17, 6, 3, 2, 2, 2, 1, 1, 1⟩ with L = 5. Packages
are shown in blue, and leaf nodes in brown. The shaded region shows the five solution[] sets, starting with the
least-cost 2n − 2 items in the fifth row, and then enclosing the necessary packets to construct it in the rows
above, each as a subset of the corresponding packages[] set. The final code is T = ⟨2, 2, 3, 4, 4, 4, 4, 4, 5, 5⟩ with
cost 142, two greater than the minimum-redundancy code ⟨1, 2, 4, 5, 5, 5, 5, 5, 6, 6⟩. If the final 2n−2 elements in
the fourth row are expanded, an L = 4 length-limited minimum-redundancy code T = ⟨2, 2, 4, 4, 4, 4, 4, 4, 4, 4⟩
is identified, with cost 146. It is not possible to form an L = 3 code for n = 10 symbols.
process – none of the subtrees represented in that row can have a depth that is greater than five.
That fifth row is a collection of elements and subtrees that when demoted from a nominal level of
zero in the tree – that is, occurring at the level of the root of the desired code tree – to a nominal
level of one, each decrease the Kraft sum by 2−L+(L−1) = 0.5.
If every element (leaf or subtree) is assumed to be initially at level zero, then the Kraft sum K(·)
has the value n−1
Í −0 = n, and exceeds the limit of 1 that indicates a feasible code. Indeed, to
i=0 2
arrive at a feasible code, the Kraft sum must be reduced by n − 1 compared to this nominal starting
situation. Therefore, selecting the least-weight 2n − 2 of the items in packages[L] forms a set of
items – called solution[L] in Algorithm 3 – that when each is demoted by one level, leads to the
required code. But to achieve those demotions, each of the composite items in solution[L] need to
have their demotions propagated to the next level down, to form the set solution[L − 1]. In turn,
the composite items in solution[L − 1] drive further demotions in solution[L − 2], continuing until
solution[1] is determined, which, by construction, contains no packages. The process of identifying
the L solution[] sets is described by the loop at steps 12 to 14 in Algorithm 3.
The shaded zone in Figure 6 shows the complete collection of solution[] sets, one per level, that
collectively include all of the required leaf demotions. For example, symbol 0 with weight w 0 = 20
appears twice across the five solution[] sets, and hence needs to be demoted twice from its nominal
ℓ0 = 0 starting point, meaning that it is assigned ℓ0 = 2. At the other end of the alphabet, symbol 9
with weight w 9 = 1 appears in all five solution[] sets, and is assigned a codeword length of ℓ9 = 5.
Steps 16 to 18 in Algorithm 3 check the L solution sets, counting the number of times each original
symbol needs to be demoted, and in doing so, computing the required code T = ⟨ℓi ⟩. The complete
code generated by this process is T = ⟨2, 2, 3, 4, 4, 4, 4, 4, 5, 5⟩, and has a cost of 142 bits, two more
than the Huffman code’s 140 bits.
The package-merge implementation described in Algorithm 3 and illustrated in Figure 6 corre-
sponds to the description of Huffman’s algorithm that is provided in Algorithm 1 – it captures the
overall paradigm that solves the problem, but leaves plenty of room for implementation details to be
addressed. When implemented as described, it is clear that O(nL) time and space might be required.
In their original presentation, Larmore and Hirschberg [41] also provided a more complex version
that reduced the space requirement to O(n), but added (by a constant factor) to the execution time;
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
and a range of other researchers have also described package-merge implementations. For example,
Turpin and Moffat [76] noted that it is more efficient to compute the complement of the solution[],
and present a reverse package merge that requires O(n(L − log2 n + 1)) time; Katajainen et al. [36]
describe an implementation that executes in O(nL) time and requires only O(L2 ) space; and Liddell
and Moffat [45] describe an implementation that starts with a Huffman code and then optimally
rearranges it without computing all the packages, executing in time O(n(L Huff − L + 1)) time, where
L Huff is the length of the Huffman code for the same set of input weights.
Other authors have considered approximate solutions that limit the codeword lengths and in
practice appear to give codes of cost close to or equal to the minimum cost [23, 27, 43, 52]. Milidiú
and Laber [49] analyze the relative effectiveness of length-limited codes, and show that for all but
extremely constrained situations the compression loss relative to a Huffman code is small.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
coding is of only limited practical use. There are no general-purpose compression systems that
make use of dynamic Huffman coding, hence our relatively brief treatment of them here.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
formation process. When the number of alternations in W , denoted α(W ), is high, the resultant
tree has many distinct levels and the instance cost is high; when α(W ) is low the tree is likely to be
relatively shallow and easier to construct. Based on this approach, Barbay describes a mechanism
for computing a Huffman code from a non-sorted input sequence W of n symbol weights in
O(n(1 + log α(W ))) time; Barbay also demonstrates that when there are α(W ) alternations, it is
not possible to compute a minimum-redundancy code in less than this much time, and hence that
relative to the alternations measure, his approach is optimally adaptive. Note also that L diff ≤ α(W )
for all sequences W , and that it may thus be possible to further refine Barbay’s approach, and
develop an adaptive algorithm that requires only O(n(1 + log L diff )) time.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
Algorithm 4 Computing the state transitions required by arithmetic coding [57, 85].
1: function arith_encode (lower, range, s)
2: // Compute the arithmetic coding state change, assuming that
base[s] = s−1
i=0 W [i] for 0 ≤ s ≤ n has been precomputed, with base[n] = m
Í
3: //
4: set scale ← range/m
5: set next_lower ← lower + scale · base[s]
6: set next_range ← scale · W [s]
7: return ⟨next_lower, next_range⟩
in W . The first part of Algorithm 4 provides details of this process, making use of a pre-computed
array base[s] that stores the cumulative weight of all symbols prior to s in the source alphabet. The
half-open interval [base[s]/m, base[s + 1]/m) is the fraction assigned to s in the unit interval [0, 1)
(recall that m is the sum of the symbol frequency counts, that is, the length of the sequence being
compressed), and that means that each interval narrowing step results in
next_range = range · (w s /m) .
Once all of the message symbols have been processed, a shortest-possible unambiguous binary
value that lies fully within the final interval is identified, and transmitted to the decoder. As already
noted, prior to any symbols being coded, range = 1. Hence, once the sequence of m symbols making
up the message have all been processed, and assuming that exact arbitrary-precision arithmetic
is available, the final interval [lower, lower + range) completely captures the message, and can be
represented by any single number value that satisfies lower < value < lower + range. Such a value
cannot require more than (− log2 range) + 2 Í bits to describe, meaning that the m symbols in the
source message can be coded in at most 2 + s log2 (m/w s ) bits, that is, in at most two more bits
than the summed entropic cost of the symbols making up the message.
To decode the message embodied in value, the decoder uses base[] to determine a symbol identifier
s whose probability assignment in the range [0, 1) straddles (in proportion) the relationship that
value has to the current interval [lower, lower + range). The second part of Algorithm 4 describes
this computation, with lower and range being initialized to 0 and 1 again, and then faithfully tracing
the same sequence of values in the decoder as they did in the encoder, with the interval narrowing
around the message value, which remains constant.
Table 5 gives an example of this exact form of arithmetic coding, rendering an eleven symbol
message “0, 0, 1, 0, 2, 0, 1, 5, 0, 3, 1” into a 24-bit compressed message. Relative to the probabilities
established by W , the entropic cost of this message would be 22.95 bits. The Huffman code de-
picted in Figure 2 also requires 23 bits. In this short example the ability of arithmetic coding
to faithfully match the entropic costs does not show through particularly well. But in more ex-
treme cases, the advantage becomes clear. For the weights W = ⟨99, 1⟩, the 24-symbol message
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
in return for faster execution, but these still do not offer the decoding speed that is possible with
canonical Huffman decoding.
When carrying out arithmetic encoding based on fixed symbol probabilities, the required calcu-
lations are all based on values that can be pre-computed and stored in n-element arrays. But when
decoding, even with static probabilities, the set of cumulative weights base[] must be searched, to
identify the symbol s for which base[s] ≤ target < base[s + 1]. One option is to spend O(log n) time
per symbol to binary-search the array base[]; another is to allocate a further array of m elements,
symbol[], to map directly from the set of possible target numbers to the corresponding symbol
identifiers. It is also possible to carry out the search step using n-element arrays in O(1 + log b)
time, where b is the number of bits associated with the code for the symbol in question [55]. Moffat
and Turpin [63, Chapter 5] describe these structures, and a range of other details required in a full
arithmetic coding implementation.
Semi-static arithmetic coding requires that the set of n symbol weights W = ⟨w i ⟩ be commu-
nicated to the decoder via a prelude, a more onerous requirement than the codeword lengths
associated with Huffman coding, because of the increased precision in the numbers involved. That
increased precision is what allows more precise probability estimates Í to be employed, and is what
takes arithmetic coding closer to the entropic information bound log2 (m/w s ), usually, but not
always, recouping the additional prelude cost.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
then likewise allocated in a second cycle; then the integers from 2m + 1 to 3m; and so on. The
second cycle of values (incomplete for symbol 0) is picked out in blue in Table 6, with the vertical
bars indicating the end of each cycle.
Now consider the sample message “0,0,1,0,2,0,1,5,0,3,1’. The final value of state will be
A[A[A[A[A[A[A[A[A[A[A[0, 0], 0], 1], 0], 2], 0], 1], 5], 0], 3], 1] ,
of which the first four steps can be traced in Table 6, and with the remainder easily calculated
(because of the regular nature of the table, see Algorithm 5). The total sequence of states associated
with this example is
0 → 1 → 2 → 13 → 25 → 270 → 568 → 1989 → 41,790 → 87,760 → 1,842,979 → 6,450,435 ,
and the final value – which is a 23-bit binary integer – completely encapsulates the 11 symbols in
the input message. As already noted, the minimum-redundancy codes shown in Figure 2 would
also require 23 bits for this sample message.
Decoding works in reverse: the final value for state is located in the table A[·, ·], with the row num-
ber indicating the symbol to be stacked (symbols are determined in reverse order during decoding),
and the column number indicating the prev_state to revert to. For example, the integer 6,450,434
(one less than the example given above) gives rise to the output sequence “1, 0, 2, 0, 2, 1, 0, 0, 0, 2, 1”;
and the integer 6,450,436 (one larger) leads to the decoded sequence “0, 0, 1, 0, 2, 0, 1, 5, 0, 4, 1”.
What is quite remarkable in this technique is that – despite the seeming unpredictability of the
values involved – it provides a deterministic one-to-one mapping from strings to integers in which
the final state value associated with each string is very close to being the reciprocal of the products
of the probabilities of the symbols comprising the string. This happens because the sequence of
state values at each step increases by the required ratio. For example, in the sequence of states
shown above as being associated with the string “0, 0, 1, 0, 2, 0, 1, 5, 0, 3, 1”, the final three ratios
are 87760/41790 = 2.100 = 21/10 (symbol 0); 1842979/87760 = 21.000 = 21/1 (symbol 3); and
6450435/1842979 = 3.500 = 21/6 (symbol 1). Hence, it is unsurprising that ⌈log2 state⌉ is close to
the entropy cost.
Like arithmetic coding, ANS can obtain compression effectiveness of less than one bit per symbol.
For W = ⟨99, 1⟩ and the message “0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0” that was
used as an earlier example, the range ANS approach described in Algorithm 5 generates a final
state of 730, and hence can represent the 24 symbols in 10 bits.
Use of an infinite table is, of course, impossible. But the highly regular nature of the cycles means
that simple computations suffice to calculate the encoding and decoding transformation, given
only an n-element array of integral symbol weights W [], and a pre-computed cumulative sum
array base[] of the same size. Algorithm 5 provides details, with the encoder mapping returning
a next_state; and the inverse decoder mapping returning a tuple containing a prev_state and the
corresponding decoded symbol s.
In the decoder, one additional array is assumed, the m-element inverse of base[]. In this array,
symbol[r ] is the source symbol associated with the r th offset in each of the ANS cycles. In the
example, the first 10 elements of symbol[], from subscript 1 to subscript 10, would indicate “0”;
then the next 6 elements would indicate “1”, and so on; with symbol[21], at the end of the cycle,
storing “5”. If space is at a premium the symbol[] array can be replaced by a linear or binary search
in base[], as was already discussed in connection with arithmetic coding.
A very important factor of ANS that allows fast decoding is the ability to adjust m, by scaling the
counts W = ⟨w i ⟩ to new values W ′ = ⟨w i′⟩ so that their sum m ′ is a power of two. If that is done,
then the inverse mapping function’s key computations at steps 10 and 11 can be implemented as
shift/mask operations. Slight compression effectiveness loss may be introduced as a result of the
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
Algorithm 5 Computing the mappings A[·, ·] and A−1 [·] for range ANS coding [18].
1: function ans_encode (state, s)
2: // Compute the ANS forward mapping, assuming that
base[s] = 1 + s−1
i=0 W [i] for 0 ≤ s ≤ n has been precomputed, with base[n] = m + 1
Í
3: //
4: set f ← state div W [s]
5: set r ← state mod W [s]
6: set next_state ← f · m + base[s] + r
7: return next_state
adjusted weights, but the overhead is likely to be very small, provided m ′, the new value, is several
multiples larger than n, the size of the source alphabet. For example, a character-based coder for
the n = 256 byte values might be designed so that the observed character frequencies are scaled to
yield m ′ = 4096 or m ′ = 65536. The latter will allow more precise probability estimates, but also
involve a larger symbol[] array, potentially slowing decoding throughput because of cache misses.
In the special case in which m ′ is an integer power of two, and all of the W [i] values that sum to m ′
are also rounded to integer powers of two, then each coding operation adds an integral number of
bits of precision to the value of state. That is, a Huffman code can be thought of as being a special
case of an ANS code.
Omitted from the description in Algorithm 5 is the mechanism used to periodically renormalize
state, necessary in an implementation so that integer overflow does not arise. Renormalization
occurs in the encoder by removing some number k of bits from the low-order end of state and
writing them to the output stream, and shifting the remaining bits to the right, where k = 8 or
k = 16 might be attractive units. In the decoder, renormalization shifts state to the left by k bits, and
brings in the next k bits from the coded message. Careful attention to detail is required to ensure
that encoder and decoder remain in step, and that the sequence of states traversed backwards
by the decoder is exactly faithful to the sequence traversed forwards by the encoder. As part of
this mechanism, state is maintained within a range determined by m, the sum of the occurrence
counts, and by r = 2k , the radix of the code. One way of doing this is to require that (after an initial
ramp-up stage during which the first of the two inequalities is not enforced) C ·m ≤ state < C ·m · r
after every encoding step, and prior to every decoding step, for some integer multiplier C ≥ 1.
Bounding the range of state in the encoder is achieved by anticipating what will happen at each
upcoming encoding step, and if the putative next_state is greater than the upper bound on the range,
reducing state ahead of the encoding operation. That is, prior to step 6 in Algorithm 5, intervention
may be necessary to ensure that the upper and lower bounds on state are complied with when that
step does get performed. In the decoder, the corresponding update takes place following a decoding
operation at step 13, and is indicated by the computed prev_state falling below the lower bound on
the range (except during the wind-down phase at the end of the message as the first few symbols
to have been encoded are regenerated).
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
The constant C provides a tradeoff between the fidelity of the arithmetic to the entropic cost, and
the number of bits used in the computation, and hence the size of the arrays used in a table-based
implementation. When C = 1, compression effectiveness is compromised by round-off issues as
state gets scaled, but is likely to still be better than a Huffman code; compression close to the
entropic cost occurs when C is larger.
There is a clear difference between ANS renormalization and arithmetic coding renormalization
– in the case of arithmetic coding, the encoder renormalization process takes bits (or bytes) from the
most-significant end of the state variables lower and range, whereas in ANS they are taken from
the least-significant end of state. Nevertheless, there are also many aspects of the renormalization
process that are shared between the two methods.
Because the decoder of necessity must consume code digits in reverse order to their generation
by the encoder, and regenerates the output string from right to left, it is usual in an ANS coder for
the input sequence to be reversed at the encoder, then the encoding to be performed with the output
digits stored into a buffer, and finally the buffer to be written as the coded message, also in reverse
order. Applying both of the “reversing” steps within the encoder allows the decoder to consume the
code digits (starting with the encoder’s final state) sequentially, at the same time writing alphabet
symbols directly to the output to regenerate the original input, thereby minimizing the cost of the
decoding process.
Like arithmetic coding, ANS achieves its best compression if it has precise symbol counts available,
in which case the prelude cost for ANS is the same as the prelude cost for an arithmetic coder,
and greater than the prelude cost for a Huffman coder. If “approximated” power-of-two frequency
counts as represented by a code T = ⟨ℓi ⟩ are used with an ANS coder, then compression equal to
that of a Huffman coder will result.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
integer when executed on commodity hardware; “moderately fast” as meaning 10–50 nanoseconds
per decoded integer; and “slow” as meaning 100 or more nanoseconds per integer.
Considering the table’s rows in turn, the big weakness of Huffman coding is its inability to
approach the entropic cost when the probability distribution is dominated by the most common
symbol. In terms of speed, arithmetic coding requires more computation than either Huffman
or ANS coding, and if implemented adaptively (one of its key application areas), speed further
drops because of the updates that take place to the statistics data structure. The question of ANS
decoding speed relative to Huffman decoding speed is one that has received detailed exploration
by practitioners; see, for example, the results collected by Yann Collet4 , who suggests that Huffman
decoding is still faster than ANS decoding. Indeed, since Huffman coding can be regarded as being
a special case of ANS coding, any implementation technique or performance gain made in regard to
ANS will likely be transferable to Huffman coding. Moreover, the techniques described in Section 3.2
mean that minimum-redundancy coding can be very fast indeed, particularly if a length-limit is
applied so as to control the size of the decoding tables. Huffman coding is used in the well-known
ZLib library, for example5 ; and in the bzip2 general purpose compressor6 .
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
On the other hand, arithmetic decoding is not as fast. In particular, in the decoder (Algorithm 4),
two state variables are maintained, and only one of the two divisions (the one by m) is “controllable”,
in the sense of it being possible in a semi-static coder to adjust the symbol frequencies so that m is
a fixed power of two. The second division cannot be dealt with in this way. In the ANS decoder
(Algorithm 5), only a single state variable is maintained, and the division that is required at each
decoding step is controllable, and can always be implemented using a single shift/mask.
The ANS approach cannot be used where adaptive probability estimates and multi-context
models are in operation, because the symbols are regenerated by the decoder in the reverse order
that they are processed by the encoder. The high cost of dynamic Huffman coding thus means that
complex multi-context models must still be coupled with arithmetic coding.
Table-based implementations of ANS that are similar to the table-based Huffman approaches
result in further speed gains. Because of its near-entropy compression effectiveness and its fast
decoding speed, ANS coding has been incorporated in a range of general-purpose compression
tools in which block-based operation means that static codes are appropriate, and that the two
reversing steps can be accommodated. These include the ZStd library7 and software by Apple
and Google. Interest in ANS has been in part sparked by several blogs including those of Charles
Bloom8 , Yann Collet9 , and Franz Giesen10 . Applications that have incorporated ANS include image
compression [19]; compressed indexes for document retrieval systems [58, 59]; and compression of
time-series data [7].
To obtain their compression advantage, arithmetic and ANS coders require more detailed prelude
statistics, which adds a small overhead and might discourage their use for short messages. Note,
however, that the implicit approximation of symbol occurrence counts that is an inevitable part
of a minimum-redundancy coding prelude (Section 3.3) could also be explicitly employed with
the arithmetic and ANS coding mechanisms, and so the “short messages” differential noted by
Bookstein and Klein [8] is not a fundamental one in any way.
Taking all of the facets listed in Table 7 into account, it is clear that arithmetic coding remains
the preferred choice for adaptive and multi-context modeling situations, but also that ANS is an
important new technique that should be used in preference to minimum-redundancy (Huffman)
coding in many of the latter’s traditional application areas. However it is also clear that when
comparing ANS and minimum-redundancy coding, a trade-off between decoding speed and com-
pression effectiveness still exists, and if the symbol probability distribution is not skewed, then
canonical minimum-redundancy decoding continues to be the appropriate choice.
More than sixty years since it was first invented, Huffman’s famous algorithm is no longer the
irresistible force that it once was. But even so, Huffman coding remains alive and well, and plays
an important role in practical data compression systems.
Software
An implementation of minimum-redundancy coding that includes many of the techniques de-
scribed in this paper and operates on general integer sequences is available at [Link]
turpinandrew/shuff. Yann Collet’s highly-tuned implementation of table-based ANS (called Finite
State Entropy coding, or FSE) is available at [Link] and
is accompanied by a corresponding carefully-engineered implementation of Huffman coding, both
7 [Link]
8 For example, [Link] accessed 6 September 2018.
9 For example, [Link] accessed 6 September 2018.
10 For example, [Link] accessed 6 September 2018; and software at https:
//[Link]/Cyan4973/FiniteStateEntropy.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
operating over character-based input rather than general integer input. An implementation of arith-
metic coding that is relatively untuned is available at [Link]
arith_coder; it also has a mode that processes general integer sequences.
ACKNOWLEDGMENTS
Multiple co-authors have contributed to the body of work that is summarized here, including (and
not limited to) Andrew Turpin, Matthias Petri, Mike Liddell, Justin Zobel, and Jyrki Katajainen.
Jérémy Barbay provided helpful input in connection with Section 4.3; and the referees provided
detailed comments that have led to a range of useful improvements in the presentation.
REFERENCES
[1] J. Barbay. 2016. Optimal prefix free codes with partial sorting. In Proc. Ann. Symp. Combinatorial Pattern Matching.
LIPIcs Series, Leibniz-Zentrum fuer Informatik, Tel Aviv, Israel, 29:1–29:13.
[2] M. A. Bassiouni and A. Mukherjee. 1995. Efficient decoding of compressed data. Journal of the American Society for
Information Science 46, 1 (1995), 1–8.
[3] A. A. Belal and A. Elmasry. 2006. Distribution-sensitive construction of minimum-redundancy prefix codes. In Proc.
Annual Symp. Theoretical Aspects of Computer Science, Vol. 3884. Lecture Notes in Computer Science, Marseille, France,
92–103.
[4] A. A. Belal and A. Elmasry. 2016. Optimal prefix codes with fewer distinct codeword lengths are faster to construct.
CoRR abs/cs/0509015 (2016), 23.
[5] T. C. Bell, J. G. Cleary, and I. H. Witten. 1990. Text Compression. Prentice-Hall, Englewood Cliffs, NJ.
[6] T. C. Bell, I. H. Witten, and J. G. Cleary. 1989. Modeling for text compression. Computing Surveys 21, 4 (1989), 557–591.
[7] D. Blalock, S. Madden, and J. Guttag. 2018. Sprintz: Time series compression for the internet of things. ACM Interactive,
Mobile, and Wearable Ubiquitous Technololgy 2, 3 (2018), 93.1–93.23.
[8] A. Bookstein and S. T. Klein. 1993. Is Huffman coding dead? Computing 50, 4 (1993), 279–296.
[9] R. M. Capocelli and A. De Santis. 1990. Minimum codeword length and redundancy of Huffman codes. In Proc. Int.
Symp. Coding Theory and Applications, Vol. 514. Lecture Notes in Computer Science, Udine, Italy, 309–317.
[10] H.-C. Chen, Y.-L. Wang, and Y.-F. Lan. 1999. A memory-efficient and fast Huffman decoding algorithm. Inform. Process.
Lett. 69, 3 (1999), 119–122.
[11] Y. Choueka, S. T. Klein, and Y. Perl. 1985. Efficient variants of Huffman codes in high level languages. In Proc. Annual
Int. ACM SIGIR Conf. Research and Development in Information Retrieval. ACM Press, Montreal, Canada, 122–130.
[12] K.-L. Chung and J.-G. Wu. 2001. Faster implementation of canonical minimum redundancy prefix codes. Journal of
Information Science and Engineering 17, 2 (2001), 341–345.
[13] K.-L. Chung. 1997. Efficient Huffman decoding. Inform. Process. Lett. 61, 2 (1997), 97–99.
[14] J. G. Cleary and I. H. Witten. 1984. Data compression using adaptive coding and partial string matching. IEEE
Transactions on Communications 32, 4 (1984), 396–402.
[15] J. B. Connell. 1973. A Huffman-Shannon-Fano code. Proc. IEEE 61, 7 (1973), 1046–1047.
[16] G. V. Cormack and R. N. Horspool. 1984. Algorithms for adaptive Huffman codes. Inform. Process. Lett. 18, 3 (1984),
159–165.
[17] J. Duda. 2009. Asymmetric numeral systems. CoRR abs/0902.0271 (2009), 1–47. [Link]
[18] J. Duda. 2013. Asymmetric numeral systems: Entropy coding combining speed of Huffman coding with compression
rate of arithmetic coding. CoRR abs/1311.2540 (2013), 1–24. [Link]
[19] J. Duda, K. Tahboub, N. J. Gadgil, and E. J. Delp. 2015. The use of asymmetric numeral systems as an accurate
replacement for Huffman coding. In Proc. Picture Coding Symp. IEEE, Cairns, Australia, 65–69.
[20] N. Faller. 1973. An adaptive system for data compression. In Record of the 7th Asilomar Conf. on Circuits, Systems, and
Computers. IEEE, Pacific Grove, California, USA, 593–597.
[21] A. Fariña, N. R. Brisaboa, G. Navarro, F. Claude, Á. S. Places, and E. Rodríguez. 2012. Word-based self-indexes for
natural language text. ACM Transactions on Information Systems 30, 1 (2012), 1:1–1:34.
[22] P. M. Fenwick. 2012. PPM compression without escapes. Software – Practice and Experience 42, 2 (2012), 255–260.
[23] A. S. Fraenkel and S. T. Klein. 1993. Bounding the depth of search trees. Comput. J. 36, 7 (1993), 668–678.
[24] T. Gagie, G. Navarro, Y. Nekrich, and A. Ordóñez Pereira. 2015. Efficient and compact representations of prefix codes.
IEEE Transactions on Information Theory 61, 9 (2015), 4999–5011.
[25] R. G. Gallager. 1978. Variations on a theme by Huffman. IEEE Transactions on Information Theory IT-24, 6 (1978),
668–674.
[26] S. W. Golomb. 1966. Run-length encodings. IEEE Transactions on Information Theory IT–12, 3 (1966), 399–401.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
1 Alistair Moffat
[27] M. Hankamer. 1979. A modified Huffman procedure with reduced memory requirements. IEEE Transactions on
Communications 27, 6 (1979), 930–932.
[28] R. Hashemian. 1995. High speed search and memory efficient Huffman coding. IEEE Transactions on Communications
43, 10 (1995), 2576–2581.
[29] R. Hashemian. 2004. Condensed table of Huffman coding, a new approach to efficient decoding. IEEE Transactions on
Communications 52, 1 (2004), 6–8.
[30] D. S. Hirschberg and D. A. Lelewer. 1990. Efficient decoding of prefix codes. Commun. ACM 33, 4 (1990), 449–459.
[31] P. G. Howard and J. S. Vitter. 1994. Design and analysis of fast text compression based on quasi-arithmetic coding.
Information Processing & Management 30, 6 (1994), 777–790.
[32] T. C. Hu and K. C. Tan. 1972. Path length of binary search trees. SIAM Journal of Applied Mathematics 22, 2 (1972),
225–234.
[33] D. A. Huffman. 1952. A method for the construction of minimum-redundancy codes. Proc. Inst. Radio Engineers 40, 9
(1952), 1098–1101.
[34] V. Iyengar and K. Chakrabarty. 1997. An efficient finite-state machine implementation of Huffman decoders. Inform.
Process. Lett. 64, 6 (1997), 271–275.
[35] J. Kärkkäinen and G. Tischler. 2013. Near in place linear time minimum redundancy coding. In Proc. IEEE Data
Compression Conf. IEEE Computer Society Press, Snowbird, Utah, 411–420.
[36] J. Katajainen, A. Moffat, and A. Turpin. 1995. A fast and space-economical algorithm for length-limited coding. In Int.
Symp. Algorithms and Computation. LNCS 1004, Springer-Verlag, Cairns, Australia, 12–21.
[37] S. T. Klein. 2000. Skeleton trees for the efficient decoding of Huffman encoded texts. Information Retrieval 3, 1 (2000),
7–23.
[38] S. T. Klein and D. Shapira. 2011. Huffman coding with non-sorted frequencies. Mathematics in Computer Science 5, 2
(2011), 171–178.
[39] D. E. Knuth. 1985. Dynamic Huffman coding. Journal of Algorithms 6, 2 (1985), 163–180.
[40] L. G. Kraft. 1949. A device for quantizing, grouping, and coding amplitude modulated pulses. Master’s thesis. MIT,
Cambridge, Massachusetts.
[41] L. L. Larmore and D. S. Hirschberg. 1990. A fast algorithm for optimal length-limited Huffman codes. J. ACM 37, 3
(1990), 464–473.
[42] D. A. Lelewer and D. S. Hirschberg. 1987. Data compression. Computing Surveys 19, 3 (1987), 261–296.
[43] M. Liddell and A. Moffat. 2001. Length-restricted coding in static and dynamic frameworks. In Proc. IEEE Data
Compression Conf. IEEE Computer Society, Snowbird, Utah, 133–142.
[44] M. Liddell and A. Moffat. 2006. Decoding prefix codes. Software – Practice and Experience 36, 15 (2006), 1687–1710.
[45] M. Liddell and A. Moffat. 2007. Incremental calculation of minimum-redundancy length-restricted codes. IEEE
Transactions on Communications 55, 3 (2007), 427–435.
[46] W.-W. Lu and M. P. Gough. 1993. A fast-adaptive Huffman coding algorithm. IEEE Transactions on Communications 41,
4 (1993), 535–538.
[47] D. Manstetten. 1992. Tight upper bounds on the redundancy of Huffman codes. IEEE Transactions on Information
Theory 38, 1 (1992), 144–151.
[48] B. McMillan. 1956. Two inequalities implied by unique decipherability. Inst. Radio Engineers Transactions on Information
Theory IT-2 (1956), 115–116.
[49] R. L. Milidiú and E. S. Laber. 2001. Bounding the inefficiency of length-restricted prefix codes. Algorithmica 31, 4
(2001), 513–529.
[50] R. L. Milidiú, E. S. Laber, L. O. Moreno, and J. C. Duarte. 2003. A fast decoding method for prefix codes. In Proc. IEEE
Data Compression Conf. IEEE Computer Society, Snowbird, Utah, 438.
[51] R. L. Milidiú, E. S. Laber, and A. A. Pessoa. 1999. Bounding the compression loss of the FGK algorithm. Journal of
Algorithms 32, 2 (1999), 195–211.
[52] R. L. Milidiú, A. A. Pessoa, and E. S. Laber. 1998. In-place length-restricted prefix coding. In Proc. Symp. String Processing
and Information Retrieval. IEEE Computer Society, Santa Cruz de la Sierra, Bolivia, 50–59.
[53] R. L. Milidiú, A. A. Pessoa, and E. S. Laber. 2001. Three space-economical algorithms for calculating minimum-
redundancy prefix codes. IEEE Transactions on Information Theory 47, 6 (2001), 2185–2198.
[54] A. Moffat. 1990. Implementing the PPM data compression scheme. IEEE Transactions on Communications 38, 11 (1990),
1917–1921.
[55] A. Moffat. 1999. An improved data structure for cumulative probability tables. Software – Practice and Experience 29, 7
(1999), 647–659.
[56] A. Moffat and J. Katajainen. 1995. In-place calculation of minimum-redundancy codes. In Workshop on Algorithms and
Data Structures. Springer-Verlag, LNCS 955, Queen’s University, Kingston, Ontario, 393–402. Source code available
from [Link]
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Huffman Coding 1
[57] A. Moffat, R. Neal, and I. H. Witten. 1998. Arithmetic coding revisited. ACM Transactions on Information Systems 16, 3
(1998), 256–294. Source code available from [Link]
[58] A. Moffat and M. Petri. 2017. ANS-based index compression. In Proc. Int. Conf. Knowledge and Information Management.
ACM Press, Singapore, 677–686.
[59] A. Moffat and M. Petri. 2018. Index compression using byte-aligned ANS coding and two-dimensional contexts. In
Proc. Int. Conf. Web Search and Data Mining. ACM Press, Marina Del Rey, CA, USA, 405–413.
[60] A. Moffat, N. Sharman, I. H. Witten, and T. C. Bell. 1994. An empirical evaluation of coding methods for multi-symbol
alphabets. Information Processing & Management 30, 6 (1994), 791–804.
[61] A. Moffat and A. Turpin. 1997. On the implementation of minimum-redundancy prefix codes. IEEE Transactions on
Communications 45, 10 (1997), 1200–1207.
[62] A. Moffat and A. Turpin. 1998. Efficient construction of minimum-redundancy codes for large alphabets. IEEE
Transactions on Information Theory 44, 4 (1998), 1650–1657.
[63] A. Moffat and A. Turpin. 2002. Compression and Coding Algorithms. Kluwer Academic Publishers, Boston, MA.
[64] Y. Nekritch. 2000. Byte-oriented decoding of canonical Huffman codes. In Proc. IEEE Int. Symp. Information Theory.
IEEE, Sorrento, Italy, 371.
[65] A. Novoselsky and E. Kagan. 2016. Remark on “Algorithm 673: Dynamic Huffman Coding”. ACM Trans. Math. Software
42, 1 (2016), 10.
[66] O. Petersson and A. Moffat. 1995. A framework for adaptive sorting. Discrete Applied Mathematics 59, 2 (1995), 153–179.
[67] K. Sayood. 2006. Introduction to Data Compression (third ed.). Morgan Kaufmann, San Francisco, CA.
[68] M. Schindler. 1998. A fast renormalisation for arithmetic coding. In Proc. IEEE Data Compression Conf. IEEE Computer
Society Press, Snowbird, Utah, 572.
[69] E. S. Schwartz and B. Kallick. 1964. Generating a canonical prefix encoding. Commun. ACM 7, 3 (1964), 166–169.
[70] C. E. Shannon. 1948. A mathematical theory of communication. Bell Systems Technical Journal 27 (1948), 379–423,
623–656.
[71] A. Siemiński. 1988. Fast decoding of the Huffman codes. Inform. Process. Lett. 26, 5 (1988), 237–241.
[72] P. Skibinski and S. Grabowski. 2004. Variable-length contexts for PPM. In Proc. IEEE Data Compression Conf. IEEE
Computer Society Press, Snowbird, Utah, 409–418.
[73] G. Stix. 1991. Profile: Information theorist David A. Huffman. Scientific American 265, 3 (1991), 54––58. Reproduced at
[Link] accessed 17 August, 2014.
[74] J. A. Storer. 1988. Data Compression: Methods and Theory. Computer Science Press, Rockvill, MD.
[75] H. Tanaka. 1987. Data structure of the Huffman codes and its application to efficient encoding and decoding. IEEE
Transactions on Information Theory IT-33, 1 (1987), 154–156.
[76] A. Turpin and A. Moffat. 1995. Practical length-limited coding for large alphabets. Comput. J. 38, 5 (1995), 339–347.
[77] A. Turpin and A. Moffat. 1998. Comment on “Efficient Huffman decoding” and “An Efficient Finite-State Machine
Implementation of Huffman Decoders”. Inform. Process. Lett. 68, 1 (1998), 1–2.
[78] A. Turpin and A. Moffat. 2000. Housekeeping for prefix coding. IEEE Transactions on Communications 48, 4 (2000),
622–628. Source code available from [Link]
[79] J. van Leeuwen. 1976. On the construction of Huffman trees. In Int. Colloq. Automata, Languages, and Programming.
Edinburgh University Press, Edinburgh University, Scotland, 382–410.
[80] D. C. Van Voorhis. 1974. Constructing codes with bounded codeword lengths. IEEE Transactions on Information Theory
IT-20, 2 (1974), 288–290.
[81] J. S. Vitter. 1987. Design and analysis of dynamic Huffman codes. J. ACM 34, 4 (1987), 825–845.
[82] J. S. Vitter. 1989. Algorithm 673: Dynamic Huffman coding. ACM Trans. Math. Software 15, 2 (1989), 158–167.
[83] P.-C. Wang, Y.-R. Yang, C.-L. Lee, and H.-Y. Chang. 2005. A memory-efficient Huffman decoding algorithm. In Proc. Int.
Conf. Advanced Information Networking and Applications. IEEE Computer Society Press, Taipei, Taiwan, 475–479.
[84] I. H. Witten, A. Moffat, and T. C. Bell. 1999. Managing Gigabytes: Compressing and Indexing Documents and Images
(second ed.). Morgan Kaufmann, San Francisco.
[85] I. H. Witten, R. M. Neal, and J. G. Cleary. 1987. Arithmetic coding for data compression. Commun. ACM 30, 6 (1987),
520–541.
[86] J. Zobel and A. Moffat. 1995. Adding compression to a full-text retrieval system. Software – Practice and Experience 25,
8 (1995), 891–903.
ACM Comput. Surv., Vol. 1, No. 1, Article 1. Publication date: June 2019.
Historical advancements in Huffman coding, such as the development of canonical prefix encoding and hybrid decoding techniques, have significantly contributed to its current efficiency. Innovations like lexicographic ordering, efficient integer manipulation, and advanced data structuring have allowed Huffman coding to achieve rapid, low-memory decoding and encoding processes. These improvements, built on foundational concepts by pioneers like Huffman, have ensured continued relevance and performance superiority in data compression .
A completely table-based decoding approach requires significant memory, as it involves storing all potential state transitions and outputs in a large database. This can lead to increased cache misses when memory handling becomes too large for efficient access, impacting execution throughput negatively. Therefore, managing table size and ensuring efficient cache usage are critical in maintaining high throughput .
Decoding using an automaton treats each internal node of a Huffman tree as a state, which allows for rapid processing of input blocks through predefined transitions and outputs for each state. This structure enhances operational efficiency by converting sequential, bit-by-bit decoding into more advanced, state-driven decoding, thus reducing time complexity and allowing large-scale data handling with better throughput and reduced latency .
The performance of `putbits()` and `getbits()` operations is enhanced by using constant-time masks and shifts, avoiding bit-level loops which are computationally expensive. By manipulating 32- or 64-bit integers, these operations execute efficiently with fewer instructions, allowing for rapid writing and reading. This methodology significantly reduces computational overhead, ensuring that large blocks of data can be processed swiftly .
Lexicographic ordering of codewords in Huffman coding allows for simpler codeword assignment and avoids the complexity of following the tree structure generated by textbook Huffman algorithm implementations. It enables the generation of a set of lexicographically ordered codewords from a non-decreasing feasible code efficiently. Unlike tree-based implementations, this method does not require maintaining the structure or hierarchy of nodes explicitly .
The prelude component in Huffman coding provides essential details about the code that will be used for message decoding, such as the length of the message. This ensures that the decoder knows how to interpret the encoded information correctly. Without this prelude, the decoder would be unable to accurately reconstruct the original message, as it requires the specific details of the code used for the message body .
The linear search process enhances decoding efficiency by using a partial decode table, `search_start[]`, that contains definite codeword lengths or the smallest valid value of codeword lengths, allowing for quick determination of the codeword length. The process reduces the need for looping over bits, thus speeding up the identification of codewords, particularly focusing on frequent short codewords, which are decoded quickly using the table .
To reduce inefficiency and compression loss in length-restricted Huffman coding, the use of adaptive coding algorithms, refined search techniques for optimal codeword assignment, and space-efficient data structures are suggested. These approaches focus on balancing codeword lengths with the need to maintain minimal redundancy by adapting to variations in input data and refining the codebook for more efficient encoding .
Truncated lookup tables in Huffman coding map the first few bits of a buffer to codeword lengths or initial search starting points. This mapping allows the majority of frequent, short codewords to be decoded directly and rapidly, without resorting to linear searching for these frequent occurrences, thus achieving near-constant time decoding for a large fraction of symbols. These tables are especially effective for short codewords, which are used more frequently, speeding up the decoding process significantly .
Memory requirements in Huffman code decoding can be managed by employing hybrid methods that combine the 'search for ℓ' technique with an automaton-based approach. This reduces memory cost while maintaining some benefits of k-bit unit processing. Additionally, smaller k values, efficient data structures, and optimizations such as cache-friendly algorithms can minimize memory usage and cache misses, which become significant when n and k are large .