 
 ## Example model
 
-The `examples` directory contains an implementation of the image compression
-model described in:
+The [examples directory](examples/) contains an implementation of the
+image compression model described in:
 
 > J. Ballé, V. Laparra, E. P. Simoncelli:
 > "End-to-end optimized image compression"
@@ -43,228 +43,14 @@ python bls2017.py [options] compress original.png compressed.bin
 python bls2017.py [options] decompress compressed.bin reconstruction.png
 ```
 
-## Entropy bottleneck layer
+## Documentation
 
-This layer exposes a high-level interface to model the entropy (the amount of
-information conveyed) of the tensor passing through it. During training, this
-can be used to impose a (soft) entropy constraint on its activations, limiting
-the amount of information flowing through the layer. Note that this is distinct
-from other types of bottlenecks, which reduce, for example, the dimensionality
-of the space. Dimensionality reduction does not limit the amount of information,
-and does not enable efficient data compression per se.
+Refer to [the API documentation](docs/api_docs/python/tfc.md) for a full
+description of the Keras layers and TensorFlow ops this package implements.
 
-After training, this layer can be used to compress any input tensor to a string,
-which may be written to a file, and to decompress a file which it previously
-generated back to a reconstructed tensor (possibly on a different machine having
-access to the same model checkpoint). For this, it uses the range coder
-documented in the next section. The entropies estimated during training or
-evaluation are approximately equal to the average length of the strings in bits.
-
-The layer implements a flexible probability density model to estimate entropy,
-which is described in the appendix of the paper (please cite the paper if you
-use this code for scientific work):
-
-> J. Ballé, D. Minnen, S. Singh, S. J. Hwang, N. Johnston:
-> "Variational image compression with a scale hyperprior"
-> https://arxiv.org/abs/1802.01436
-
-The layer assumes that the input tensor is at least 2D, with a batch dimension
-at the beginning and a channel dimension as specified by `data_format`. The
-layer trains an independent probability density model for each channel, but
-assumes that across all other dimensions, the inputs are i.i.d. (independent and
-identically distributed). Because the entropy (and hence, average codelength) is
-a function of the densities, this assumption may have a direct effect on the
-compression performance.
-
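-For illustration, here is a minimal sketch of how these shape assumptions play
-out, assuming the package is imported as `tensorflow_compression` and that the
-constructor accepts the `data_format` argument mentioned above:
-
-```python
-import tensorflow as tf
-import tensorflow_compression as tfc
-
-# A 4D input in 'channels_last' format: batch dimension first, 32 channels last.
-y = tf.placeholder(tf.float32, shape=[None, 16, 16, 32])
-entropy_bottleneck = tfc.EntropyBottleneck(data_format="channels_last")
-y_, likelihoods = entropy_bottleneck(y, training=True)
-# One density model is fitted per channel (32 here); all batch and spatial
-# positions are treated as i.i.d. samples of that channel. `likelihoods` holds
-# one probability value per element of `y`.
-```
-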
-Because data compression always involves discretization, the outputs of the
-layer are generally only approximations of its inputs. During training,
-discretization is modeled using additive uniform noise to ensure
-differentiability. The entropies computed during training are differential
-entropies. During evaluation, the data is actually quantized, and the
-entropies are discrete (Shannon entropies). To make sure the approximated
-tensor values are good enough for practical purposes, the training phase must
-be used to balance the quality of the approximation with the entropy, by
-adding an entropy term to the training loss, as in the following example.
-
-### Training
-
-Here, we use the entropy bottleneck to compress the latent representation of
-an autoencoder. The data vectors `x` in this case are 4D tensors in
-`'channels_last'` format (for example, 16x16 pixel grayscale images).
-
-Note that `forward_transform` and `backward_transform` are placeholders and can
-be any appropriate artificial neural network. We've found that it generally helps
-*not* to use batch normalization, and to sandwich the bottleneck between two
-linear transforms or convolutions (i.e. to have no nonlinearities directly
-before and after).
-
-```python
-# Build autoencoder.
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-entropy_bottleneck = EntropyBottleneck()
-y_, likelihoods = entropy_bottleneck(y, training=True)
-x_ = backward_transform(y_)
-
-# Information content (= predicted codelength) in bits of each batch element
-# (note that taking the natural logarithm and dividing by `log(2)` is
-# equivalent to taking base-2 logarithms):
-bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
-# Squared difference of each batch element:
-squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
-# The loss is a weighted sum of mean squared error and entropy (average
-# information content), where the weight controls the trade-off between
-# approximation error and entropy.
-main_loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
-
-# Minimize loss and auxiliary loss, and execute update op.
-main_optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
-main_step = main_optimizer.minimize(main_loss)
-# 1e-3 is a good starting point for the learning rate of the auxiliary loss,
-# assuming Adam is used.
-aux_optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
-aux_step = aux_optimizer.minimize(entropy_bottleneck.losses[0])
-step = tf.group(main_step, aux_step, entropy_bottleneck.updates[0])
-```
-
-Note that the layer always produces exactly one auxiliary loss and one update
-op, which are only significant for compression and decompression. To use the
-compression feature, the auxiliary loss must be minimized during or after
-training. After that, the update op must be executed at least once. Here, we
-simply attach them to the main training step.
-
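-As a rough sketch of the "minimize afterwards" alternative (reusing the names
-from the training example above; the data, step counts, and checkpoint path are
-placeholders, and `tf`/`np` are assumed to be imported as usual):
-
-```python
-saver = tf.train.Saver()
-with tf.Session() as sess:
-  sess.run(tf.global_variables_initializer())
-  batch = np.random.rand(8, 16, 16, 1)       # stand-in for real training data
-  for _ in range(1000):                      # main training loop
-    sess.run(main_step, feed_dict={x: batch})
-  for _ in range(1000):                      # then minimize the auxiliary loss
-    sess.run(aux_step, feed_dict={x: batch})
-  sess.run(entropy_bottleneck.updates[0])    # run the update op at least once
-  saver.save(sess, "/tmp/model.ckpt")        # only then save the checkpoint
-```
-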
-### Evaluation
-
-```python
-# Build autoencoder.
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-y_, likelihoods = EntropyBottleneck()(y, training=False)
-x_ = backward_transform(y_)
-
-# Information content (= predicted codelength) in bits of each batch element:
-bits = tf.reduce_sum(tf.log(likelihoods), axis=(1, 2, 3)) / -np.log(2)
-
-# Squared difference of each batch element:
-squared_error = tf.reduce_sum(tf.squared_difference(x, x_), axis=(1, 2, 3))
-
-# The loss is a weighted sum of mean squared error and entropy (average
-# information content), where the weight controls the trade-off between
-# approximation error and entropy.
-loss = 0.5 * tf.reduce_mean(squared_error) + tf.reduce_mean(bits)
-```
-
-To be able to compress the bottleneck tensor and decompress it in a different
-session, or on a different machine, you need three items:
-
-- The compressed representations stored as strings.
-- The shape of the bottleneck for these string representations as a `Tensor`,
-  as well as the number of channels of the bottleneck, which must be known at
-  graph construction time.
-- The checkpoint of the trained model that was used for compression. Note:
-  it is crucial that the auxiliary loss produced by this layer is minimized
-  during or after training, and that the update op is run after training and
-  minimization of the auxiliary loss, but *before* the checkpoint is saved.
-
-### Compression
-
-```python
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)
-strings = EntropyBottleneck().compress(y)
-shape = tf.shape(y)[1:]
-```
-
-### Decompression
-
-```python
-strings = tf.placeholder(tf.string, shape=[None])
-shape = tf.placeholder(tf.int32, shape=[3])
-entropy_bottleneck = EntropyBottleneck(dtype=tf.float32)
-y_ = entropy_bottleneck.decompress(strings, shape, channels=5)
-x_ = backward_transform(y_)
-```
-
-Here, we assumed that the tensor produced by the forward transform has 5
-channels.
-
-The above four use cases can also be implemented within the same session (i.e.
-on the same `EntropyBottleneck` instance), for testing purposes, etc., by
-calling the object more than once.
-
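-For instance, a single `EntropyBottleneck` instance can be trained and then
-used for compression and decompression in one session, roughly as follows
-(reusing `forward_transform` from above and assuming its output has 5 channels):
-
-```python
-x = tf.placeholder(tf.float32, shape=[None, 16, 16, 1])
-y = forward_transform(x)                    # assumed to produce 5 channels
-entropy_bottleneck = EntropyBottleneck()
-y_, likelihoods = entropy_bottleneck(y, training=True)   # training pass
-strings = entropy_bottleneck.compress(y)                 # compression
-y_hat = entropy_bottleneck.decompress(
-    strings, tf.shape(y)[1:], channels=5)                # decompression
-```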
-
-## Range encoder and decoder
-
-This package contains a range encoder and a range decoder, which can encode
-integer data into strings using cumulative distribution functions (CDFs). They
-are used by the higher-level entropy bottleneck class described in the previous
-section.
-
-### Data and CDF values
-
-The data to be encoded should be non-negative integers in the half-open interval
-`[0, m)`. A CDF is then represented as an integral vector of length `m + 1`,
-where `CDF(i) = f(Pr(X < i) * 2^precision)` for i = 0, 1, ..., m, and `precision`
-is an attribute in the range `0 < precision <= 16`. The function `f` maps real
-values to integers, e.g., round or floor. It is important that for every value
-`i` to be encoded, `CDF(i + 1) - CDF(i)` is not zero.
-
-Note that we use `Pr(X < i)`, not `Pr(X <= i)`; therefore `CDF(0) = 0` always.
-
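-As a concrete illustration (plain NumPy, not using the ops in this package),
-a CDF for `m = 4` symbols could be built from a probability mass function as
-follows, taking `f` to be the floor function and `precision = 16`:
-
-```python
-import numpy as np
-
-pmf = np.array([0.1, 0.2, 0.3, 0.4])      # Pr(X = i) for i = 0, ..., 3
-precision = 16
-# CDF(i) = floor(Pr(X < i) * 2^precision) for i = 0, ..., 4; length is m + 1.
-cdf = np.floor(np.cumsum(np.concatenate(([0.0], pmf))) * 2 ** precision)
-cdf = cdf.astype(np.int32)                 # [0, 6553, 19660, 39321, 65536]
-assert cdf[0] == 0                         # because Pr(X < 0) = 0
-# Every symbol that can occur must satisfy CDF(i + 1) - CDF(i) > 0.
-assert np.all(np.diff(cdf) > 0)
-```
-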
-### RangeEncode: data shapes and CDF shapes
-
-For each data element, its CDF has to be provided. Therefore, the shape of the
-CDF should be `data.shape + (m + 1,)` in NumPy-like notation. For example, if
-`data` is a 2-D tensor of shape (10, 10) and its elements are in `[0, 64)`, then
-the CDF tensor should have shape (10, 10, 65).
-
-This may make the CDF tensor too large, and in many applications all data
-elements may have the same probability distribution. To handle this,
-`RangeEncode` supports limited broadcasting of the CDF over the data.
-Broadcasting is limited in the following sense:
-
-- All CDF axes but the last one are broadcast against the data, but not the
-  other way around,
-- The number of CDF axes is not extended, i.e., `CDF.ndim == data.ndim + 1`.
-
-In the previous example where data has shape (10, 10), the following are
-acceptable CDF shapes:
-
-- (10, 10, 65)
-- (1, 10, 65)
-- (10, 1, 65)
-- (1, 1, 65)
-
-### RangeDecode
-
-`RangeEncode` encodes neither the data shape nor a termination symbol. Therefore
-the decoder needs to know how many elements are encoded into the string, and
-`RangeDecode` takes the encoded data shape as its second argument. The same
-shape restrictions as for the `RangeEncode` inputs apply here.
-
-### Example
-
-```python
-data = tf.random_uniform((128, 128), 0, 10, dtype=tf.int32)
-
-histogram = tf.bincount(data, minlength=10, maxlength=10)
-cdf = tf.cumsum(histogram, exclusive=False)
-# CDF should have length m + 1.
-cdf = tf.pad(cdf, [[1, 0]])
-# CDF axis count must be one more than data.
-cdf = tf.reshape(cdf, [1, 1, -1])
-
-# Note that data has 2^14 elements, so the histogram sums to 2^14 and the last
-# CDF value is 2^14, matching precision = 14.
-data = tf.cast(data, tf.int16)
-encoded = coder.range_encode(data, cdf, precision=14)
-decoded = coder.range_decode(encoded, tf.shape(data), cdf, precision=14)
-
-# data and decoded should be the same.
-sess = tf.Session()
-x, y = sess.run((data, decoded))
-assert np.all(x == y)
-```
+There's also an introduction to our `EntropyBottleneck` class
+[here](docs/entropy_bottleneck.md), and a description of the range coding ops
+[here](docs/range_coding.md).
 
 ## Authors
 Johannes Ballé (github: [jonycgn](https://github.com/jonycgn)),