1616This module provides functions for encoding binary data to printable
1717ASCII characters and decoding such encodings back to binary data.
1818This includes the :ref: `encodings specified in <base64-rfc-4648 >`
19- :rfc: `4648 ` (Base64, Base32 and Base16)
20- and the non-standard :ref: `Base85 encodings <base64-base-85 >`.
19+ :rfc: `4648 ` (Base64, Base32 and Base16), the :ref: `Base85 encoding
20+ <base64-base-85>` specified in `PDF 2.0
21+ <https://pdfa.org/resource/iso-32000-2/> `_, and non-standard variants
22+ of Base85 used elsewhere.
2123
2224There are two interfaces provided by this module. The modern interface
2325supports encoding :term: `bytes-like objects <bytes-like object> ` to ASCII
@@ -189,19 +191,28 @@ POST request.
189191Base85 Encodings
190192-----------------
191193
192- Base85 encoding is not formally specified but rather a de facto standard,
193- thus different systems perform the encoding differently.
194+ Base85 encoding is a family of algorithms which represent four bytes
195+ using five ASCII characters. Originally implemented in the Unix
196+ ``btoa(1) `` utility, a version of it was later adopted by Adobe in the
197+ PostScript language and is standardized in PDF 2.0 (ISO 32000-2).
198+ This version, in both its ``btoa `` and PDF variants, is implemented by
199+ :func: `a85encode `.
194200
195- The :func: ` a85encode ` and :func: ` b85encode ` functions in this module are two implementations of
196- the de facto standard. You should call the function with the Base85
197- implementation used by the software you intend to work with .
201+ A separate version, using a different output character set, was
202+ defined as an April Fools' Day joke in :rfc:`1924` but is now used by Git
203+ and other software. This version is implemented by :func:`b85encode`.
198204
199- The two functions present in this module differ in how they handle the following:
205+ Finally, a third version, using yet another output character set
206+ designed for safe inclusion in programming language strings, is
207+ defined by ZeroMQ and implemented here by :func: `z85encode `.
200208
201- * Whether to include enclosing ``<~ `` and ``~> `` markers
202- * Whether to include newline characters
203- * The set of ASCII characters used for encoding
204- * Handling of null bytes
209+ The functions present in this module differ in how they handle the following:
210+
211+ * Whether to include and expect enclosing ``<~ `` and ``~> `` markers.
212+ * Whether to wrap the output across multiple lines.
213+ * The set of ASCII characters used for encoding.
214+ * Compact encodings of sequences of spaces and null bytes.
215+ * The encoding of zero-padding bytes applied to the input.
205216
206217Refer to the documentation of the individual functions for more information.
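
The differences listed above can be seen by encoding the same input with each function. The following is a minimal sketch, not part of the documented change; the sample input is chosen only so that it starts with four null bytes followed by four spaces.

```python
import base64

# Four null bytes, four spaces, then a partial 3-byte group.
data = b"\x00\x00\x00\x00    abc"

a85 = base64.a85encode(data)                          # 'z' stands for the null group
a85_adobe = base64.a85encode(data, adobe=True)        # adds the <~ and ~> markers
a85_folded = base64.a85encode(data, foldspaces=True)  # 'y' stands for the space group
b85 = base64.b85encode(data)                          # different alphabet, no shortcuts

for name, value in [("a85", a85), ("adobe", a85_adobe),
                    ("folded", a85_folded), ("b85", b85)]:
    print(f"{name:>6}: {value!r}")

# Every variant decodes back to the original input.
assert base64.a85decode(a85) == data
assert base64.a85decode(a85_adobe, adobe=True) == data
assert base64.a85decode(a85_folded, foldspaces=True) == data
assert base64.b85decode(b85) == data
```

Note that the decoder has to be called with the same options as the encoder; a ``y`` group, for example, is rejected unless *foldspaces* is true.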
207218
@@ -212,17 +223,22 @@ Refer to the documentation of the individual functions for more information.
212223
213224 *foldspaces * is an optional flag that uses the special short sequence 'y'
214225 instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
215- feature is not supported by the " standard" Ascii85 encoding.
226+ feature is not supported by the standard encoding used in PDF.
216227
217228 *wrapcol * controls whether the output should have newline (``b'\n' ``)
218229 characters added to it. If this is non-zero, each output line will be
219230 at most this many characters long, excluding the trailing newline.
220231
221- *pad * controls whether the input is padded to a multiple of 4
222- before encoding. Note that the ``btoa `` implementation always pads.
232+ *pad* controls whether the zero-padding applied to the end of the input
233+ is fully retained in the encoded output, as done by ``btoa``, so that
234+ the output length is an exact multiple of 5 bytes. This is not part
235+ of the standard encoding used in PDF, because it does not preserve the
236+ length of the data.
223237
224- *adobe * controls whether the encoded byte sequence is framed with ``<~ ``
225- and ``~> ``, which is used by the Adobe implementation.
238+ *adobe * controls whether the encoded byte sequence is framed with
239+ ``<~ `` and ``~> ``, as in a PostScript base-85 string literal. Note
240+ that while ASCII85Decode streams in PDF documents *must * be
241+ terminated with ``~> ``, they *must not * use a leading ``<~ ``.
226242
227243 .. versionadded :: 3.4
228244
@@ -234,10 +250,12 @@ Refer to the documentation of the individual functions for more information.
234250
235251 *foldspaces * is a flag that specifies whether the 'y' short sequence
236252 should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
237- This feature is not supported by the "standard" Ascii85 encoding.
253+ This feature is not supported by the standard Ascii85 encoding used in
254+ PDF and PostScript.
238255
239- *adobe * controls whether the input sequence is in Adobe Ascii85 format
240- (i.e. is framed with <~ and ~>).
256+ *adobe* indicates whether the input is in Adobe format, that is,
257+ terminated by a ``~>`` marker. The leading ``<~`` is optional, but the
258+ input must end with ``~>``, or a :exc:`ValueError` is raised.
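
A short sketch of the marker handling described above, using an arbitrary payload: the framed form and the form without the leading ``<~`` both decode, while input lacking the trailing ``~>`` is rejected.

```python
import base64

payload = b"payload"
framed = base64.a85encode(payload, adobe=True)  # b'<~...~>'

# Both the fully framed form and the PDF-style form (no leading '<~')
# are accepted when adobe=True.
with_leader = base64.a85decode(framed, adobe=True)
without_leader = base64.a85decode(framed[2:], adobe=True)

# Input that does not end with '~>' raises ValueError.
try:
    base64.a85decode(framed[:-2], adobe=True)
    missing_end_rejected = False
except ValueError:
    missing_end_rejected = True

print(with_leader, without_leader, missing_end_rejected)
```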
241259
242260 *ignorechars * should be a byte string containing characters to ignore
243261 from the input. This should only contain whitespace characters, and by
@@ -251,35 +269,40 @@ Refer to the documentation of the individual functions for more information.
251269 Encode the :term: `bytes-like object ` *b * using base85 (as used in e.g.
252270 git-style binary diffs) and return the encoded :class: `bytes `.
253271
254- If *pad * is true, the input is padded with ``b'\0' `` so its length is a
255- multiple of 4 bytes before encoding.
272+ The input is padded with ``b'\0'`` so its length is a multiple of 4
273+ bytes before encoding. If *pad* is true, all of the resulting
274+ characters are retained in the output, whose length is then always a
275+ multiple of 5 bytes; as a result, the original length of the data may
276+ not be preserved on decoding.
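
The effect of *pad* on the output length can be checked with a small sketch like the following, which uses an arbitrary 5-byte input.

```python
import base64

data = b"abcde"  # 5 bytes, so the final group has one byte of real data

plain = base64.b85encode(data)
padded = base64.b85encode(data, pad=True)

print(len(plain), len(padded))

# Decoding the unpadded form restores the input exactly; decoding the
# padded form also returns the three zero bytes added before encoding.
roundtrip_plain = base64.b85decode(plain)
roundtrip_padded = base64.b85decode(padded)
print(roundtrip_plain, roundtrip_padded)
```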
256277
257278 .. versionadded :: 3.4
258279
259280
260281.. function :: b85decode(b)
261282
262283 Decode the base85-encoded :term: `bytes-like object ` or ASCII string *b * and
263- return the decoded :class: `bytes `. Padding is implicitly removed, if
264- necessary.
284+ return the decoded :class: `bytes `.
265285
266286 .. versionadded :: 3.4
267287
268288
269289.. function :: z85encode(s)
270290
271291 Encode the :term: `bytes-like object ` *s * using Z85 (as used in ZeroMQ)
272- and return the encoded :class: `bytes `. See `Z85 specification
273- <https://rfc.zeromq.org/spec/32/> `_ for more information.
292+ and return the encoded :class: `bytes `.
293+
294+ The `ZeroMQ specification <https://rfc.zeromq.org/spec/32/>`_
295+ requires the length of Z85-encoded data to be a multiple of 5
296+ bytes. To produce compliant data frames, pad the input to a
297+ multiple of 4 bytes before passing it to this function.
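
One possible way to do this is sketched below. The helper ``pad_for_z85`` is hypothetical and not part of the module, and zero bytes are used as padding only as an assumption; the receiver needs some other means of learning the original length. Since :func:`z85encode` is only available from Python 3.13, the call is guarded.

```python
import base64

def pad_for_z85(data: bytes) -> bytes:
    """Zero-pad data to a multiple of 4 bytes (hypothetical helper)."""
    return data + b"\0" * (-len(data) % 4)

frame = pad_for_z85(b"hello")  # 5 bytes become 8 bytes
print(len(frame))

# z85encode() was added in Python 3.13; skip the call on older versions.
if hasattr(base64, "z85encode"):
    encoded = base64.z85encode(frame)
    assert len(encoded) % 5 == 0
    assert base64.z85decode(encoded) == frame
    print(encoded)
```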
274298
275299 .. versionadded :: 3.13
276300
277301
278302.. function :: z85decode(s)
279303
280304 Decode the Z85-encoded :term: `bytes-like object ` or ASCII string *s * and
281- return the decoded :class: `bytes `. See `Z85 specification
282- <https://rfc.zeromq.org/spec/32/> `_ for more information.
305+ return the decoded :class: `bytes `.
283306
284307 .. versionadded :: 3.13
285308
@@ -352,3 +375,11 @@ recommended to review the security section for any code deployed to production.
352375 Section 5.2, "Base64 Content-Transfer-Encoding," provides the definition of the
353376 base64 encoding.
354377
378+ `ISO 32000-2 Portable document format - Part 2: PDF 2.0 <https://pdfa.org/resource/iso-32000-2/ >`_
379+ Section 7.4.3, "ASCII85Decode Filter," provides the definition
380+ of the Ascii85 encoding used in PDF and PostScript, including
381+ the output character set and the details of data length preservation
382+ using zero-padding and partial output groups.
383+
384+ `ZeroMQ RFC 32/Z85 <https://rfc.zeromq.org/spec/32/ >`_
385+ The "Formal Specification" section provides the character set used in Z85.