Commit b50cdb1

Add Ascii85 and base85 support to binascii
Add Ascii85 and base85 encoders and decoders to `binascii` as four new functions: `binascii.a2b_ascii85()`, `a2b_base85()`, `b2a_ascii85()`, and `b2a_base85()`. These replace the existing implementations in `base64.a85encode()`, `b85encode()`, `a85decode()`, and `b85decode()`. Performance is greatly improved, and memory usage is now constant instead of linear.

No API or documentation changes are necessary with respect to `base64.a85encode()`, `b85encode()`, etc., and all existing unit tests for those functions continue to pass without modification.

Note that attempting to decode Ascii85 or base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.

Resolves: gh-101178
1 parent 80b19a3 commit b50cdb1
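
The claim in the commit message that no encoder emits data of length 1 mod 5 follows from the format: every full 4-byte group becomes 5 characters, and a trailing group of 1 to 3 bytes becomes 2 to 4 characters. A minimal sketch of that check, assuming only the long-standing `base64.b85encode()` (the helper name `encoded_length_residues` is hypothetical and not part of the commit):

```python
import base64

# Hypothetical helper (not part of the commit): collect the encoded lengths,
# modulo 5, that base64.b85encode() produces for inputs of 0..max_len-1 bytes.
def encoded_length_residues(max_len=64):
    residues = set()
    for n in range(max_len):
        data = bytes(range(n))  # arbitrary payload of n bytes
        residues.add(len(base64.b85encode(data)) % 5)
    return residues

residues = encoded_length_residues()
print(sorted(residues))  # [0, 2, 3, 4]
```

base85 is used here rather than Ascii85 because it has no `z` or `y` short forms, so the encoded length depends only on the input length.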

6 files changed: +1307 -157 lines changed
Doc/library/binascii.rst

Lines changed: 67 additions & 0 deletions
@@ -77,6 +77,73 @@ The :mod:`binascii` module defines the following functions:
       Added the *newline* parameter.
 
 
+.. function:: a2b_ascii85(string, /, *, fold_spaces=False, wrap=False, ignore=b"")
+
+   Convert Ascii85 data back to binary and return the binary data.
+
+   Valid Ascii85 data contains characters from the Ascii85 alphabet in groups
+   of five (except for the final group, which may have from two to five
+   characters). Each group encodes 32 bits of binary data in the range from
+   ``0`` to ``2 ** 32 - 1``, inclusive. The special character ``z`` is
+   accepted as a short form of the group ``!!!!!``, which encodes four
+   consecutive null bytes.
+
+   If *fold_spaces* is true, the special character ``y`` is also accepted as a
+   short form of the group ``+<VdL``, which encodes four consecutive spaces.
+   Note that neither short form is permitted if it occurs in the middle of
+   another group.
+
+   If *wrap* is true, the input begins with ``<~`` and ends with ``~>``, as in
+   the Adobe Ascii85 format.
+
+   *ignore* is an optional bytes-like object that specifies characters to
+   ignore in the input.
+
+   Invalid Ascii85 data will raise :exc:`binascii.Error`.
+
+
+.. function:: b2a_ascii85(data, /, *, fold_spaces=False, wrap=False, width=0, pad=False)
+
+   Convert binary data to a formatted sequence of ASCII characters in Ascii85
+   coding. The return value is the converted data.
+
+   If *fold_spaces* is true, four consecutive spaces are encoded as the
+   special character ``y`` instead of the sequence ``+<VdL``.
+
+   If *wrap* is true, the output begins with ``<~`` and ends with ``~>``, as
+   in the Adobe Ascii85 format.
+
+   If *width* is provided and greater than 0, the output is split into lines
+   of no more than the specified width separated by the ASCII newline
+   character.
+
+   If *pad* is true, the input is padded to a multiple of 4 before encoding.
+
+
+.. function:: a2b_base85(string, /, *, strict_mode=False)
+
+   Convert base85 data back to binary and return the binary data.
+   More than one line may be passed at a time.
+
+   If *strict_mode* is true, only valid base85 data will be converted.
+   Invalid base85 data will raise :exc:`binascii.Error`.
+
+   Valid base85 data contains characters from the base85 alphabet in groups
+   of five (except for the final group, which may have from two to five
+   characters). Each group encodes 32 bits of binary data in the range from
+   ``0`` to ``2 ** 32 - 1``, inclusive.
+
+
+.. function:: b2a_base85(data, /, *, pad=False, newline=True)
+
+   Convert binary data to a line of ASCII characters in base85 coding.
+   The return value is the converted line.
+
+   If *pad* is true, the input is padded to a multiple of 4 before encoding.
+
+   If *newline* is true, a newline char is appended to the result.
+
+
 .. function:: a2b_qp(data, header=False)
 
    Convert a block of quoted-printable data back to binary and return the binary
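
The documentation added above describes each full Ascii85 group as five characters encoding one 32-bit value, with `z` and `y` as short forms. The arithmetic can be sketched in pure Python as follows. This is an illustration only, not the C implementation from this commit; `ascii85_group` is a hypothetical name, and the cross-checks use the existing `base64.a85encode()` rather than the new `binascii` functions.

```python
import base64
import struct

def ascii85_group(four_bytes):
    """Encode exactly four bytes as five Ascii85 characters (no short forms)."""
    if len(four_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    (word,) = struct.unpack("!I", four_bytes)  # big-endian 32-bit value
    digits = []
    for _ in range(5):
        word, rem = divmod(word, 85)
        digits.append(33 + rem)  # alphabet runs from '!' (33) to 'u' (117)
    return bytes(reversed(digits))  # most significant digit first

print(ascii85_group(b"    "))      # b'+<VdL'
print(ascii85_group(b"\0\0\0\0"))  # b'!!!!!'

# The existing base64 API applies the short forms on top of this arithmetic.
print(base64.a85encode(b"\0\0\0\0"))               # b'z'
print(base64.a85encode(b"    ", foldspaces=True))  # b'y'
```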

Lib/base64.py

Lines changed: 6 additions & 154 deletions
@@ -295,36 +295,6 @@ def b16decode(s, casefold=False):
 #
 # Ascii85 encoding/decoding
 #
-
-_a85chars = None
-_a85chars2 = None
-_A85START = b"<~"
-_A85END = b"~>"
-
-def _85encode(b, chars, chars2, pad=False, foldnuls=False, foldspaces=False):
-    # Helper function for a85encode and b85encode
-    if not isinstance(b, bytes_types):
-        b = memoryview(b).tobytes()
-
-    padding = (-len(b)) % 4
-    if padding:
-        b = b + b'\0' * padding
-    words = struct.Struct('!%dI' % (len(b) // 4)).unpack(b)
-
-    chunks = [b'z' if foldnuls and not word else
-              b'y' if foldspaces and word == 0x20202020 else
-              (chars2[word // 614125] +
-               chars2[word // 85 % 7225] +
-               chars[word % 85])
-              for word in words]
-
-    if padding and not pad:
-        if chunks[-1] == b'z':
-            chunks[-1] = chars[0] * 5
-        chunks[-1] = chunks[-1][:-padding]
-
-    return b''.join(chunks)
-
 def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
     """Encode bytes-like object b using Ascii85 and return a bytes object.
 
@@ -342,29 +312,8 @@ def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False):
     adobe controls whether the encoded byte sequence is framed with <~ and ~>,
     which is used by the Adobe implementation.
     """
-    global _a85chars, _a85chars2
-    # Delay the initialization of tables to not waste memory
-    # if the function is never called
-    if _a85chars2 is None:
-        _a85chars = [bytes((i,)) for i in range(33, 118)]
-        _a85chars2 = [(a + b) for a in _a85chars for b in _a85chars]
-
-    result = _85encode(b, _a85chars, _a85chars2, pad, True, foldspaces)
-
-    if adobe:
-        result = _A85START + result
-    if wrapcol:
-        wrapcol = max(2 if adobe else 1, wrapcol)
-        chunks = [result[i: i + wrapcol]
-                  for i in range(0, len(result), wrapcol)]
-        if adobe:
-            if len(chunks[-1]) + 2 > wrapcol:
-                chunks.append(b'')
-        result = b'\n'.join(chunks)
-    if adobe:
-        result += _A85END
-
-    return result
+    return binascii.b2a_ascii85(b, fold_spaces=foldspaces,
+                                wrap=adobe, width=wrapcol, pad=pad)
 
 def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
     """Decode the Ascii85 encoded bytes-like object or ASCII string b.
@@ -383,121 +332,24 @@ def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'):
     The result is returned as a bytes object.
     """
     b = _bytes_from_decode_data(b)
-    if adobe:
-        if not b.endswith(_A85END):
-            raise ValueError(
-                "Ascii85 encoded byte sequences must end "
-                "with {!r}".format(_A85END)
-                )
-        if b.startswith(_A85START):
-            b = b[2:-2] # Strip off start/end markers
-        else:
-            b = b[:-2]
-    #
-    # We have to go through this stepwise, so as to ignore spaces and handle
-    # special short sequences
-    #
-    packI = struct.Struct('!I').pack
-    decoded = []
-    decoded_append = decoded.append
-    curr = []
-    curr_append = curr.append
-    curr_clear = curr.clear
-    for x in b + b'u' * 4:
-        if b'!'[0] <= x <= b'u'[0]:
-            curr_append(x)
-            if len(curr) == 5:
-                acc = 0
-                for x in curr:
-                    acc = 85 * acc + (x - 33)
-                try:
-                    decoded_append(packI(acc))
-                except struct.error:
-                    raise ValueError('Ascii85 overflow') from None
-                curr_clear()
-        elif x == b'z'[0]:
-            if curr:
-                raise ValueError('z inside Ascii85 5-tuple')
-            decoded_append(b'\0\0\0\0')
-        elif foldspaces and x == b'y'[0]:
-            if curr:
-                raise ValueError('y inside Ascii85 5-tuple')
-            decoded_append(b'\x20\x20\x20\x20')
-        elif x in ignorechars:
-            # Skip whitespace
-            continue
-        else:
-            raise ValueError('Non-Ascii85 digit found: %c' % x)
-
-    result = b''.join(decoded)
-    padding = 4 - len(curr)
-    if padding:
-        # Throw away the extra padding
-        result = result[:-padding]
-    return result
-
-# The following code is originally taken (with permission) from Mercurial
-
-_b85alphabet = (b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
-                b"abcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~")
-_b85chars = None
-_b85chars2 = None
-_b85dec = None
+    return binascii.a2b_ascii85(b, fold_spaces=foldspaces,
+                                wrap=adobe, ignore=ignorechars)
 
 def b85encode(b, pad=False):
     """Encode bytes-like object b in base85 format and return a bytes object.
 
     If pad is true, the input is padded with b'\\0' so its length is a multiple of
     4 bytes before encoding.
     """
-    global _b85chars, _b85chars2
-    # Delay the initialization of tables to not waste memory
-    # if the function is never called
-    if _b85chars2 is None:
-        _b85chars = [bytes((i,)) for i in _b85alphabet]
-        _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]
-    return _85encode(b, _b85chars, _b85chars2, pad)
+    return binascii.b2a_base85(b, pad=pad, newline=False)
 
 def b85decode(b):
     """Decode the base85-encoded bytes-like object or ASCII string b
 
     The result is returned as a bytes object.
     """
-    global _b85dec
-    # Delay the initialization of tables to not waste memory
-    # if the function is never called
-    if _b85dec is None:
-        _b85dec = [None] * 256
-        for i, c in enumerate(_b85alphabet):
-            _b85dec[c] = i
-
     b = _bytes_from_decode_data(b)
-    padding = (-len(b)) % 5
-    b = b + b'~' * padding
-    out = []
-    packI = struct.Struct('!I').pack
-    for i in range(0, len(b), 5):
-        chunk = b[i:i + 5]
-        acc = 0
-        try:
-            for c in chunk:
-                acc = acc * 85 + _b85dec[c]
-        except TypeError:
-            for j, c in enumerate(chunk):
-                if _b85dec[c] is None:
-                    raise ValueError('bad base85 character at position %d'
-                                    % (i + j)) from None
-            raise
-        try:
-            out.append(packI(acc))
-        except struct.error:
-            raise ValueError('base85 overflow in hunk starting at byte %d'
-                             % i) from None
-
-    result = b''.join(out)
-    if padding:
-        result = result[:-padding]
-    return result
+    return binascii.a2b_base85(b, strict_mode=True)
 
 # Legacy interface. This code could be cleaned up since I don't believe
 # binascii has any line length limitations. It just doesn't seem worth it
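
Since the `base64` wrappers keep their signatures, callers should see the same results before and after this change. A minimal sketch of the kind of round-trip check that exercises the options forwarded in the hunks above (`foldspaces`, `wrapcol`, `adobe`, `pad`). It assumes only the public `base64` API and deliberately avoids the new `binascii` functions, which may not exist in a given interpreter; the sample payload is arbitrary.

```python
import base64

data = b"base85 \x00\x00\x00\x00    payload"

# Plain Ascii85 round trip.
plain = base64.a85encode(data)
assert base64.a85decode(plain) == data

# Adobe framing plus line wrapping; the decoder skips the newlines because
# they are in the default ignorechars.
framed = base64.a85encode(data, foldspaces=True, wrapcol=20, adobe=True)
assert framed.startswith(b"<~") and framed.endswith(b"~>")
assert base64.a85decode(framed, foldspaces=True, adobe=True) == data

# base85 round trip.
assert base64.b85decode(base64.b85encode(data)) == data

# pad=True rounds the input up to a multiple of 4 bytes with b"\0",
# so the padding is still present after decoding.
padded = base64.b85decode(base64.b85encode(b"abcde", pad=True))
assert padded == b"abcde\0\0\0"

# Invalid input is caught here as ValueError, which binascii.Error subclasses.
try:
    base64.b85decode(b'"""""')  # '"' is not in the base85 alphabet
except ValueError as exc:
    print("rejected:", exc)
```

Catching `ValueError` keeps the check independent of whether the error originates in the old Python code or in `binascii`.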