Commit 45d4a34

gh-101178: Add Ascii85, Base85, and Z85 support to binascii (GH-102753)
Add Ascii85, Base85, and Z85 encoders and decoders to binascii, replacing the existing pure Python implementations in base64. This makes the codecs two orders of magnitude faster and consume two orders of magnitude less memory.

Note that attempting to decode Ascii85 or Base85 data of length 1 mod 5 (after accounting for Ascii85 quirks) now produces an error, as no encoder would emit such data. This should be the only significant externally visible difference compared to the old implementation.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
1 parent d891b2b commit 45d4a34
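
The "length 1 mod 5" note in the commit message follows from the group structure: a final group of n characters carries n - 1 bytes, so a lone trailing character would carry none. A minimal sketch, assuming only the long-standing `base64.b85encode` and `base64.b85decode` wrappers (the helper name `encoded_length_residues` is made up for illustration), checks which residues an encoder actually produces:

```python
import base64


def encoded_length_residues(max_len=32):
    """Collect len(encoded) % 5 for inputs of 0..max_len bytes (hypothetical helper)."""
    seen = set()
    for n in range(max_len + 1):
        encoded = base64.b85encode(bytes(range(n)))
        seen.add(len(encoded) % 5)
    return seen


residues = encoded_length_residues()
print(sorted(residues))  # the residue 1 never appears

# Decoding one stray character: per the commit message this is rejected after
# the change, while older versions may accept it. Both outcomes are handled
# here because the result depends on the Python version in use.
try:
    outcome = ("accepted", base64.b85decode(b"0"))
except ValueError as exc:  # binascii.Error is a subclass of ValueError
    outcome = ("rejected", str(exc))
print(outcome)
```

Base85 is used here rather than Ascii85 because Ascii85 shorthand such as `z` changes the encoded length, which is presumably what the commit message means by "Ascii85 quirks".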

14 files changed: +1558 -192 lines changed

Doc/library/base64.rst

Lines changed: 6 additions & 4 deletions
@@ -247,8 +247,9 @@ Refer to the documentation of the individual functions for more information.
    after at most every *wrapcol* characters.
    If *wrapcol* is zero (default), do not insert any newlines.
 
-   *pad* controls whether the input is padded to a multiple of 4
-   before encoding. Note that the ``btoa`` implementation always pads.
+   If *pad* is true, the input is padded with ``b'\0'`` so its length is a
+   multiple of 4 bytes before encoding.
+   Note that the ``btoa`` implementation always pads.
 
    *adobe* controls whether the encoded byte sequence is framed with ``<~``
    and ``~>``, which is used by the Adobe implementation.
@@ -268,8 +269,9 @@ Refer to the documentation of the individual functions for more information.
    *adobe* controls whether the input sequence is in Adobe Ascii85 format
    (i.e. is framed with <~ and ~>).
 
-   *ignorechars* should be a byte string containing characters to ignore
-   from the input. This should only contain whitespace characters, and by
+   *ignorechars* should be a :term:`bytes-like object` containing characters
+   to ignore from the input.
+   This should only contain whitespace characters, and by
    default contains all whitespace characters in ASCII.
 
    .. versionadded:: 3.4
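
The reworded *pad*, *adobe*, and *ignorechars* descriptions can be tried with the existing `base64.a85encode` and `base64.a85decode` functions. The following is a small sketch, not part of the commit; the sample input `b"ab"` is arbitrary.

```python
import base64

data = b"ab"  # 2 bytes, deliberately not a multiple of 4

plain = base64.a85encode(data)             # short final group
padded = base64.a85encode(data, pad=True)  # input padded with b"\0" first

# The padding becomes part of the data: decoding returns the extra null bytes.
decoded_padded = base64.a85decode(padded)

# adobe=True frames the output with <~ and ~>; the decoder needs the same flag.
framed = base64.a85encode(data, adobe=True)
decoded_framed = base64.a85decode(framed, adobe=True)

# ASCII whitespace is ignored by default when decoding.
spaced = plain[:1] + b" \n" + plain[1:]
decoded_spaced = base64.a85decode(spaced)

print(plain, padded, decoded_padded, framed, decoded_spaced)
```

Because padded output decodes to the padded input, callers that use `pad=True` need some other way to recover the original length.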

Doc/library/binascii.rst

Lines changed: 106 additions & 0 deletions
@@ -98,6 +98,112 @@ The :mod:`!binascii` module defines the following functions:
       Added the *wrapcol* parameter.
 
 
+.. function:: a2b_ascii85(string, /, *, foldspaces=False, adobe=False, ignorechars=b"")
+
+   Convert Ascii85 data back to binary and return the binary data.
+
+   Valid Ascii85 data contains characters from the Ascii85 alphabet in groups
+   of five (except for the final group, which may have from two to five
+   characters). Each group encodes 32 bits of binary data in the range from
+   ``0`` to ``2 ** 32 - 1``, inclusive. The special character ``z`` is
+   accepted as a short form of the group ``!!!!!``, which encodes four
+   consecutive null bytes.
+
+   *foldspaces* is a flag that specifies whether the 'y' short sequence
+   should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20).
+   This feature is not supported by the "standard" Ascii85 encoding.
+
+   *adobe* controls whether the input sequence is in Adobe Ascii85 format
+   (i.e. is framed with <~ and ~>).
+
+   *ignorechars* should be a :term:`bytes-like object` containing characters
+   to ignore from the input.
+   This should only contain whitespace characters.
+
+   Invalid Ascii85 data will raise :exc:`binascii.Error`.
+
+   .. versionadded:: next
+
+
+.. function:: b2a_ascii85(data, /, *, foldspaces=False, wrapcol=0, pad=False, adobe=False)
+
+   Convert binary data to a formatted sequence of ASCII characters in Ascii85
+   coding. The return value is the converted data.
+
+   *foldspaces* is an optional flag that uses the special short sequence 'y'
+   instead of 4 consecutive spaces (ASCII 0x20) as supported by 'btoa'. This
+   feature is not supported by the "standard" Ascii85 encoding.
+
+   If *wrapcol* is non-zero, insert a newline (``b'\n'``) character
+   after at most every *wrapcol* characters.
+   If *wrapcol* is zero (default), do not insert any newlines.
+
+   If *pad* is true, the input is padded with ``b'\0'`` so its length is a
+   multiple of 4 bytes before encoding.
+   Note that the ``btoa`` implementation always pads.
+
+   *adobe* controls whether the encoded byte sequence is framed with ``<~``
+   and ``~>``, which is used by the Adobe implementation.
+
+   .. versionadded:: next
+
+
+.. function:: a2b_base85(string, /)
+
+   Convert Base85 data back to binary and return the binary data.
+   More than one line may be passed at a time.
+
+   Valid Base85 data contains characters from the Base85 alphabet in groups
+   of five (except for the final group, which may have from two to five
+   characters). Each group encodes 32 bits of binary data in the range from
+   ``0`` to ``2 ** 32 - 1``, inclusive.
+
+   Invalid Base85 data will raise :exc:`binascii.Error`.
+
+   .. versionadded:: next
+
+
+.. function:: b2a_base85(data, /, *, pad=False)
+
+   Convert binary data to a line of ASCII characters in Base85 coding.
+   The return value is the converted line.
+
+   If *pad* is true, the input is padded with ``b'\0'`` so its length is a
+   multiple of 4 bytes before encoding.
+
+   .. versionadded:: next
+
+
+.. function:: a2b_z85(string, /)
+
+   Convert Z85 data back to binary and return the binary data.
+   More than one line may be passed at a time.
+
+   Valid Z85 data contains characters from the Z85 alphabet in groups
+   of five (except for the final group, which may have from two to five
+   characters). Each group encodes 32 bits of binary data in the range from
+   ``0`` to ``2 ** 32 - 1``, inclusive.
+
+   See `Z85 specification <https://rfc.zeromq.org/spec/32/>`_ for more information.
+
+   Invalid Z85 data will raise :exc:`binascii.Error`.
+
+   .. versionadded:: next
+
+
+.. function:: b2a_z85(data, /, *, pad=False)
+
+   Convert binary data to a line of ASCII characters in Z85 coding.
+   The return value is the converted line.
+
+   If *pad* is true, the input is padded with ``b'\0'`` so its length is a
+   multiple of 4 bytes before encoding.
+
+   See `Z85 specification <https://rfc.zeromq.org/spec/32/>`_ for more information.
+
+   .. versionadded:: next
+
+
 .. function:: a2b_qp(data, header=False)
 
    Convert a block of quoted-printable data back to binary and return the binary
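
The group rule in the new docs (five characters per 32-bit value, `z` as shorthand for `!!!!!`) can be illustrated in pure Python. This is an illustrative sketch rather than the C implementation added by the commit: `encode_group` and `decode_group` are hypothetical helpers, and the results are checked against `base64.a85encode`, since the new `binascii` functions are marked `versionadded:: next` and may not exist in a given interpreter.

```python
import base64


def encode_group(chunk: bytes) -> bytes:
    """Encode exactly 4 bytes as 5 Ascii85 characters (no 'z' shorthand)."""
    if len(chunk) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(chunk, "big")  # 0 .. 2**32 - 1
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        digits.append(33 + digit)  # the Ascii85 alphabet starts at '!' (33)
    return bytes(reversed(digits))


def decode_group(group: bytes) -> bytes:
    """Decode a full 5-character Ascii85 group back to 4 bytes."""
    if len(group) != 5:
        raise ValueError("expected exactly 5 characters")
    value = 0
    for ch in group:
        digit = ch - 33
        if not 0 <= digit < 85:
            raise ValueError("character outside the Ascii85 alphabet")
        value = value * 85 + digit
    if value >= 2 ** 32:
        raise ValueError("group encodes a value above 2**32 - 1")
    return value.to_bytes(4, "big")


print(encode_group(b"Man "))        # b'9jqo^'
print(encode_group(bytes(4)))       # b'!!!!!', which encoders shorten to b'z'
print(base64.a85encode(bytes(4)))   # b'z'
```

Five base-85 digits can represent values up to `85 ** 5 - 1`, which is more than `2 ** 32 - 1`, so a decoder also has to reject groups such as `uuuuu` that fall outside the 32-bit range.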

Doc/whatsnew/3.15.rst

Lines changed: 13 additions & 0 deletions
@@ -491,6 +491,14 @@ base64
 binascii
 --------
 
+* Added functions for Ascii85, Base85, and Z85 encoding:
+
+  - :func:`~binascii.b2a_ascii85` and :func:`~binascii.a2b_ascii85`
+  - :func:`~binascii.b2a_base85` and :func:`~binascii.a2b_base85`
+  - :func:`~binascii.b2a_z85` and :func:`~binascii.a2b_z85`
+
+  (Contributed by James Seo and Serhiy Storchaka in :gh:`101178`.)
+
 * Added the *wrapcol* parameter in :func:`~binascii.b2a_base64`.
   (Contributed by Serhiy Storchaka in :gh:`143214`.)
 
@@ -1059,6 +1067,11 @@ base64 & binascii
   faster thanks to simple CPU pipelining optimizations.
   (Contributed by Gregory P. Smith and Serhiy Storchaka in :gh:`143262`.)
 
+* Implementation for Ascii85, Base85, and Z85 encoding has been rewritten in C.
+  Encoding and decoding is now two orders of magnitude faster and consumes
+  two orders of magnitude less memory.
+  (Contributed by James Seo and Serhiy Storchaka in :gh:`101178`.)
+
 csv
 ---
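
The speed claim in the What's New entry can be checked locally by running a small timing harness on interpreters built before and after the change. The sketch below assumes only the public `base64` wrappers; the `time_codec` helper and the 64 KiB payload size are arbitrary choices, and it measures time only, not memory.

```python
import base64
import os
import timeit


def time_codec(encode, decode, payload, repeat=3):
    """Return the best-of-`repeat` encode and decode times in seconds (hypothetical helper)."""
    encoded = encode(payload)
    if decode(encoded) != payload:
        raise RuntimeError("round trip failed")
    enc = min(timeit.repeat(lambda: encode(payload), number=1, repeat=repeat))
    dec = min(timeit.repeat(lambda: decode(encoded), number=1, repeat=repeat))
    return enc, dec


payload = os.urandom(64 * 1024)
results = {
    "a85": time_codec(base64.a85encode, base64.a85decode, payload),
    "b85": time_codec(base64.b85encode, base64.b85decode, payload),
}
for name, (enc, dec) in results.items():
    print(f"{name}: encode {enc * 1e3:.2f} ms, decode {dec * 1e3:.2f} ms")
```

Absolute numbers depend on the machine and build, so only the ratio between two interpreter versions is meaningful.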
10641077

Include/internal/pycore_global_objects_fini_generated.h

Lines changed: 3 additions & 0 deletions
Generated file; diff not rendered by default.

Include/internal/pycore_global_strings.h

Lines changed: 3 additions & 0 deletions
@@ -297,6 +297,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(aclose)
         STRUCT_FOR_ID(add)
         STRUCT_FOR_ID(add_done_callback)
+        STRUCT_FOR_ID(adobe)
         STRUCT_FOR_ID(after_in_child)
         STRUCT_FOR_ID(after_in_parent)
         STRUCT_FOR_ID(alias)
@@ -492,6 +493,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(flags)
         STRUCT_FOR_ID(flush)
         STRUCT_FOR_ID(fold)
+        STRUCT_FOR_ID(foldspaces)
         STRUCT_FOR_ID(follow_symlinks)
         STRUCT_FOR_ID(format)
         STRUCT_FOR_ID(format_spec)
@@ -691,6 +693,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(outpath)
         STRUCT_FOR_ID(overlapped)
         STRUCT_FOR_ID(owner)
+        STRUCT_FOR_ID(pad)
         STRUCT_FOR_ID(pages)
         STRUCT_FOR_ID(parameter)
         STRUCT_FOR_ID(parent)

Include/internal/pycore_runtime_init_generated.h

Lines changed: 3 additions & 0 deletions
Generated file; diff not rendered by default.

Include/internal/pycore_unicodeobject_generated.h

Lines changed: 12 additions & 0 deletions
Generated file; diff not rendered by default.
