Skip to content

Conversation

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Dec 27, 2025

@serhiy-storchaka
Copy link
Member Author

As a side effect -- the calculation of the size for the output buffer is now more accurate. Encoding more than 1GiB is now potentially supported on 32-bit systems (up to 1.5GiB if there is enough memory).

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the warning in memoryobject.c, this was my fault but it's fixed on main and backports are pending.

Copy link
Contributor

@cmaloney cmaloney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 overall, didn't review the implementation in depth but went through the docs + overall API shape.

goto toolong;
}
if (wrapcol && out_len) {
out_len += (out_len - 1) / wrapcol;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support unreasonable wrapcol values? Values that are not multiples of 4 really do not make sense and feel unlikely to be used for any practical use of base64. As do low values such as 1-3. (small values are inefficient with our post-processing implementation anyways)

Do we know of any practical actual need for odd-multiple wrapping?

If we restrict the allowed wrappings to start with, we could open them up wider later if anyone has a need. I doubt there is demand for such strange for base64's use case wrapping sizes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd-wrapping: 4n - 1 to really have a space on the right if there is a hard wrap at 4n. Only for rendering purposes I'd say and likely not useful. But 4n - 2 may also be an option. Or odd-wrapping could be useful.

More generally, email's policy allows to specify its own max line length and we currently have: "maxlen = policy.max_line_length or sys.maxsize". So it's important to support all inputs. However, we can specialize the case sys.maxsize to be equivalent to "no wrapping".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this. We can round wrapcol down to the multiplier of 4 and set some minimal non-zero limit. plistlib does this, with the minimum 16. a85encode() sets the limit to 2 if adobe=True. quopri also needs some minimum (not less than 3, plus an escaped newline).

For Base 4 the reason can be that some third-party decoders can not support groups split between lines. See also #76672. But I am not sure whether we should force this or left on the user responsibility? What should we do for values from 1 to 3? Raise an exception? Round up to 4?

Copy link
Member

@picnixz picnixz Dec 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 1, 2, 3, if it's not efficient, then so be it. It's the user responsibility at this point. For 1, it's the same as having no wrap and then put \n in between each values. So they could just do their own b"\n.".join(list(encode(s))) in this case which may be faster than encode(s, wrapcol=1). For 2-3, if it works but is not efficient, just leave it as is I guess (though the case n=2 may be interesting and we could try to optimize it, but I don't think people really want to group b64 encoded content by 2; the number of lines is likely to be too large to be visually interesting).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noone will intentionally pass such pointless tiny values anyways - it is our choice if we bother to disallow them or not. if we do not I'd suggest a ValueError rather than rounding up so that anyone with an actual need (prediction: nobody) can come file a bug explaining why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a85encode(data, adobe=True, wrapcol=1) wraps the output in lines of length 2, so <~ and ~> will not split.

if (state == NULL) {
return NULL;
}
PyErr_SetString(state->Error, "Too much data for base64 line");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this error is raised even when no newlines are involved. i'd just leave it as "too much data" without trying to explain why. it's effectively a MemoryError in a sense as the only reason this would happen is if output would cross Py_ssize_t bounds which is unrealistic to ever happen on any system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is realistic on 32-bit systems.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just get rid of the word "line" then as this isn't line specific.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants