Skip to content

Conversation

@henryiii
Copy link
Contributor

@henryiii henryiii commented Dec 5, 2025

Discussion: https://discuss.python.org/t/105203

PEP 508 has some very strange rules related to packaging markers, and packaging has never been compliant with these rules (and uv is not either). In short, the spec states any version comparison operator (like >, >=, and ==) must treat the specifier as a version, even if the table lists it as a string, trying to parse it as a version, and if it can't parse, then using Python string comparison rules as fallback. This means implementation_name > "cpython" is valid and is supposed to evaluate False for cpython, but True for pypy. And the implementation is supposed to try to convert these to versions first, which in packaging is actually fairly expensive when done many times.

Here's the table (minus a column for clarity here):

Marker Type Sample values
os_name String posix, java
sys_platform String linux, linux2, darwin, java1.8.0_51 (note that "linux" is from Python 3 and "linux2" from Python 2)
platform_machine String x86_64
platform_python_implementation String CPython, Jython
platform_release String 3.14.1-x86_64-linode39, 14.5.0, 1.8.0_51
platform_system String Linux, Windows, Java
platform_version String #1 SMP Fri Apr 25 13:07:35 EDT 2014
Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle Corporation
Darwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64
python_version Version 3.4, 2.7
python_full_version Version 3.4.0, 3.5.0b1
implementation_name String cpython
implementation_version Version 3.4.0, 3.5.0b1
extra String toml
extras Set of strings {"toml"}
dependency_groups Set of strings {"test"}

It should be noted, of these string fields, only one of them actually might look like a version: platform_release (platform.release()). On some systems, like macOS, this is a valid version. The only other one that anyone on GitHub has ever tried to use with a comparison that's not equality is sys_platform >= "win32", which is obviously hoping that it would cover an imaginary "win64" (but "win128" would compare less!).

There is one valid use for platform_release; when you are on a system that you know has a valid PEP 440 style version, you can in theory gate it:

scipy; platform_system != "Darwin" or platform_machine != "arm64" or platform_version >= "12"

In packaging<22, LegacyVersion was used, so these comparisons always returned False, and did not fall back to Python string comparison, since that's how LegacyVersion worked. Starting in packaging 22, with the removal of LegacyVersion, these started throwing InvalidVersion errors instead (also not spec compliant). The spec does not specify if short circuit evaluation is required (since it basically has fallbacks for everything, there's not really a point), so this means the above expression, in packaging 22-25.0 fails on any system that doesn't have a valid PEP 440 version here, rendering it useless unless you only support the subset of systems where this does convert to a version. This was an issue adopting newer packaging versions in pip, and has basically resulted in every project that is actively maintained having to stop relying on this mechanism - less than 50 examples remain on GitHub of this. A significant number of pull requests and issues are open on packaging with various ways to fix these issues.

From some discussions with the uv team and looking at the code, uv follows the table, and does not try to convert everything to Version, though it does implement string comparisons; sys_platform == "win32" and platform_release > "9" would fail on Windows 11, for example, while it would pass on packaging - but packaging would currently crash with an error if this line was evaluated on most Linux systems.

Now we are faced with an issue: packaging 25.1 is about ready for release, and the way it handles invalid version comparisons has changed. As the code stands now, this will return False again, like it used to. Also, we've been really focused on performance; reading every Version on PyPI is 2x faster, and constructing/using SpecifierSet is ~3x faster (partially because we construct fewer Versions, which are costly due to needing a regex, even at 2x faster). This should improve the performance of the pip resolver, which constructs thousands of versions and specifier sets. One of the remaining areas where we are running Version on is on every dependency marker.

So there are three problems:

  • The behavior is changing from an error to a non-standards complaint behavior (even though it's the same behavior from years ago)
  • We still have to try to construct versions on every single item in the above table. We are having to try to run the version regex on every marker, "cpython", "x86_64", etc.
  • There are weird bugs open, like v0 can't be used as the name of an extra, because it parses as a version. We have to take an expression like thing; extra == "v0" and parse the v0 as a version according to the spec, since == is a version comparison, the type of the field is meaningless according to the spec.

So I'd like to propose the following spec changes to align the spec with the way these have been handled since the beginning in packaging, and reduce our work required as well. It's a minimal change; larger changes could be worked on later if someone wanted to work on a PEP for cleaning this up. But here's my proposal:

  • Change the spec to state only Version values must have the "convert to version if possible" behavior. This will allow implementations to fix errors like using "v0" as an extra, and provide a performance boost. uv is doing this anyway.
  • Make platform_release a Version (could be indicated as string | Version in the table to help users realize it will often fail the conversion, but it keeps the legacy Version behavior above.)
  • Define > and < as always False, and <=, >= as equivalent to ==, for strings and failed Version conversions. This is the legacy (<22) and current (in main) behavior of packaging. Python string ordering is never reliable; even if it happens to work going from 8 to 9, it will break on the next release because 9 is more than 10. And this requires that other languages, like Rust, follow Python's rules for string ordering.

This should only affect <50 (legacy) packages on GitHub, and it will do the right thing for them as well (making the packaging <22 behavior official). (Most of these have other typos, like 21 instead of 12 for the macOS version, so pretty sure they are dead projects, but it won't break them).

I'd like to do this now, since we are replacing an Error with our old behavior, which was not spec compliant, so packages may start appearing expecting this behavior (again).


📚 Documentation preview 📚: https://python-packaging-user-guide--1971.org.readthedocs.build/en/1971/

Discussion: https://discuss.python.org/t/105203/2

PEP 508 has some very strange rules related to packaging markers, and `packaging` has never been compliant with these rules (and `uv` is not either). In short, the spec states any version comparison operator (like `>`, `>=`, and `==`) must treat the specifier as a version, even if the table lists it as a string, trying to parse it as a version, and if it can't parse, then using Python string comparison rules as fallback. This means `implementation_name > "cpython"` is valid and is supposed to evaluate `False` for `cpython`, but `True` for `pypy`. And the implementation is supposed to try to convert these to versions first, which in `packaging` is actually fairly expensive when done many times.

Here's the table (minus a column for clarity here):

| Marker                           | Type               | Sample values |
|----------------------------------|--------------------|---------------|
| `os_name`                        | String             | `posix`, `java` |
| `sys_platform`                   | String             | `linux`, `linux2`, `darwin`, `java1.8.0_51` (note that "linux" is from Python 3 and "linux2" from Python 2) |
| `platform_machine`               | String             | `x86_64` |
| `platform_python_implementation` | String             | `CPython`, `Jython` |
| `platform_release`               | String             | `3.14.1-x86_64-linode39`, `14.5.0`, `1.8.0_51` |
| `platform_system`                | String             | `Linux`, `Windows`, `Java` |
| `platform_version`               | String             | `pypa#1 SMP Fri Apr 25 13:07:35 EDT 2014`<br>`Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle Corporation`<br>`Darwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64` |
| `python_version`                 | Version            | `3.4`, `2.7` |
| `python_full_version`            | Version            | `3.4.0`, `3.5.0b1` |
| `implementation_name`            | String             | `cpython` |
| `implementation_version`         | Version            | `3.4.0`, `3.5.0b1` |
| `extra`                          | String             | `toml` |
| `extras`                         | Set of strings     | `{"toml"}` |
| `dependency_groups`              | Set of strings     | `{"test"}` |

It should be noted, of these string fields, only one of them actually might look like a version: `platform_release` (`platform.release()`). On some systems, like macOS, this is a valid version. The only other one that anyone on GitHub has [ever tried to use](https://github.com/search?q=path%3A%2F%28%5E%7C%5C%2F%29%28pyproject%5C.toml%7Csetup.py%29%24%2F+%2F%28sys_platform%7Cplatform_release%7C+platform_version%29%5Cs*%5B%3C%3E%5D%2F&type=code&p=2) with a comparison that's not equality [is `sys_platform >= "win32"`](https://github.com/search?type=code&q=%22sys_platform+%3E%3D+%27win32%27%22), which is obviously hoping that it would cover an imaginary `"win64"` (but `"win128"` would compare less!).

There is one valid use for `platform_release`; when you are on a system that you know has a valid PEP 440 style version, you can in theory gate it:

```
scipy; platform_system != "Darwin" or platform_machine != "arm64" or platform_version >= "12"
```

In `packaging<22`, LegacyVersion was used, so these comparisons always returned False, and did not fall back to Python string comparison, since that's how LegacyVersion worked. Starting in `packaging` 22, with the removal of LegacyVersion, these started throwing InvalidVersion errors instead (also not spec compliant). The spec does not specify if short circuit evaluation is required (since it basically has fallbacks for everything, there's not really a point), so this means the above expression, in packaging 22-25.0 fails on any system that doesn't have a valid PEP 440 version here, rendering it useless unless you only support the subset of systems where this does convert to a version. This was an issue adopting newer packaging versions in pip, and has basically resulted in every project that is actively maintained having to stop relying on this mechanism - less than 50 examples remain on GitHub of this. A significant number of pull requests and issues are open on packaging with various ways to fix these issues.

From some discussions with the `uv` team and looking at the code, `uv` follows the table, and does not try to convert everything to `Version`, though it does implement string comparisons; `sys_platform == "win32" and platform_release > "9"` would fail on Windows 11, for example, while it would pass on `packaging` - but packaging would currently crash with an error if this line was evaluated on most Linux systems.

Now we are faced with an issue: `packaging 25.1` is about ready for release, and the way it handles invalid version comparisons has changed. As the code stands now, this will return False again, like it used to. Also, we've been really focused on performance; reading every Version on PyPI is 2x faster, and constructing/using SpecifierSet is ~3x faster (partially because we construct fewer Versions, which are costly due to needing a regex, even at 2x faster). This should improve the performance of the pip resolver, which constructs thousands of versions and specifier sets. One of the remaining areas where we are running Version on is on every dependency marker.

So there are three problems:

* The behavior is changing from an error to a non-standards complaint behavior (even though it's the same behavior from years ago)
* We still have to try to construct versions on _every single item_ in the above table. We are having to try to run the version regex on every marker, `"cpython"`, `"x86_64"`, etc.
* There are weird bugs open, like `v0` can't be used as the name of an extra, because it parses as a version. We have to take an expression like `thing; extra == "v0"` and parse the v0 as a version according to the spec, since `==` is a version comparison, the type of the field is meaningless according to the spec.

So I'd like to propose the following spec changes to align the spec with the way these have been handled since the beginning in packaging, and reduce our work required as well. It's a minimal change; larger changes could be worked on later if someone wanted to work on a PEP for cleaning this up. But here's my proposal:

* Change the spec to state only `Version` values must have the "convert to version if possible" behavior. This will allow implementations to fix errors like using `"v0"` as an extra, and provide a performance boost. `uv` is doing this anyway.
* Make `platform_release` a `Version` (could be indicated as `string | Version` in the table to help users realize it will often fail the conversion, but it keeps the legacy `Version` behavior above.)
* Define `>` and `<` as always `False`, and `<=`, `>=` as equivalent to `==`, for strings and failed Version conversions. This is the legacy (<22) and current (in main) behavior of `packaging`. Python string ordering is never reliable; even if it happens to work going from `8` to `9`, it will break on the next release because `9` is more than `10`. And this requires that other languages, like Rust, follow Python's rules for string ordering.

This should only affect <50 (legacy) packages on GitHub, and it will do the right thing for them as well (making the packaging <22 behavior official). (Most of these have other typos, like 21 instead of 12 for the macOS version, so pretty sure they are dead projects, but it won't break them).

I'd like to do this now, since we are replacing an Error with our old behavior, which was not spec compliant, so packages may start appearing expecting this behavior (again).

Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
@henryiii
Copy link
Contributor Author

henryiii commented Dec 5, 2025

Draft for now until the discussion finishes. Also, this is almost the current status of packaging, as it implements points 2 and 3 already. pypa/packaging#939 would implement point 1. If we wanted to skip point 3, then that's pypa/packaging#889. Other attempts to fix this were pypa/packaging#883, pypa/packaging#877, pypa/packaging#834. And reverting the change of Error -> False is in pypa/packaging#1007.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant