-
Notifications
You must be signed in to change notification settings - Fork 1.4k
fix(markers): Fix/change dependency specifiers #1971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Discussion: https://discuss.python.org/t/105203/2 PEP 508 has some very strange rules related to packaging markers, and `packaging` has never been compliant with these rules (and `uv` is not either). In short, the spec states any version comparison operator (like `>`, `>=`, and `==`) must treat the specifier as a version, even if the table lists it as a string, trying to parse it as a version, and if it can't parse, then using Python string comparison rules as fallback. This means `implementation_name > "cpython"` is valid and is supposed to evaluate `False` for `cpython`, but `True` for `pypy`. And the implementation is supposed to try to convert these to versions first, which in `packaging` is actually fairly expensive when done many times. Here's the table (minus a column for clarity here): | Marker | Type | Sample values | |----------------------------------|--------------------|---------------| | `os_name` | String | `posix`, `java` | | `sys_platform` | String | `linux`, `linux2`, `darwin`, `java1.8.0_51` (note that "linux" is from Python 3 and "linux2" from Python 2) | | `platform_machine` | String | `x86_64` | | `platform_python_implementation` | String | `CPython`, `Jython` | | `platform_release` | String | `3.14.1-x86_64-linode39`, `14.5.0`, `1.8.0_51` | | `platform_system` | String | `Linux`, `Windows`, `Java` | | `platform_version` | String | `pypa#1 SMP Fri Apr 25 13:07:35 EDT 2014`<br>`Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle Corporation`<br>`Darwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64` | | `python_version` | Version | `3.4`, `2.7` | | `python_full_version` | Version | `3.4.0`, `3.5.0b1` | | `implementation_name` | String | `cpython` | | `implementation_version` | Version | `3.4.0`, `3.5.0b1` | | `extra` | String | `toml` | | `extras` | Set of strings | `{"toml"}` | | `dependency_groups` | Set of strings | `{"test"}` | It should be noted, of these string fields, only one of them actually might look like a version: `platform_release` (`platform.release()`). On some systems, like macOS, this is a valid version. The only other one that anyone on GitHub has [ever tried to use](https://github.com/search?q=path%3A%2F%28%5E%7C%5C%2F%29%28pyproject%5C.toml%7Csetup.py%29%24%2F+%2F%28sys_platform%7Cplatform_release%7C+platform_version%29%5Cs*%5B%3C%3E%5D%2F&type=code&p=2) with a comparison that's not equality [is `sys_platform >= "win32"`](https://github.com/search?type=code&q=%22sys_platform+%3E%3D+%27win32%27%22), which is obviously hoping that it would cover an imaginary `"win64"` (but `"win128"` would compare less!). There is one valid use for `platform_release`; when you are on a system that you know has a valid PEP 440 style version, you can in theory gate it: ``` scipy; platform_system != "Darwin" or platform_machine != "arm64" or platform_version >= "12" ``` In `packaging<22`, LegacyVersion was used, so these comparisons always returned False, and did not fall back to Python string comparison, since that's how LegacyVersion worked. Starting in `packaging` 22, with the removal of LegacyVersion, these started throwing InvalidVersion errors instead (also not spec compliant). The spec does not specify if short circuit evaluation is required (since it basically has fallbacks for everything, there's not really a point), so this means the above expression, in packaging 22-25.0 fails on any system that doesn't have a valid PEP 440 version here, rendering it useless unless you only support the subset of systems where this does convert to a version. This was an issue adopting newer packaging versions in pip, and has basically resulted in every project that is actively maintained having to stop relying on this mechanism - less than 50 examples remain on GitHub of this. A significant number of pull requests and issues are open on packaging with various ways to fix these issues. From some discussions with the `uv` team and looking at the code, `uv` follows the table, and does not try to convert everything to `Version`, though it does implement string comparisons; `sys_platform == "win32" and platform_release > "9"` would fail on Windows 11, for example, while it would pass on `packaging` - but packaging would currently crash with an error if this line was evaluated on most Linux systems. Now we are faced with an issue: `packaging 25.1` is about ready for release, and the way it handles invalid version comparisons has changed. As the code stands now, this will return False again, like it used to. Also, we've been really focused on performance; reading every Version on PyPI is 2x faster, and constructing/using SpecifierSet is ~3x faster (partially because we construct fewer Versions, which are costly due to needing a regex, even at 2x faster). This should improve the performance of the pip resolver, which constructs thousands of versions and specifier sets. One of the remaining areas where we are running Version on is on every dependency marker. So there are three problems: * The behavior is changing from an error to a non-standards complaint behavior (even though it's the same behavior from years ago) * We still have to try to construct versions on _every single item_ in the above table. We are having to try to run the version regex on every marker, `"cpython"`, `"x86_64"`, etc. * There are weird bugs open, like `v0` can't be used as the name of an extra, because it parses as a version. We have to take an expression like `thing; extra == "v0"` and parse the v0 as a version according to the spec, since `==` is a version comparison, the type of the field is meaningless according to the spec. So I'd like to propose the following spec changes to align the spec with the way these have been handled since the beginning in packaging, and reduce our work required as well. It's a minimal change; larger changes could be worked on later if someone wanted to work on a PEP for cleaning this up. But here's my proposal: * Change the spec to state only `Version` values must have the "convert to version if possible" behavior. This will allow implementations to fix errors like using `"v0"` as an extra, and provide a performance boost. `uv` is doing this anyway. * Make `platform_release` a `Version` (could be indicated as `string | Version` in the table to help users realize it will often fail the conversion, but it keeps the legacy `Version` behavior above.) * Define `>` and `<` as always `False`, and `<=`, `>=` as equivalent to `==`, for strings and failed Version conversions. This is the legacy (<22) and current (in main) behavior of `packaging`. Python string ordering is never reliable; even if it happens to work going from `8` to `9`, it will break on the next release because `9` is more than `10`. And this requires that other languages, like Rust, follow Python's rules for string ordering. This should only affect <50 (legacy) packages on GitHub, and it will do the right thing for them as well (making the packaging <22 behavior official). (Most of these have other typos, like 21 instead of 12 for the macOS version, so pretty sure they are dead projects, but it won't break them). I'd like to do this now, since we are replacing an Error with our old behavior, which was not spec compliant, so packages may start appearing expecting this behavior (again). Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
|
Draft for now until the discussion finishes. Also, this is almost the current status of |
Discussion: https://discuss.python.org/t/105203
PEP 508 has some very strange rules related to packaging markers, and
packaginghas never been compliant with these rules (anduvis not either). In short, the spec states any version comparison operator (like>,>=, and==) must treat the specifier as a version, even if the table lists it as a string, trying to parse it as a version, and if it can't parse, then using Python string comparison rules as fallback. This meansimplementation_name > "cpython"is valid and is supposed to evaluateFalseforcpython, butTrueforpypy. And the implementation is supposed to try to convert these to versions first, which inpackagingis actually fairly expensive when done many times.Here's the table (minus a column for clarity here):
os_nameposix,javasys_platformlinux,linux2,darwin,java1.8.0_51(note that "linux" is from Python 3 and "linux2" from Python 2)platform_machinex86_64platform_python_implementationCPython,Jythonplatform_release3.14.1-x86_64-linode39,14.5.0,1.8.0_51platform_systemLinux,Windows,Javaplatform_version#1 SMP Fri Apr 25 13:07:35 EDT 2014Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle CorporationDarwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64python_version3.4,2.7python_full_version3.4.0,3.5.0b1implementation_namecpythonimplementation_version3.4.0,3.5.0b1extratomlextras{"toml"}dependency_groups{"test"}It should be noted, of these string fields, only one of them actually might look like a version:
platform_release(platform.release()). On some systems, like macOS, this is a valid version. The only other one that anyone on GitHub has ever tried to use with a comparison that's not equality issys_platform >= "win32", which is obviously hoping that it would cover an imaginary"win64"(but"win128"would compare less!).There is one valid use for
platform_release; when you are on a system that you know has a valid PEP 440 style version, you can in theory gate it:In
packaging<22, LegacyVersion was used, so these comparisons always returned False, and did not fall back to Python string comparison, since that's how LegacyVersion worked. Starting inpackaging22, with the removal of LegacyVersion, these started throwing InvalidVersion errors instead (also not spec compliant). The spec does not specify if short circuit evaluation is required (since it basically has fallbacks for everything, there's not really a point), so this means the above expression, in packaging 22-25.0 fails on any system that doesn't have a valid PEP 440 version here, rendering it useless unless you only support the subset of systems where this does convert to a version. This was an issue adopting newer packaging versions in pip, and has basically resulted in every project that is actively maintained having to stop relying on this mechanism - less than 50 examples remain on GitHub of this. A significant number of pull requests and issues are open on packaging with various ways to fix these issues.From some discussions with the
uvteam and looking at the code,uvfollows the table, and does not try to convert everything toVersion, though it does implement string comparisons;sys_platform == "win32" and platform_release > "9"would fail on Windows 11, for example, while it would pass onpackaging- but packaging would currently crash with an error if this line was evaluated on most Linux systems.Now we are faced with an issue:
packaging 25.1is about ready for release, and the way it handles invalid version comparisons has changed. As the code stands now, this will return False again, like it used to. Also, we've been really focused on performance; reading every Version on PyPI is 2x faster, and constructing/using SpecifierSet is ~3x faster (partially because we construct fewer Versions, which are costly due to needing a regex, even at 2x faster). This should improve the performance of the pip resolver, which constructs thousands of versions and specifier sets. One of the remaining areas where we are running Version on is on every dependency marker.So there are three problems:
"cpython","x86_64", etc.v0can't be used as the name of an extra, because it parses as a version. We have to take an expression likething; extra == "v0"and parse the v0 as a version according to the spec, since==is a version comparison, the type of the field is meaningless according to the spec.So I'd like to propose the following spec changes to align the spec with the way these have been handled since the beginning in packaging, and reduce our work required as well. It's a minimal change; larger changes could be worked on later if someone wanted to work on a PEP for cleaning this up. But here's my proposal:
Versionvalues must have the "convert to version if possible" behavior. This will allow implementations to fix errors like using"v0"as an extra, and provide a performance boost.uvis doing this anyway.platform_releaseaVersion(could be indicated asstring | Versionin the table to help users realize it will often fail the conversion, but it keeps the legacyVersionbehavior above.)>and<as alwaysFalse, and<=,>=as equivalent to==, for strings and failed Version conversions. This is the legacy (<22) and current (in main) behavior ofpackaging. Python string ordering is never reliable; even if it happens to work going from8to9, it will break on the next release because9is more than10. And this requires that other languages, like Rust, follow Python's rules for string ordering.This should only affect <50 (legacy) packages on GitHub, and it will do the right thing for them as well (making the packaging <22 behavior official). (Most of these have other typos, like 21 instead of 12 for the macOS version, so pretty sure they are dead projects, but it won't break them).
I'd like to do this now, since we are replacing an Error with our old behavior, which was not spec compliant, so packages may start appearing expecting this behavior (again).
📚 Documentation preview 📚: https://python-packaging-user-guide--1971.org.readthedocs.build/en/1971/