shish commented Jan 31, 2026

If, e.g., `8.2/ftp.php` and `8.3/ftp.php` are identical, delete `8.3/ftp.php` and map `8.3 => 8.2`

(This reduces package size by 50%)

shish commented Jan 31, 2026

cc @JakeQZ does this match with what you were thinking?

I'll probably leave this up for a few days before merging, to give other maintainers a chance to check since it feels like a more significant change, and then do a v3.4 release with this + docs removal + 8.6 support + other odds and ends

shish commented Jan 31, 2026

(Also yes we could reduce file size even further by deduplicating at the function level rather than the file level, but that'd be quite a bit more work -- not impossible, just fiddly, and I don't personally have time to work on it right now)

JakeQZ commented Jan 31, 2026

> cc @JakeQZ does this match with what you were thinking?

Yes. Though with a large number of files, I'm wondering what the chance of an MD5 collision might be.

> (Also yes we could reduce file size even further by deduplicating at the function level rather than the file level, but that'd be quite a bit more work ...)

I also thought that. Deduplicating at the file level looked fairly straightforward and would achieve most of the gains, whereas I could not see an easy way to deduplicate at the function level, and the additional benefit would likely be relatively small.

shish commented Jan 31, 2026

> with a large number of files, I'm wondering what the chance of an MD5 collision might be.

For non-adversarial inputs -- at the current rate of growth (one new PHP version per year, an average of 10 new files per version), it'll take around 25 trillion years for the odds of collision to reach 50/50
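For anyone who wants to sanity-check the order of magnitude, here is a rough sketch of the standard birthday-bound approximation. It is written in Python for illustration only and is not part of this PR. It assumes MD5 behaves like an ideal 128-bit hash on non-adversarial inputs, and the function names are made up for this example.

```python
import math


def collision_probability(n_items: int, hash_bits: int = 128) -> float:
    """Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniformly distributed (an idealisation of MD5).
    """
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def items_for_probability(p: float, hash_bits: int = 128) -> float:
    """Approximate number of hashed items at which collision odds reach p."""
    return math.sqrt(2.0 * 2.0 ** hash_bits * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    # Odds of any accidental collision among ten thousand files.
    print(collision_probability(10_000))
    # Number of files needed before the odds reach 50/50.
    print(items_for_probability(0.5))
```

Under those assumptions the 50/50 point sits at roughly 10^19 hashed files, so at around ten new files per year an accidental collision is not a practical concern. The adversarial case is a separate question.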

On the other hand, if somebody were to intentionally attack this project by sending patches to the PHP documentation team with the goal of creating a hash collision, that'd be within the realm of possibility. Since we're only dealing with a few megabytes of data, I'll remove the hashing and just use the entire file as the array key for guaranteed uniqueness 👀
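As a rough illustration of the "whole file as the key" idea, here is a minimal sketch in Python rather than the project's PHP. The function names, the in-memory input shape, and the per-file mapping layout are all assumptions made for this example and may not match what the PR actually does.

```python
def version_key(version: str) -> tuple:
    """Sort "8.10" after "8.9" by comparing numeric parts, not strings."""
    return tuple(int(part) for part in version.split("."))


def dedupe_versions(files_by_version: dict) -> tuple:
    """Deduplicate identical files across versions.

    files_by_version: {version: {filename: bytes}} (hypothetical input shape).

    Returns (kept, mapping):
      kept    - {(version, filename): bytes} for files that stay in the package
      mapping - {(version, filename): version} saying which version's copy to load
    """
    first_seen = {}  # (filename, full contents) -> oldest version with that file
    kept = {}
    mapping = {}
    for version in sorted(files_by_version, key=version_key):
        for filename, contents in files_by_version[version].items():
            # The full contents are the dictionary key, so two files only
            # match when they are byte-for-byte identical. No hash is involved.
            owner = first_seen.setdefault((filename, contents), version)
            if owner == version:
                kept[(version, filename)] = contents
            mapping[(version, filename)] = owner
    return kept, mapping


if __name__ == "__main__":
    files = {
        "8.2": {"ftp.php": b"<?php // same", "curl.php": b"<?php // old"},
        "8.3": {"ftp.php": b"<?php // same", "curl.php": b"<?php // new"},
    }
    kept, mapping = dedupe_versions(files)
    print(mapping[("8.3", "ftp.php")])  # the 8.3 copy is dropped in favour of 8.2
```

With only a few megabytes of input, keeping whole files as keys costs little memory and removes the collision question entirely.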

@OskarStark

This is a good achievement, thanks 🙏
