cc @JakeQZ does this match with what you were thinking? I'mma probably leave this up for a few days before merging to give other maintainers a chance to check since it feels like a more significant change, and then do a v3.4 release with this + docs removal + 8.6 support + other odds and ends.

(Also yes we could reduce file size even further by deduplicating at the function level rather than the file level, but that'd be quite a bit more work -- not impossible, just fiddly, and I don't personally have time to work on it right now.)
Yes. Though with a large number of files, I'm wondering what the chance of an MD5 collision might be.
I also thought that. Deduplicating at the file level looked fairly straightforward and would achieve most of the gains, whereas I could not see an easy way to deduplicate at the function level, and the additional benefit would likely be relatively small.
E.g. if `8.2/ftp.php` and `8.3/ftp.php` are identical, delete `8.3/ftp.php` and map `8.3 => 8.2`. (This reduces package size by 50%.)
For non-adversarial inputs -- at the current rate of growth (one new PHP version per year, an average of 10 new files per version), it'll take around 25 trillion years for the odds of collision to reach 50/50. On the other hand, if somebody were to intentionally attack this project by sending patches to the PHP documentation team with the goal of creating a hash collision, that'd be within the realm of possibility. Since we're only dealing with a few megabytes of data, I'll remove the hashing and just use the entire file as the array key for guaranteed uniqueness 👀
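For anyone skimming the thread, here is a rough sketch of the idea, written in Python purely for illustration (the project itself is PHP, and the actual implementation in this PR may look quite different). The names `dedupe`, `resolve`, and `version_key` are hypothetical. It assumes the generated files are available as a nested mapping of version to filename to content, and it uses the full file content as part of the lookup key instead of a hash, so two different files can never be confused.

```python
from typing import Dict, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Sort "8.10" after "8.9" by comparing numeric parts."""
    return tuple(int(part) for part in version.split("."))


def dedupe(files: Dict[str, Dict[str, str]]):
    """files maps version -> filename -> content (assumed layout).

    Returns (kept, redirects):
      kept      maps (version, filename) -> content for files that stay
      redirects maps filename -> {version: earlier version with identical content}
    """
    kept: Dict[Tuple[str, str], str] = {}
    redirects: Dict[str, Dict[str, str]] = {}
    # Key on the whole content rather than a digest: no collisions possible.
    first_seen: Dict[Tuple[str, str], str] = {}

    for version in sorted(files, key=version_key):
        for name, content in files[version].items():
            key = (name, content)
            if key in first_seen:
                # Identical to an earlier version: drop it and record the mapping.
                redirects.setdefault(name, {})[version] = first_seen[key]
            else:
                first_seen[key] = version
                kept[(version, name)] = content
    return kept, redirects


def resolve(kept, redirects, version: str, name: str) -> str:
    """Look up a file, following the redirect if it was deduplicated."""
    target = redirects.get(name, {}).get(version, version)
    return kept[(target, name)]


files = {
    "8.2": {"ftp.php": "<?php // ftp", "curl.php": "<?php // curl v1"},
    "8.3": {"ftp.php": "<?php // ftp", "curl.php": "<?php // curl v2"},
}
kept, redirects = dedupe(files)
```

In this toy input, `8.3/ftp.php` is dropped and `redirects` records `8.3 => 8.2` for `ftp.php`, while both versions of `curl.php` are kept because their contents differ. Holding whole files as keys costs memory proportional to the data size, which seems acceptable for a few megabytes.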

This is a good achievement, thanks 🙏