Proposed API: add PyDict_FromItems() #55
Proposed_API/pydict_fromitems.rst
Outdated
INADA-san wrote that most users either overestimate its effectiveness or don't
fully understand how it operates.
Doesn't this apply to some other proposals as well?
How do the other functions behave if input contains duplicate keys?
> Doesn't this apply to some other proposals as well?
I don't know, I don't want to speak for @methane.
> How do the other functions behave if input contains duplicate keys?
So far, all proposed functions allocate N items even if there are duplicate keys.
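To make the duplicate-key point concrete, here is a minimal Python sketch of the semantics under discussion. It is not the proposed C function: `from_items` is a hypothetical helper, and "the last value wins" is an assumption borrowed from how `dict()` treats repeated keys.

```python
def from_items(items):
    """Build a dict from (key, value) pairs.

    Hypothetical model only: assumes later duplicates overwrite earlier ones,
    as the built-in dict() constructor does.
    """
    result = {}
    for key, value in items:
        result[key] = value
    return result


items = [("a", 1), ("b", 2), ("a", 3)]
d = from_items(items)

# Three pairs were passed in, but only two distinct keys are stored, so a
# table presized for len(items) entries would have one unused slot.
print(len(items), len(d))
```

With many duplicates, space reserved for N items is partly wasted, which is the trade-off raised in this thread.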
I wrote some points that many people using the private API don't know:
python/cpython#139772 (comment)
Until Python 3.5, resizing a dict was done by reinserting all the elements into a new hash table. It was slow.
Python 3.6 separated the hash table and the entry array. Since then, hash table reconstruction is fast, and the entry array is copied with memcpy.
People who evaluated the effect of _PyDict_NewPresized() with microbenchmarks before Python 3.6 overestimate its effect on current versions.
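The split layout described above can be illustrated with a toy Python model. This is only a sketch: `CompactDict` and its attributes are hypothetical names, it uses linear probing rather than CPython's real probing scheme, and the growth policy is invented for the example. What it shows is that a resize rebuilds only the small index table from stored hashes, while the entry array is untouched.

```python
class CompactDict:
    """Toy hash table split into an index table and a dense entry array."""

    def __init__(self):
        self.indices = [None] * 8   # sparse: slot -> position in self.entries
        self.entries = []           # dense: (hash, key, value) in insertion order
        self.resizes = 0

    def _probe(self, key, h):
        """Return (slot, position); position is None if the key is absent."""
        mask = len(self.indices) - 1
        slot = h & mask
        while True:
            pos = self.indices[slot]
            if pos is None:
                return slot, None
            entry_hash, entry_key, _ = self.entries[pos]
            if entry_hash == h and entry_key == key:
                return slot, pos
            slot = (slot + 1) & mask  # linear probing (a simplification)

    def _resize(self):
        # Rebuild only the index table, reusing the hashes stored in entries.
        # No key is re-hashed and no entry is moved.
        self.resizes += 1
        self.indices = [None] * (len(self.indices) * 2)
        mask = len(self.indices) - 1
        for pos, (h, _, _) in enumerate(self.entries):
            slot = h & mask
            while self.indices[slot] is not None:
                slot = (slot + 1) & mask
            self.indices[slot] = pos

    def __setitem__(self, key, value):
        h = hash(key)
        slot, pos = self._probe(key, h)
        if pos is not None:
            self.entries[pos] = (h, key, value)
            return
        self.entries.append((h, key, value))
        self.indices[slot] = len(self.entries) - 1
        # Grow once the index table is two-thirds full (policy chosen for the toy).
        if len(self.entries) * 3 >= len(self.indices) * 2:
            self._resize()

    def __getitem__(self, key):
        _, pos = self._probe(key, hash(key))
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]


toy = CompactDict()
for i in range(100):
    toy["k%d" % i] = i
print(toy["k42"], len(toy.entries), toy.resizes)
```

In this model, presizing would only save the repeated index rebuilds, which are cheap compared with reinserting every element into a fresh table.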
I ported @vstinner's benchmark to Python 3.5.
https://github.com/methane/notes/tree/main/2025/dictnew
Python 3.5
| Benchmark | dict_new | dict_presized |
|---|---|---|
| dict-1 | 454 ns | 398 ns: 1.14x faster |
| dict-5 | 2.07 us | 1.73 us: 1.20x faster |
| dict-10 | 3.96 us | 3.28 us: 1.21x faster |
| dict-25 | 9.25 us | 7.56 us: 1.22x faster |
| dict-100 | 31.4 us | 25.6 us: 1.22x faster |
| dict-500 | 113 us | 103 us: 1.11x faster |
| dict-1,000 | 218 us | 200 us: 1.09x faster |
| Geometric mean | (ref) | 1.17x faster |
Python 3.12
| Benchmark | dict_new | dict_presized |
|---|---|---|
| dict-1 | 378 ns | 337 ns: 1.12x faster |
| dict-5 | 1.49 us | 1.34 us: 1.11x faster |
| dict-10 | 2.65 us | 2.32 us: 1.14x faster |
| dict-25 | 5.84 us | 5.12 us: 1.14x faster |
| dict-100 | 22.0 us | 18.0 us: 1.22x faster |
| dict-500 | 98.6 us | 87.6 us: 1.13x faster |
| dict-1,000 | 194 us | 172 us: 1.13x faster |
| Geometric mean | (ref) | 1.14x faster |
- PyDict_New() in Python 3.12 is faster than _PyDict_NewPresized() in 3.5.
- The _PyDict_NewPresized() vs PyDict_New() ratio became a little smaller, but is still significant.
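As a quick sanity check, the two observations can be recomputed from the figures in the tables. The sketch below uses only the rounded values displayed there, so it approximates what the benchmark tool computed from raw timings; the variable names are illustrative.

```python
from statistics import geometric_mean

# Speedups of dict_presized over dict_new, as displayed in the tables
# (rows dict-1 through dict-1,000).
speedups_35 = [1.14, 1.20, 1.21, 1.22, 1.22, 1.11, 1.09]
speedups_312 = [1.12, 1.11, 1.14, 1.14, 1.22, 1.13, 1.13]

# Timings in nanoseconds, same row order.
presized_35_ns = [398, 1730, 3280, 7560, 25600, 103000, 200000]
new_312_ns = [378, 1490, 2650, 5840, 22000, 98600, 194000]

gm_35 = geometric_mean(speedups_35)
gm_312 = geometric_mean(speedups_312)

# Is plain PyDict_New() on 3.12 faster than the presized variant on 3.5
# for every benchmark size?
new_312_always_faster = all(
    new < presized for new, presized in zip(new_312_ns, presized_35_ns)
)

print(round(gm_35, 2), round(gm_312, 2), new_312_always_faster)
```

The rounded geometric means agree with the last row of each table, and the 3.12 PyDict_New() timing is lower than the 3.5 presized timing in every row.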
Proposed_API/pydict_fromitems.rst
Outdated
Such function lacks an *override* argument to decide how to deal with
overridden keys on updating an existing dictionary.
So if we add an override argument, there's no downside?
Well, I prefer PyDict_FromItems() to create a dictionary :-) PyDict_FromItems() has fewer parameters and so is simpler.
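For readers unfamiliar with what an *override* flag means when updating an existing dictionary, here is a small Python sketch. `update_from_items` is a hypothetical helper, not part of any proposal in this PR; its flag follows the meaning of the `override` parameter of the existing `PyDict_Merge()` C function.

```python
def update_from_items(target, items, override):
    """Add (key, value) pairs to an existing dict.

    Hypothetical helper: if override is true, existing keys are replaced;
    otherwise a pair is only added when its key is not already present.
    """
    for key, value in items:
        if override or key not in target:
            target[key] = value
    return target


base = {"a": 1}
replaced = update_from_items(dict(base), [("a", 10), ("b", 2)], override=True)
kept = update_from_items(dict(base), [("a", 10), ("b", 2)], override=False)
print(replaced, kept)
```

A function that always creates a new dictionary has no pre-existing keys to protect, which is one reason it can get by with fewer parameters.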
I plan to merge this change next Friday, unless someone prefers to iterate on this PR. I completed the document with the "Unicode issue" and "False header problem" sections. I also added the PyDict_SetAssumptions() API.