@vstinner vstinner commented Dec 4, 2025

No description provided.

Comment on lines 122 to 123
INADA-san wrote that most users either overestimate its effectiveness or don't
fully understand how it operates.
Contributor

Doesn't this apply to some other proposals as well?
How do the other functions behave if input contains duplicate keys?

Contributor Author

> Doesn't this apply to some other proposals as well?

I don't know, I don't want to speak for @methane.

> How do the other functions behave if input contains duplicate keys?

So far, all proposed functions allocate N items even if there are duplicate keys.
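
To illustrate the duplicate-key case in pure Python (a sketch only: none of the proposed C functions is called here, and `pairs` is made-up sample data), the input below has N pairs but the resulting dict stores fewer entries, so space reserved for N items is partly unused:

```python
# Hypothetical sample input: 5 key/value pairs, but only 3 distinct keys.
pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

n_items = len(pairs)   # the count a presizing API would be given
d = dict(pairs)        # later duplicates overwrite earlier values
n_entries = len(d)     # the number of entries the dict actually stores

print(n_items, n_entries, d)
```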

I wrote some points that many people using the private API don't know:
python/cpython#139772 (comment)

Up to and including Python 3.5, resizing a dict reinserted every element into a new hash table, which was slow.
Python 3.6 separated the hash table from the entry array. Since then, rebuilding the hash table is fast, and the entry array is copied with memcpy.
People who measured _PyDict_NewPresized() with microbenchmarks before Python 3.6 tend to overestimate its effect on current versions.
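
One rough way to see how often a growing dict resizes on a current CPython is to watch `sys.getsizeof()` while inserting keys. This is only a sketch, not the benchmark discussed in this thread: it assumes the CPython implementation detail that the reported size covers the hash table and entry array, and the helper name `resize_points` is made up for illustration.

```python
import sys

def resize_points(n):
    """Return the dict lengths at which sys.getsizeof() changed while inserting n keys."""
    d = {}
    last = sys.getsizeof(d)
    points = []
    for i in range(n):
        d[i] = None
        size = sys.getsizeof(d)
        if size != last:
            # The reported size changed, so the storage was reallocated.
            points.append(len(d))
            last = size
    return points

points = resize_points(1000)
print(len(points), "size changes for 1000 insertions:", points)
```

The list should stay short compared with the number of insertions, since the table grows geometrically rather than one slot at a time. Those few reallocations are the work that presizing avoids.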

I ported @vstinner's benchmark to Python 3.5.

https://github.com/methane/notes/tree/main/2025/dictnew

Python 3.5

| Benchmark | dict_new | dict_presized |
|---|---|---|
| dict-1 | 454 ns | 398 ns: 1.14x faster |
| dict-5 | 2.07 us | 1.73 us: 1.20x faster |
| dict-10 | 3.96 us | 3.28 us: 1.21x faster |
| dict-25 | 9.25 us | 7.56 us: 1.22x faster |
| dict-100 | 31.4 us | 25.6 us: 1.22x faster |
| dict-500 | 113 us | 103 us: 1.11x faster |
| dict-1,000 | 218 us | 200 us: 1.09x faster |
| Geometric mean | (ref) | 1.17x faster |

Python 3.12

| Benchmark | dict_new | dict_presized |
|---|---|---|
| dict-1 | 378 ns | 337 ns: 1.12x faster |
| dict-5 | 1.49 us | 1.34 us: 1.11x faster |
| dict-10 | 2.65 us | 2.32 us: 1.14x faster |
| dict-25 | 5.84 us | 5.12 us: 1.14x faster |
| dict-100 | 22.0 us | 18.0 us: 1.22x faster |
| dict-500 | 98.6 us | 87.6 us: 1.13x faster |
| dict-1,000 | 194 us | 172 us: 1.13x faster |
| Geometric mean | (ref) | 1.14x faster |

  • PyDict_New() in Python 3.12 is faster than _PyDict_NewPresized() in Python 3.5.
  • The _PyDict_NewPresized() vs PyDict_New() ratio became slightly smaller, but it is still significant.
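
As a cross-check of the two observations above, the geometric means can be recomputed from the timings in the tables. This is a small sketch: the timings are copied from the tables and converted to nanoseconds, and the function and variable names are chosen here for illustration.

```python
import math

# (dict_new, dict_presized) timings in nanoseconds, copied from the tables above.
py35 = [(454, 398), (2070, 1730), (3960, 3280), (9250, 7560),
        (31400, 25600), (113000, 103000), (218000, 200000)]
py312 = [(378, 337), (1490, 1340), (2650, 2320), (5840, 5120),
         (22000, 18000), (98600, 87600), (194000, 172000)]

def geometric_mean_speedup(rows):
    """Geometric mean of dict_new / dict_presized over all benchmark rows."""
    logs = [math.log(new / presized) for new, presized in rows]
    return math.exp(sum(logs) / len(logs))

gm35 = geometric_mean_speedup(py35)
gm312 = geometric_mean_speedup(py312)
print(f"3.5: {gm35:.2f}x, 3.12: {gm312:.2f}x")

# First bullet: PyDict_New() on 3.12 vs _PyDict_NewPresized() on 3.5, row by row.
new312_beats_presized35 = all(
    n312 < p35 for (n312, _), (_, p35) in zip(py312, py35)
)
print(new312_beats_presized35)
```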

Comment on lines 165 to 166
Such function lacks an *override* argument to decide how to deal with
overridden keys on updating an existing dictionary.
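
For readers unfamiliar with what an *override* argument controls, here is a minimal Python sketch. The helper `update_from_items` is hypothetical and is not one of the proposed functions. It only mimics the meaning of the `override` flag of the existing `PyDict_Merge()` and `PyDict_MergeFromSeq2()` C functions when updating an existing dictionary.

```python
def update_from_items(d, items, override):
    """Hypothetical helper: insert (key, value) pairs into d.

    override=True:  a new value replaces an existing one (like dict.update()).
    override=False: an existing key keeps its current value.
    """
    for key, value in items:
        if override or key not in d:
            d[key] = value
    return d

base = {"a": 1}
replaced = update_from_items(dict(base), [("a", 2), ("b", 3)], override=True)
kept = update_from_items(dict(base), [("a", 2), ("b", 3)], override=False)
print(replaced, kept)
```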
Contributor

@encukou encukou Dec 5, 2025

So if we add an override argument, there's no downside?

Contributor Author

@vstinner vstinner Dec 5, 2025

Well, I prefer PyDict_FromItems() to create a dictionary :-) PyDict_FromItems() has fewer parameters and so is simpler.

@vstinner
Contributor Author

I plan to merge this change next Friday, unless someone prefers to iterate on this PR.

I completed the document to "Unicode issue" and "False header problem" sections. I also added the PyDict_SetAssumptions() API.
