-
Notifications
You must be signed in to change notification settings - Fork 8
Description
I have been thinking about hashing, and wanted to write down some thoughts on the topic. Comments are welcome, but not expected!
There are, broadly speaking two types of hash functions we might want:
-
A
GenericHashwhich, given any GAP variable, can calculate a hash function for it. -
ChooseHash, a way of getting a highly efficient hash function for a particular type of data (this might fall back onHashwhen it cannot make a more clever decision).
I think Hash is always required, I (for example) sometimes hash things like rec(points := 6, parts := [1,2]), which I don't think would ever be handled well by an "efficient hash function".
There are other parts of GAP which implement ChooseHash, and we have an example (old) PR. There is an implementation of GenericHash, but it can't currently be extended.
In an ideal world, I think we want the following:
- An easy way to add new hash functions, which will by default be added to both
GenericHashandChooseHash, but which can be only added to one or the other in special cases.
The main problem arises with GenericHash, and if a new overload is added once a hash object already exists which changes the result for an existing object. This would break these hash containers.
I am unsure if it is reasonable to forbid adding GenericHash overloads which change the hash of an object which can already be hashed -- in particular this might get broken if two packages added overloads which could hash the same object.
Therefore, there are 2(ish) "reasonable" ways to fix this.
-
We could provide some efficient way of detecting if an operation has changed -- a hash function could store this, and force rehashing the container if a new method is installed.
-
We could provide a way to clone an operation (I assume this would be done efficiently, so if the operation never changed only one clone would ever be made, and repeatedly handed out) -- then a hash contain could clone
GenericHashwhen constructed, and never change hash functions during it's lifetime. This could create strange effects of two hash containers being able to hash different objects, depending on when they were created.