Thoughts on hashing

I have been thinking about hashing, and wanted to write down some thoughts on the topic. Comments are welcome, but not expected!

There are, broadly speaking two types of hash functions we might want:

1) A `GenericHash` which, given any GAP variable, can calculate a hash function for it.

2) `ChooseHash`, a way of getting a highly efficient hash function for a particular type of data (this might fall back on `Hash` when it cannot make a more clever decision).

I think `Hash` is always required, I (for example) sometimes hash things like `rec(points := 6, parts := [1,2])`, which I don't think would ever be handled well by an "efficient hash function".

There are other parts of GAP which implement `ChooseHash`, and we have an example (old) PR. There is an implementation of `GenericHash`, but it can't currently be extended.

In an ideal world, I think we want the following:

* An easy way to add new hash functions, which will by default be added to both `GenericHash` and `ChooseHash`, but which can be only added to one or the other in special cases.

The main problem arises with `GenericHash`, and if a new overload is added once a hash object already exists which changes the result for an existing object. This would break these hash containers.

I am unsure if it is reasonable to forbid adding `GenericHash` overloads which change the hash of an object which can already be hashed -- in particular this might get broken if two packages added overloads which could hash the same object.

Therefore, there are 2(ish) "reasonable" ways to fix this.

1) We could provide some efficient way of detecting if an operation has changed -- a hash function could store this, and force rehashing the container if a new method is installed.

2) We could provide a way to clone an operation (I assume this would be done efficiently, so if the operation never changed only one clone would ever be made, and repeatedly handed out) -- then a hash contain could clone `GenericHash` when constructed, and never change hash functions during it's lifetime. This could create strange effects of two hash containers being able to hash different objects, depending on when they were created.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Thoughts on hashing #149

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Thoughts on hashing #149

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions