Skip to content

Thoughts on hashing #149

@ChrisJefferson

Description

@ChrisJefferson

I have been thinking about hashing, and wanted to write down some thoughts on the topic. Comments are welcome, but not expected!

There are, broadly speaking two types of hash functions we might want:

  1. A GenericHash which, given any GAP variable, can calculate a hash function for it.

  2. ChooseHash, a way of getting a highly efficient hash function for a particular type of data (this might fall back on Hash when it cannot make a more clever decision).

I think Hash is always required, I (for example) sometimes hash things like rec(points := 6, parts := [1,2]), which I don't think would ever be handled well by an "efficient hash function".

There are other parts of GAP which implement ChooseHash, and we have an example (old) PR. There is an implementation of GenericHash, but it can't currently be extended.

In an ideal world, I think we want the following:

  • An easy way to add new hash functions, which will by default be added to both GenericHash and ChooseHash, but which can be only added to one or the other in special cases.

The main problem arises with GenericHash, and if a new overload is added once a hash object already exists which changes the result for an existing object. This would break these hash containers.

I am unsure if it is reasonable to forbid adding GenericHash overloads which change the hash of an object which can already be hashed -- in particular this might get broken if two packages added overloads which could hash the same object.

Therefore, there are 2(ish) "reasonable" ways to fix this.

  1. We could provide some efficient way of detecting if an operation has changed -- a hash function could store this, and force rehashing the container if a new method is installed.

  2. We could provide a way to clone an operation (I assume this would be done efficiently, so if the operation never changed only one clone would ever be made, and repeatedly handed out) -- then a hash contain could clone GenericHash when constructed, and never change hash functions during it's lifetime. This could create strange effects of two hash containers being able to hash different objects, depending on when they were created.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions