C#: Add a hash-cons library for C# #183
Merged
This PR adds a hash-cons library for C# as requested by @ropwareJB.
The goal of a hash-consing library is to provide an efficient way to compare two expressions without creating a quadratically sized relation. For example, to check whether two `VarAccess`es are "syntactically equal", you might create a relation `syntacticallySimilar` that relates every access of a variable to every access of the same variable.

However, this relation would be of size `n^2`, where `n` is the number of accesses of a variable in the program. For example, if a program accesses `x` on lines 1, 2, and 3, this relation would include:

- `syntacticallySimilar` that relates `x` on line 1 to `x` on line 1
- `syntacticallySimilar` that relates `x` on line 1 to `x` on line 2
- `syntacticallySimilar` that relates `x` on line 1 to `x` on line 3
- `syntacticallySimilar` that relates `x` on line 2 to `x` on line 1
- `syntacticallySimilar` that relates `x` on line 2 to `x` on line 2
- `syntacticallySimilar` that relates `x` on line 2 to `x` on line 3
- `syntacticallySimilar` that relates `x` on line 3 to `x` on line 1
- `syntacticallySimilar` that relates `x` on line 3 to `x` on line 2
- `syntacticallySimilar` that relates `x` on line 3 to `x` on line 3

i.e., exactly `3^2 = 9` rows!

To avoid this, hash-consing creates a "unique representation" of the structure of an expression. Crucially, an expression is mapped to exactly one hash-cons value, and if two expressions map to the same hash-cons value then they have the same structure. You can then check whether two expressions are syntactically identical by checking whether their hash-cons values are equal. This needs only one row per expression (`n` rows rather than `n^2`), which avoids the quadratic blowup.
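As a language-neutral illustration of the idea (not the QL implementation in this PR), here is a minimal Python sketch. It assumes expressions are modeled as nested tuples such as `("var", "x")`; `HashConsTable` and `hash_cons` are hypothetical names invented for this example.

```python
class HashConsTable:
    """Interns structural keys so that equal structures share one integer id."""

    def __init__(self):
        self._ids = {}

    def intern(self, key):
        # A key seen before returns its existing id; a new key gets the next free id.
        return self._ids.setdefault(key, len(self._ids))


def hash_cons(expr, table):
    """Map a nested-tuple expression, e.g. ("add", ("var", "x"), ("lit", 1)), to an id."""
    kind, *children = expr
    # Sub-expressions are replaced by their own ids; leaves are kept as values.
    # The "id"/"leaf" tags keep a child id from being confused with an integer leaf.
    child_keys = tuple(
        ("id", hash_cons(c, table)) if isinstance(c, tuple) else ("leaf", c)
        for c in children
    )
    return table.intern((kind, child_keys))


table = HashConsTable()

# Hypothetical program: `x` is accessed on lines 1, 2 and 3, and `y` on line 4.
accesses = {1: ("var", "x"), 2: ("var", "x"), 3: ("var", "x"), 4: ("var", "y")}

# One row per access (n rows), instead of one row per pair of accesses (n^2 rows).
ids = {line: hash_cons(expr, table) for line, expr in accesses.items()}

same_1_and_3 = ids[1] == ids[3]  # both are accesses of `x`
same_1_and_4 = ids[1] == ids[4]  # `x` versus `y`
```

Comparing two accesses is then a single equality check on their ids, and the same function handles nested expressions because each sub-expression is interned before its parent.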
I'm sure I have missed a number of expressions 😅 For those, the library will not realize that two expressions are syntactically similar (because each will be assigned a hash-cons value unique to that expression, and thus it will never be equal to anything but that expression).
To gauge which expressions are most important (and haven't yet been handled by the library) you can run a query such as:
This will output, for each kind of expression, how many expressions we couldn't map to a "useful" hash-cons value. This is what I've used to "guide" myself on which expressions should be handled in this first PR, but I'm sure there are DBs where the numbers are totally different.
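The fallback behaviour and the per-kind count can be sketched in Python as follows. This is only an illustration under assumptions, not the query from this PR: `HANDLED_KINDS`, `hash_cons_or_unique`, and the sample expressions are all hypothetical.

```python
from collections import Counter
from itertools import count

# Hypothetical set of expression kinds that get a structural hash-cons value.
HANDLED_KINDS = {"var", "lit", "add"}

_structural_ids = {}
_fresh_ids = count()


def hash_cons_or_unique(kind, payload):
    """Return a structural value for handled kinds, else a value unique to this expression."""
    if kind in HANDLED_KINDS:
        key = (kind, payload)
        return ("structural", _structural_ids.setdefault(key, len(_structural_ids)))
    # Unhandled kind: every expression gets a fresh value, so it equals nothing else.
    return ("unique", next(_fresh_ids))


# Hypothetical expressions, as (kind, payload) pairs.
exprs = [("var", "x"), ("var", "x"), ("lambda", "f"), ("lambda", "f"), ("cast", "int")]
results = [(kind, hash_cons_or_unique(kind, payload)) for kind, payload in exprs]

# Count, per kind, the expressions that only received a unique fallback value.
unhandled_per_kind = Counter(kind for kind, (tag, _) in results if tag == "unique")
```

The kinds with the highest counts in `unhandled_per_kind` are the ones where adding structural support would help most, though as noted the ranking depends on the database.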