@MathiasVP MathiasVP commented Apr 3, 2025

This PR adds a hash-cons library for C# as requested by @ropwareJB.

The goal of a hash-consing library is to provide an efficient way to compare two expressions without creating a quadratically sized relation. For example, to check whether two VarAccesses are "syntactically equal" you might write this relation:

predicate syntacticallySimilar(VarAccess va1, VarAccess va2) {
  va1.getTarget() = va2.getTarget()
}

However, for each variable this relation contains n^2 rows, where n is the number of accesses to that variable in the program. For example, if you have this program:

Use(x); // 1
Use(x); // 2
Use(x); // 3

this relation would include:

  • A row in syntacticallySimilar that relates x on line 1 to x on line 1
  • A row in syntacticallySimilar that relates x on line 1 to x on line 2
  • A row in syntacticallySimilar that relates x on line 1 to x on line 3
  • A row in syntacticallySimilar that relates x on line 2 to x on line 1
  • A row in syntacticallySimilar that relates x on line 2 to x on line 2
  • A row in syntacticallySimilar that relates x on line 2 to x on line 3
  • A row in syntacticallySimilar that relates x on line 3 to x on line 1
  • A row in syntacticallySimilar that relates x on line 3 to x on line 2
  • A row in syntacticallySimilar that relates x on line 3 to x on line 3

i.e., exactly 3^2 = 9 rows!
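The count above can be reproduced with a small sketch. This is plain Python rather than QL, and the `(variable, line)` tuples and the function name are hypothetical stand-ins for VarAccess and the `syntacticallySimilar` predicate, used only to show how the row count grows.

```python
from itertools import product

# Hypothetical stand-in for the three accesses of x: (variable name, line).
accesses = [("x", 1), ("x", 2), ("x", 3)]


def syntactically_similar_rows(accesses):
    """Materialize every ordered pair of accesses that share a target variable."""
    return [(a, b) for a, b in product(accesses, repeat=2) if a[0] == b[0]]


rows = syntactically_similar_rows(accesses)
print(len(rows))  # 9
```

With 10 accesses of the same variable the same function yields 100 rows, which is the quadratic growth described above.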

To avoid this, hash-consing creates a "unique representation" of the structure of an expression. Crucially, each expression is mapped to exactly one hash-cons value, so the mapping has only one row per expression. If two expressions map to the same hash-cons value, they have the same structure. You can then check whether two expressions are syntactically identical by checking whether their hash-cons values are equal, which avoids the quadratic blowup.
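As a rough illustration of the idea (not the implementation in this PR, which is written in QL), here is a minimal Python sketch. The `Expr` class, its fields, and `HashConsTable` are all hypothetical names invented for the example. Each expression is keyed by its kind, its payload, and the hash-cons values of its children, while the source location is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Expr:
    kind: str                           # e.g. "VarAccess" or "Add" (hypothetical)
    payload: str = ""                   # e.g. a variable name or an operator
    children: Tuple["Expr", ...] = ()
    line: int = 0                       # source location; not part of the structure


class HashConsTable:
    """Assigns one integer per distinct expression structure."""

    def __init__(self) -> None:
        self._ids: Dict[tuple, int] = {}

    def hash_cons(self, e: Expr) -> int:
        # The key covers only structure: kind, payload, and the children's values.
        key = (e.kind, e.payload, tuple(self.hash_cons(c) for c in e.children))
        # Reuse the existing value for this structure, or allocate the next one.
        return self._ids.setdefault(key, len(self._ids))


table = HashConsTable()
x1 = Expr("VarAccess", "x", line=1)
x2 = Expr("VarAccess", "x", line=2)
y = Expr("VarAccess", "y", line=3)
sum1 = Expr("Add", "+", (x1, y), line=4)
sum2 = Expr("Add", "+", (x2, y), line=5)

same_access = table.hash_cons(x1) == table.hash_cons(x2)
same_sum = table.hash_cons(sum1) == table.hash_cons(sum2)
different = table.hash_cons(x1) != table.hash_cons(y)
```

Each expression gets exactly one value, so the mapping has one row per expression, and comparing two expressions reduces to comparing two integers.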

I'm sure I have missed a number of expressions 😅 When that happens, the library will not realize that two expressions are syntactically similar, because each of them will be assigned a hash-cons value unique to that expression and thus will never be equal to anything but itself.

To gauge which expressions are most important (and haven't yet been handled by the library) you can run a query such as:

from string s, int k
where
  k =
    strictcount(Expr e | hashCons(e) instanceof TUniqueHashCons and s = e.getPrimaryQlClasses() | e)
select s, k

This will output, for each kind of expression, how many expressions we couldn't map to a "useful" hash-cons value. This is what I've used to "guide" myself on which expressions should be handled in this first PR, but I'm sure there are DBs where the numbers are totally different.

@MathiasVP MathiasVP marked this pull request as draft April 3, 2025 16:39
@MathiasVP MathiasVP force-pushed the hashcons-for-csharp branch from b170992 to 6125973 Compare April 3, 2025 17:13
@MathiasVP MathiasVP marked this pull request as ready for review April 3, 2025 17:14
@MathiasVP MathiasVP commented

CI is unhappy because we merged #185 (which is necessary for the 2.21.0 upgrade, but it won't compile until then)

@MathiasVP MathiasVP merged commit 02c027d into main Apr 9, 2025
3 of 4 checks passed