Skip to content

Conversation

@rstam
Copy link
Contributor

@rstam rstam commented May 27, 2025

No description provided.

@rstam rstam requested review from adelinowona and sanych-sun May 27, 2025 18:33
@rstam rstam added bug Fixes issues or unintended behavior. improvement Optimizations or refactoring (no new features or fixes). labels May 27, 2025
@rstam rstam changed the title CSHARP-5572: Implement new KnownSerializerFinder. CSHARP-5572: Implement new KnownSerializerFinder May 27, 2025
@Szmiglo
Copy link

Szmiglo commented Jul 23, 2025

hi @rstam, what's the status of this effort?

@rstam
Copy link
Contributor Author

rstam commented Jul 23, 2025

hi @rstam, what's the status of this effort?

It is progressing well, but it is a large effort. I work on this whenever I'm not working on some smaller higher priority task.

@rstam rstam requested review from a team and removed request for a team August 21, 2025 15:25
@rstam rstam force-pushed the csharp5572 branch 2 times, most recently from 86fbb24 to 56a464c Compare October 13, 2025 15:15
}
}

if (_map.TryGetValue(node, out var existingSerializer))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move the duplicate check to the beginning of the AddSerializer method to implement fail-fast behavior. This would avoid unnecessary serializer transformation work when a serializer already exists for the node.

However, I notice the current implementation places this check at the end, likely to preserve information about the conflicting serializer for the exception. While this debugging information could be helpful, I question just how useful it is. Knowing where a duplicate attempt happened seems more useful knowing what the conflicting serializer would have been.

I lean toward the fail-fast approach since the conflicting serializer details seem like a "nice-to-have" rather than essential debugging information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we move the error check further up then we have to remove information about the redundant serializer from the error message.

Also, I don't think we need to worry about failing fast because we actually never expect to hit this error. This check is here to help us find bugs. In the absence of bugs this exception will never be thrown.

private bool _isMakingProgress = true;
private readonly KnownSerializerMap _knownSerializers;
private int _oldKnownSerializersCount = 0;
private int _pass = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is never used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

var newKnownSerializersCount = _knownSerializers.Count;
if (newKnownSerializersCount == _oldKnownSerializersCount)
{
if (_useDefaultSerializerForConstants)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we maybe make use of a state enum as such:

private enum DiscoveryPhase
{
    InferringFromContext,      // Not using defaults
    UsingDefaultSerializers,   // Last attempt with defaults
    Complete                    // No more progress possible
}

to replace the use of all these booleans used in keeping track of the state of the visitor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somewhat reasonable, but I think the booleans work better here.

There are only 2 booleans.

One issue is that if we ever had more states the order of the enum values could get very confusing and maybe even no order is possible.

@rstam rstam force-pushed the csharp5572 branch 2 times, most recently from 32aae48 to 7f2a64c Compare October 31, 2025 15:58
@sanych-sun sanych-sun marked this pull request as ready for review November 25, 2025 22:22
@sanych-sun sanych-sun requested a review from a team as a code owner November 25, 2025 22:22
Copilot AI review requested due to automatic review settings November 25, 2025 22:22
Copy link
Member

@sanych-sun sanych-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review.


bool IsTupleOrValueTuple(Type type)
{
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For net6 and higher there is a better way to check: both Tuple and ValueTuple implements ITuple interface. I suppose we can have #if def here and use interface approach for modern targets, when have something like the proposed code for net472.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to know!

Is there any danger of false positives if unexpected classes also happen to implement ITuple?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not doing just yet until we have a discussion about false positives.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a new KnownSerializerFinder system to replace the previous approach of dynamically determining serializers during expression translation. The new system pre-analyzes expressions to build a map of known serializers for each expression node, enabling more reliable and consistent serializer resolution throughout the translation process.

Key Changes:

  • Introduced a comprehensive KnownSerializerFinder infrastructure that pre-computes serializers for expression nodes
  • Updated TranslationContext to accept and use the pre-computed serializer map
  • Refactored multiple translators to leverage the new known serializer system
  • Added several new serializer types to support the enhanced type inference

Reviewed changes

Copilot reviewed 98 out of 99 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
KnownSerializerFinder*.cs New visitor-based system for discovering and mapping serializers to expression nodes
TranslationContext.cs Extended to store and provide access to known serializers map
LinqProviderAdapter.cs Updated all translation entry points to use the new KnownSerializerFinder
*Serializer.cs Added new serializer types (Polymorphic, Upcasting, Unknowable, etc.) to support enhanced type inference
Test files Updated test contexts to use new TranslationContext.Create signatures

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 27 to 33
DeduceSerialiers();
base.VisitListInit(node);
DeduceSerialiers();

return node;

void DeduceSerialiers()
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'DeduceSerialiers' to 'DeduceSerializers'.

Suggested change
DeduceSerialiers();
base.VisitListInit(node);
DeduceSerialiers();
return node;
void DeduceSerialiers()
DeduceSerializers();
base.VisitListInit(node);
DeduceSerializers();
return node;
void DeduceSerializers()

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 27 to 33
DeduceConditionaSerializers();
base.VisitConditional(node);
DeduceConditionaSerializers();

return node;

void DeduceConditionaSerializers()
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'DeduceConditionaSerializers' to 'DeduceConditionalSerializers'.

Suggested change
DeduceConditionaSerializers();
base.VisitConditional(node);
DeduceConditionaSerializers();
return node;
void DeduceConditionaSerializers()
DeduceConditionalSerializers();
base.VisitConditional(node);
DeduceConditionalSerializers();
return node;
void DeduceConditionalSerializers()

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return sourceSerializer;
}

// handle conversionsn to BsonValue before any others
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'conversionsn' to 'conversions'.

Suggested change
// handle conversionsn to BsonValue before any others
// handle conversion to BsonValue before any others

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

_knownSerializers = knownSerializers;
}

public Expression ExpressionWithUnknownSerialier => _expressionWithUnknownSerializer;
Copy link

Copilot AI Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'ExpressionWithUnknownSerialier' to 'ExpressionWithUnknownSerializer'.

Suggested change
public Expression ExpressionWithUnknownSerialier => _expressionWithUnknownSerializer;
public Expression ExpressionWithUnknownSerializer => _expressionWithUnknownSerializer;

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}
""";
if (!Feature.FilterLimit.IsSupported(CoreTestConfiguration.MaxWireVersion))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually create two strings depending on whether a feature is supported or not but, in this case, the MQL is so long it's easier to just remove limit : 1 if it is not supported.

Copy link
Member

@sanych-sun sanych-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also review all TODOs (remove, address or create a ticket for future if we can postpone the changes) and failing unit-test.

}

if (IsSymmetricalBinaryOperator(@operator) &&
CanDeduceSerializer(leftExpression, rightExpression, out var unknownNode, out var otherNodeSerializer))
Copy link
Member

@sanych-sun sanych-sun Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I found only 2 usages of CanDeduceSerializer, both looks similar (TODO in the another usage says we have to check value types). I think we can simplify by refactoring to TryDeduceSerializer and do the type check and AddNodeSerializer inside it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the CanDeduceSerializer method entirely becase there was another existing method that could be used.

}
else
{
DeduceUnknowableSerializer(node);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worth to add a comment why we decided stop looking for the serializer for the node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
}

internal class IgnoreNodeSerializer<TValue> : SerializerBase<TValue>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the difference between IgnoreNodeSerializer and UnknowableSerializer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IgnoreSerializer is when we know what the node means but there is no serializer needed for the node. But since we throw an exception if a node has no serializer we assign this dummy serializer to such nodes.

UnknowableSerializer is when we know what the node means in general, but not in this particular case. An example would be a method call expression with an unknown method (we know about method call expressions in general but can encounter methods we don't know anything about).

UnknowableSerializer comes into play with client-side projections. If we didn't assign the UnknowableSerializer dummy serializer to method call nodes we didn't know about we would throw a missing serializer exception which would prevent us from getting to the client-side projection.

protected override Expression VisitListInit(ListInitExpression node)
{
var newExpression = node.NewExpression;
var initializers = node.Initializers;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initializers variable seems to be unused. Should we handle initializers somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure yet. Added a TODO comment for later investigation.

var containingExpression = node.Expression;
if (IsKnown(containingExpression, out containingSerializer))
{
// TODO: handle special cases like DateTime.Year etc.
Copy link
Member

@sanych-sun sanych-sun Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO still actual?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I don't know the full list.

I've edited the comment to not mention DateTime.Year (since that one is already done)


bool IsDictionaryContainsKeyMethod(out Expression keyExpression)
{
if (method.DeclaringType.Name.Contains("Dictionary") &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this method to DictionaryMethod helper class and avoid Contains("Dictionary") by checking for interface or base class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


void DeduceAggregateMethodSerializers()
{
if (method.IsOneOf(__aggregateMethods))
Copy link
Member

@sanych-sun sanych-sun Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot find any usage of results of IsItemSerializerKnown method call results, so it probably could be removed. And if it's removed then outer if is redundant.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this one line.

I don't understand why you think the outer if is redundant.

}
}

void DeduceAsinhMethodSerializers()
Copy link
Member

@sanych-sun sanych-sun Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have number of exactly the same methods in this class to process all math trigonometry methods, should we group then into a single collection __trigonometricMathMethods and process in the single method like:

     void DeduceTrigonometryMethodSerializers()
        {
            if (method.Is(__trigonometricMathMethods))
            {
                DeduceReturnsDoubleSerializer();
            }
            else
            {
                DeduceUnknownMethodSerializer();
            }
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

var isNullable = toType.IsNullable();
var valueType = isNullable ? Nullable.GetUnderlyingType(toType) : toType;

var valueSerializer = (IBsonSerializer)(Type.GetTypeCode(valueType) switch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks very similar to StandardSerializers.GetSerializer. Can we leverage it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO for future investigation.

There are 11 types handled here that StandardSerializers doesn't know about and a few of the ones it does know about are configured differentlly here, so it's tricky.


case ExpressionType.NewArrayInit:
DeduceNewArrayInitSerializers();
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw in default case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I don't think it's necessary.

First of all, there are no other cases here AFAIK.

And if there are, it will result in a missing serializer exception a bit later on.

I don't think we normally throw exceptions when deducing things in the SerializerFinder. We either deduce something or we do nothing.

Copy link
Contributor Author

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments and requested changes.

}

if (IsSymmetricalBinaryOperator(@operator) &&
CanDeduceSerializer(leftExpression, rightExpression, out var unknownNode, out var otherNodeSerializer))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the CanDeduceSerializer method entirely becase there was another existing method that could be used.

{
_ when declaringType == typeof(BsonValue) => GetBsonValuePropertySerializer(),
_ when IsCollectionCountOrLengthProperty() => GetCollectionCountOrLengthPropertySerializer(),
_ when declaringType == typeof(DateTime) => GetDateTimePropertySerializer(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not until we add support for DateTimeOffset in general.

No tests are failing because we didn't do this.

var containingExpression = node.Expression;
if (IsKnown(containingExpression, out containingSerializer))
{
// TODO: handle special cases like DateTime.Year etc.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I don't know the full list.

I've edited the comment to not mention DateTime.Year (since that one is already done)


return memberName switch
{
"Keys" => ICollectionSerializer.Create(keySerializer),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is surprising.

It's because Dictionary<TKey, TValue>.Keys and IDictionary<TKey, TValue>.Keys have different types. Same for Values.

An unexpected glitch in Microsoft's design of these types. (Constrained by having to maintain backward compatibility).


internal partial class SerializerFinderVisitor
{
private static readonly HashSet<MethodInfo> __absMethods =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to MathMethod.cs from where it can be shared.

}
}

void DeduceAsinhMethodSerializers()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

var isNullable = toType.IsNullable();
var valueType = isNullable ? Nullable.GetUnderlyingType(toType) : toType;

var valueSerializer = (IBsonSerializer)(Type.GetTypeCode(valueType) switch
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO for future investigation.

There are 11 types handled here that StandardSerializers doesn't know about and a few of the ones it does know about are configured differentlly here, so it's tricky.


bool IsDictionaryContainsKeyMethod(out Expression keyExpression)
{
if (method.DeclaringType.Name.Contains("Dictionary") &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


case ExpressionType.NewArrayInit:
DeduceNewArrayInitSerializers();
break;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but I don't think it's necessary.

First of all, there are no other cases here AFAIK.

And if there are, it will result in a missing serializer exception a bit later on.

I don't think we normally throw exceptions when deducing things in the SerializerFinder. We either deduce something or we do nothing.

}
}

internal class IgnoreNodeSerializer<TValue> : SerializerBase<TValue>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IgnoreSerializer is when we know what the node means but there is no serializer needed for the node. But since we throw an exception if a node has no serializer we assign this dummy serializer to such nodes.

UnknowableSerializer is when we know what the node means in general, but not in this particular case. An example would be a method call expression with an unknown method (we know about method call expressions in general but can encounter methods we don't know anything about).

UnknowableSerializer comes into play with client-side projections. If we didn't assign the UnknowableSerializer dummy serializer to method call nodes we didn't know about we would throw a missing serializer exception which would prevent us from getting to the client-side projection.

@rstam rstam requested a review from sanych-sun December 13, 2025 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Fixes issues or unintended behavior. improvement Optimizations or refactoring (no new features or fixes).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants