Skip to content

Refactor CollectionModel/builder and introduce read-only property interfaces#13699

Open
roji wants to merge 5 commits intomicrosoft:mainfrom
roji:CollectionModelRefactor
Open

Refactor CollectionModel/builder and introduce read-only property interfaces#13699
roji wants to merge 5 commits intomicrosoft:mainfrom
roji:CollectionModelRefactor

Conversation

@roji
Copy link
Member

@roji roji commented Mar 23, 2026

Changes

  • Extract ConfigureVectorPropertyEmbedding(): Deduplicated embedding resolution logic that was repeated across ProcessTypeProperties and ProcessRecordDefinition.
  • Replace IRecordCreator with Func<object>: Removed IRecordCreator interface and two implementing classes (ActivatorBasedRecordCreator, DynamicRecordCreator), replacing them with simple lambdas.
  • Rename TemporaryStorageNameSerializedKeyName: Moved from PropertyModel base to KeyPropertyModel where it belongs (only used by CosmosNoSql for key property JSON serializer name remapping).
  • Delegate-based property accessors: Replaced virtual GetValueAsObject/SetValueAsObject overrides with delegate fields (_getter/_setter), configured via ConfigurePocoAccessors()/ConfigureDynamicAccessors(). Converted null-coalescing throws to Debug.Assert.
  • Improved xmldoc on VectorPropertyModel.EmbeddingType explaining the [AllowNull] invariant.
  • Introduce read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel): CollectionModel now exposes IReadOnlyList<IVectorPropertyModel> etc., giving providers an immutable view post-build. All provider code updated to consume interface types.

Note that the last is a non-trivial breaking change for providers (not users), which I think is fine (note that the provider-facing APIs are flagged as [Experimental]).

@roji roji force-pushed the CollectionModelRefactor branch from a49d45d to 9f5b23c Compare March 23, 2026 21:57
@roji roji added this to the MEVD Finalization milestone Mar 23, 2026
@roji roji force-pushed the CollectionModelRefactor branch from 9f5b23c to 3ed77b9 Compare March 23, 2026 22:24
@roji roji force-pushed the CollectionModelRefactor branch from 3ed77b9 to abcdb5a Compare March 24, 2026 10:44
roji and others added 2 commits March 24, 2026 11:44
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@roji roji force-pushed the CollectionModelRefactor branch from abcdb5a to 9bfbb38 Compare March 24, 2026 10:44
@roji roji marked this pull request as ready for review March 24, 2026 17:56
@roji roji requested a review from a team as a code owner March 24, 2026 17:56
@roji roji requested review from adamsitnik, Copilot and westey-m March 24, 2026 17:56
@roji roji enabled auto-merge March 24, 2026 17:59
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 86%

✓ Correctness

This is a well-structured refactoring that introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) over the concrete property model classes, with all connectors updated to consume the interfaces. It also replaces the 'TemporaryStorageName' workaround with a proper 'SerializedKeyName' on IKeyPropertyModel, switches from virtual method-based property accessors to delegate-based ones configured during model building, and extracts duplicated embedding configuration logic into ConfigureVectorPropertyEmbedding. The changes are mechanically consistent, type-safe (leveraging IReadOnlyList covariance), and preserve existing behavior. No blocking correctness issues were found.

✗ Security Reliability

This PR introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) over the existing concrete property model classes and refactors accessors from virtual methods with inline null-guarding into delegate-based accessors configured during model building. The main reliability concern is that the new GetValueAsObject/SetValueAsObject methods guard the _getter/_setter delegates with Debug.Assert (stripped in Release builds), meaning a misconfigured property will produce an opaque NullReferenceException in production instead of a descriptive error. Additionally, the Activator.CreateInstance(recordType) call uses null-suppression without an explicit null check, though this is low-risk since model building validates the parameterless constructor exists. The rest of the changes are mechanical type signature updates from concrete to interface types, which look correct and pose no security or reliability risks.

✓ Test Coverage

This PR introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) for property models, refactors property accessors from inline virtual methods to configured delegates (ConfigurePocoAccessors/ConfigureDynamicAccessors), renames TemporaryStorageName to SerializedKeyName on KeyPropertyModel, replaces IRecordCreator with a simple Func, and extracts ConfigureVectorPropertyEmbedding. Existing tests cover the embedding generation refactoring and round-trip mapper behavior well. However, there are no direct unit tests for the new accessor delegate configuration pattern, which is a meaningful behavioral change: if accessors fail to be configured during model building, GetValueAsObject/SetValueAsObject will silently fail via Debug.Assert (no-op in release). The conformance and integration tests provide indirect coverage, but given the centrality of property access to the entire VectorData stack, targeted unit tests for the new accessor setup would reduce regression risk.

✗ Design Approach

The PR introduces IPropertyModel/IKeyPropertyModel/IDataPropertyModel/IVectorPropertyModel interfaces as read-only consumer-facing views of the concrete property model classes, updating all connectors to depend on interfaces rather than concrete types. The direction is sound and correctly separates the builder-facing mutable API from the connector-facing read API. Two design issues stand out: (1) IPropertyModel.ProviderAnnotations is typed as the mutable Dictionary<string, object?>? on an interface described as a 'read-only view', which lets any code holding only an IPropertyModel reference corrupt model state by modifying annotations; it should be IReadOnlyDictionary<string, object?>?. (2) The Debug.Assert message in SqlitePropertyMapping.cs was changed to "property is VectorStoreRecordIKeyPropertyModel" which is not a real type name and is misleading. Everything else — the interface surface, the Func<object> factory replacement, the SerializedKeyName rename, and the ConfigurePocoAccessors/ConfigureDynamicAccessors split — is well-structured.

Flagged Issues

  • In PropertyModel.GetValueAsObject and SetValueAsObject, Debug.Assert is used to check that _getter/_setter are non-null, but Debug.Assert is compiled out in Release builds. If accessors are not configured (e.g. due to a bug in a custom model builder subclass or future refactoring), production code will throw an unhelpful NullReferenceException instead of a descriptive error. Replace with a runtime guard (e.g. throw InvalidOperationException).
  • IPropertyModel.ProviderAnnotations is declared as mutable Dictionary<string, object?>? on an interface described as a 'read-only view'. Any connector holding only an IPropertyModel reference can freely mutate the annotations dictionary, corrupting shared model state. This should be IReadOnlyDictionary<string, object?>? on the interface; the concrete PropertyModel class can keep the mutable Dictionary for builder/extension-method use.

Suggestions

  • Add unit tests for ConfigurePocoAccessors and ConfigureDynamicAccessors verifying that GetValueAsObject and SetValueAsObject work correctly after Build() configures the delegates, including the InvalidCastException path for type mismatches. This is the most significant behavioral refactoring in the PR and currently has no direct test coverage.
  • The test class rename from PropertyModelTests to IPropertyModelTests is a naming oddity since the tests still exercise concrete DataPropertyModel/KeyPropertyModel types. Consider keeping the original name or expanding the tests to also cover accessor methods (GetValueAsObject/SetValueAsObject) with both POCO and dynamic accessors configured.
  • The Debug.Assert message in SqlitePropertyMapping.cs was changed to "property is VectorStoreRecordIKeyPropertyModel" which is not a real type name. Use the actual interface name IKeyPropertyModel or a plain description like "Expected a key property".
  • In CollectionModelBuilder.cs, the extracted ConfigureVectorPropertyEmbedding now uses vectorProperty.Type instead of definitionVectorProperty.Type, which is an improvement for CLR-backed properties. Consider adding a unit test covering the scenario where a VectorStoreCollectionDefinition omits Type for a CLR-backed property to confirm this fix.
  • The factory lambda () => Activator.CreateInstance(recordType)! uses null-suppression. While the parameterless constructor is validated before this point, a defensive null check would be more robust.
  • Consider adding a test in CollectionModelBuilderTests that asserts SerializedKeyName is populated correctly after building a model with a reserved key storage name, since the TemporaryStorageName → SerializedKeyName rename is otherwise only tested indirectly.

Automated review by roji's agents

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the VectorData provider model surface to expose immutable, read only property model interfaces after build, while simplifying model construction and deduplicating embedding resolution logic across builder paths.

Changes:

  • Introduces IPropertyModel plus specialized read only interfaces (IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) and updates connectors and tests to consume them.
  • Refactors PropertyModel accessors to use configured delegates and updates CollectionModel record creation to use a factory delegate instead of IRecordCreator.
  • Extracts ConfigureVectorPropertyEmbedding() to centralize embedding generator and embedding type resolution during model building.

Reviewed changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
dotnet/test/VectorData/VectorData.UnitTests/PropertyModelTests.cs Renames test class to reflect the new interface based surface.
dotnet/test/VectorData/SqliteVec.UnitTests/SqlitePropertyMappingTests.cs Updates tests to use IPropertyModel collections.
dotnet/test/VectorData/Redis.UnitTests/RedisCollectionCreateMappingTests.cs Updates tests to use IPropertyModel[].
dotnet/test/VectorData/PgVector.UnitTests/PostgresPropertyMappingTests.cs Updates tests to use List<IPropertyModel>.
dotnet/src/VectorData/Weaviate/WeaviateQueryBuilder.cs Switches helper signatures to IVectorPropertyModel and IDataPropertyModel.
dotnet/src/VectorData/Weaviate/WeaviateMapper.cs Switches vector population helper to IVectorPropertyModel.
dotnet/src/VectorData/Weaviate/WeaviateCollection.cs Switches search vector generation signature to IVectorPropertyModel.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/VectorPropertyModel.cs Implements IVectorPropertyModel and clarifies EmbeddingType contract.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/PropertyModel.cs Implements IPropertyModel and moves to delegate based accessors.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/KeyPropertyModel.cs Implements IKeyPropertyModel and adds SerializedKeyName.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IVectorPropertyModel.cs Adds read only interface for vector property models.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IRecordCreator.cs Removes the record creator abstraction.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IPropertyModel.cs Adds read only interface for common property model behavior.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IKeyPropertyModel.cs Adds read only interface for key property models.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IDataPropertyModel.cs Adds read only interface for data property models.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/Filter/FilterTranslatorBase.cs Updates binding API to output IPropertyModel.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/DataPropertyModel.cs Implements IDataPropertyModel.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModelBuilder.cs Deduplicates embedding configuration and switches to factory delegate creation.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModel.cs Exposes interface typed property lists and uses a record factory.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionJsonModelBuilder.cs Uses SerializedKeyName for reserved key serializer name remapping.
dotnet/src/VectorData/SqliteVec/SqlitePropertyMapping.cs Updates mapping logic to consume interface based property models.
dotnet/src/VectorData/SqliteVec/SqliteFilterTranslator.cs Updates filter translator signature to IPropertyModel.
dotnet/src/VectorData/SqliteVec/SqliteCommandBuilder.cs Updates command builder to use IVectorPropertyModel dictionary keys and interface lists.
dotnet/src/VectorData/SqliteVec/SqliteCollection.cs Updates embedding generation bookkeeping to interface keyed dictionaries.
dotnet/src/VectorData/SqlServer/SqlServerMapper.cs Updates mapper helper to take IPropertyModel.
dotnet/src/VectorData/SqlServer/SqlServerFilterTranslator.cs Updates translator APIs to take IPropertyModel.
dotnet/src/VectorData/SqlServer/SqlServerCommandBuilder.cs Updates command builder signatures and pattern matching to interface models.
dotnet/src/VectorData/SqlServer/SqlServerCollection.cs Updates embedding generation bookkeeping to interface keyed dictionaries.
dotnet/src/VectorData/Redis/RedisJsonMapper.cs Updates property concatenation to use IPropertyModel.
dotnet/src/VectorData/Redis/RedisCollectionSearchMapping.cs Updates query build APIs to use IVectorPropertyModel.
dotnet/src/VectorData/Redis/RedisCollectionCreateMapping.cs Updates schema mapping to consume IPropertyModel and derived interfaces.
dotnet/src/VectorData/Qdrant/QdrantMapper.cs Updates mapping helpers to use IPropertyModel and IVectorPropertyModel.
dotnet/src/VectorData/Qdrant/QdrantCollectionCreateMapping.cs Updates mapping APIs to consume IVectorPropertyModel.
dotnet/src/VectorData/Qdrant/QdrantCollection.cs Updates search vector conversion signature to IVectorPropertyModel.
dotnet/src/VectorData/Pinecone/PineconeFilterTranslator.cs Updates translation helper to take IPropertyModel.
dotnet/src/VectorData/Pinecone/PineconeCollection.cs Updates distance function mapping to consume IVectorPropertyModel.
dotnet/src/VectorData/PgVector/PostgresSqlBuilder.cs Updates SQL builder to use interface models and interface keyed embeddings map.
dotnet/src/VectorData/PgVector/PostgresPropertyMapping.cs Updates type mapping and index info to consume interface models.
dotnet/src/VectorData/PgVector/PostgresPropertyExtensions.cs Updates extensions to target interface models.
dotnet/src/VectorData/PgVector/PostgresMapper.cs Updates property population helper to take IPropertyModel.
dotnet/src/VectorData/PgVector/PostgresFilterTranslator.cs Updates translator signature to take IPropertyModel.
dotnet/src/VectorData/PgVector/PostgresCollection.cs Updates embedding generation bookkeeping and search conversion to interface models.
dotnet/src/VectorData/MongoDB/MongoFilterTranslator.cs Updates translation helper to take IPropertyModel.
dotnet/src/VectorData/MongoDB/MongoCollectionCreateMapping.cs Updates index mapping APIs to consume interface typed property lists.
dotnet/src/VectorData/MongoDB/MongoCollection.cs Updates search vector conversion signature to IVectorPropertyModel.
dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlMapper.cs Updates key rename behavior to use SerializedKeyName.
dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlFilterTranslator.cs Updates property access generation to take IPropertyModel.
dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlDynamicMapper.cs Updates dynamic mapping switch patterns to interface models.
dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlCollectionQueryBuilder.cs Updates projection building to filter on IVectorPropertyModel.
dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlCollection.cs Updates partition key property storage to IPropertyModel list.
dotnet/src/VectorData/CosmosMongoDB/CosmosMongoFilterTranslator.cs Updates translation helper to take IPropertyModel.
dotnet/src/VectorData/CosmosMongoDB/CosmosMongoCollectionCreateMapping.cs Updates index mapping APIs to consume interface typed property lists.
dotnet/src/VectorData/Common/SqlFilterTranslator.cs Updates base SQL translation column generation and array binding signature to IPropertyModel.
dotnet/src/VectorData/AzureAISearch/AzureAISearchDynamicMapper.cs Updates dynamic mapper switch patterns to interface models.
dotnet/src/VectorData/AzureAISearch/AzureAISearchCollectionCreateMapping.cs Updates field mapping APIs to consume key, data, and vector interfaces.
dotnet/src/VectorData/AzureAISearch/AzureAISearchCollection.cs Updates mapping loops and helpers to use interface models.
dotnet/src/InternalUtilities/connectors/Memory/MongoDB/MongoDynamicMapper.cs Updates internal dynamic mapper to switch on interface models.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roji I can see that all the new interfaces have only a single class that implements them. What is the benefit of introducing these interfaces?

They do come with a cost (more types to JIT and maintain), virtual method calls and casts that are not always obvious. I've been usually going in the opposite direction (example: dotnet/command-line-api#1538) and most of my attempts to introduce public interfaces to BCL were rejected (https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/interface)

{
// The Key column in included in both Vector and Data tables.
Debug.Assert(property is KeyPropertyModel, "property is VectorStoreRecordKeyPropertyModel");
Debug.Assert(property is IKeyPropertyModel, "property is not an IKeyPropertyModel");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is one of the disadvantages of using interfaces: with classes, you could introduce private internal ctor and nobody else would be able to create an instance of a type that inherits from the base class

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I necessarily have anything against a provider having its own subclass of KeyPropertyModel (or even custom implementation of IKeyPropertyModel). In practice, I very much doubt any provider will actually do this, but I'm not sure there's a good reason to prevent it...

// to look that up and replace with the reserved name.
// So we store the policy-transformed name, as StorageName contains the reserved name.
property.TemporaryStorageName = storageName;
((KeyPropertyModel)property).SerializedKeyName = storageName;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you hardcast to concrete type rather than interface here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the general explanation for the interfaces - the interfaces are read-only, whereas the concrete implementations are mutable.

@roji
Copy link
Member Author

roji commented Mar 25, 2026

I can see that all the new interfaces have only a single class that implements them. What is the benefit of introducing these interfaces?

So the point here is that when the model is being built, the property model (the things that represent a property in MEVD and all its config) is mutable: this model may evolve as we're building the model. For example, a provider may want to override some configuration aspect on some property based on some database-specific constraint (e.g. change all property storage names to be lowercase because that's what the database requires).

However, once the model is build, finished and finalized, it becomes immutable, including all of the property models it contains; it doesn't make sense for a provider to e.g. mutate some property after the model has already been built. In other words, we have a phase (model building) where we want property models to be mutable, but past that phase we want them to be immutable. Before this PR, we just left the mutable PropertyModels exposed on the model, so a provider could just mutate them.

So the interfaces here serve one purpose - to expose an immutable view over the property models once model building has completed (FWIW we have a very similar thing in EF). An alternative would have been to have completely different types for the mutable and immutable phases, and to copy from the former to the latter at the end of model building; but that seemed like a slightly heavier option, so I opted for an interface-based "view" approach instead. I think that the performance aspects here are negligible - there's indeed more types, but the number is very low and limited (4 to be exact), and virtual dispatch is probably meaningless in the context of a library that does database operations.

But I'm very open to discussing alternatives... The alternative of having both mutable and immutable concrete type hierarchies (instead of interfaces) is maybe OK, and would remove the need for virtual dispatch (but wouldn't reduce the number of types).

@adamsitnik
Copy link
Member

I can see that all the new interfaces have only a single class that implements them. What is the benefit of introducing these interfaces?

So the point here is that when the model is being built, the property model (the things that represent a property in MEVD and all its config) is mutable: this model may evolve as we're building the model. For example, a provider may want to override some configuration aspect on some property based on some database-specific constraint (e.g. change all property storage names to be lowercase because that's what the database requires).

However, once the model is build, finished and finalized, it becomes immutable, including all of the property models it contains; it doesn't make sense for a provider to e.g. mutate some property after the model has already been built. In other words, we have a phase (model building) where we want property models to be mutable, but past that phase we want them to be immutable. Before this PR, we just left the mutable PropertyModels exposed on the model, so a provider could just mutate them.

So the interfaces here serve one purpose - to expose an immutable view over the property models once model building has completed (FWIW we have a very similar thing in EF). An alternative would have been to have completely different types for the mutable and immutable phases, and to copy from the former to the latter at the end of model building; but that seemed like a slightly heavier option, so I opted for an interface-based "view" approach instead. I think that the performance aspects here are negligible - there's indeed more types, but the number is very low and limited (4 to be exact), and virtual dispatch is probably meaningless in the context of a library that does database operations.

But I'm very open to discussing alternatives... The alternative of having both mutable and immutable concrete type hierarchies (instead of interfaces) is maybe OK, and would remove the need for virtual dispatch (but wouldn't reduce the number of types).

I see the point of exposing data as read only, however these APIs will be used by very few users. Is it a problem for existing users or do we think it's may be in the future? Can the users still cast the interface to concrete implementation and mutate it?

Other alternatives:

  • keep things mutable, without introducing any new types
  • implement Freeze pattern (mutable methods are exposed, but they throw after object gets frozen by the builder)
  • implement builders that return 100% immutable types (you already listed it)

<PackageVersion Include="Pinecone.Client" Version="3.1.0" />
<PackageVersion Include="Prompty.Core" Version="0.2.3-beta" />
<PackageVersion Include="Scriban" Version="6.6.0" />
<PackageVersion Include="Scriban" Version="7.0.3" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change: Scriban 6.6.0 has a security vulnerability logged against it.

/cc @westey-m

@roji roji deployed to integration March 25, 2026 17:38 — with GitHub Actions Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants