Refactor CollectionModel/builder and introduce read-only property interfaces#13699
Refactor CollectionModel/builder and introduce read-only property interfaces#13699roji wants to merge 5 commits intomicrosoft:mainfrom
Conversation
a49d45d to
9f5b23c
Compare
9f5b23c to
3ed77b9
Compare
3ed77b9 to
abcdb5a
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
abcdb5a to
9bfbb38
Compare
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 86%
✓ Correctness
This is a well-structured refactoring that introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) over the concrete property model classes, with all connectors updated to consume the interfaces. It also replaces the 'TemporaryStorageName' workaround with a proper 'SerializedKeyName' on IKeyPropertyModel, switches from virtual method-based property accessors to delegate-based ones configured during model building, and extracts duplicated embedding configuration logic into ConfigureVectorPropertyEmbedding. The changes are mechanically consistent, type-safe (leveraging IReadOnlyList covariance), and preserve existing behavior. No blocking correctness issues were found.
✗ Security Reliability
This PR introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) over the existing concrete property model classes and refactors accessors from virtual methods with inline null-guarding into delegate-based accessors configured during model building. The main reliability concern is that the new GetValueAsObject/SetValueAsObject methods guard the _getter/_setter delegates with Debug.Assert (stripped in Release builds), meaning a misconfigured property will produce an opaque NullReferenceException in production instead of a descriptive error. Additionally, the Activator.CreateInstance(recordType) call uses null-suppression without an explicit null check, though this is low-risk since model building validates the parameterless constructor exists. The rest of the changes are mechanical type signature updates from concrete to interface types, which look correct and pose no security or reliability risks.
✓ Test Coverage
This PR introduces read-only interfaces (IPropertyModel, IKeyPropertyModel, IDataPropertyModel, IVectorPropertyModel) for property models, refactors property accessors from inline virtual methods to configured delegates (ConfigurePocoAccessors/ConfigureDynamicAccessors), renames TemporaryStorageName to SerializedKeyName on KeyPropertyModel, replaces IRecordCreator with a simple Func, and extracts ConfigureVectorPropertyEmbedding. Existing tests cover the embedding generation refactoring and round-trip mapper behavior well. However, there are no direct unit tests for the new accessor delegate configuration pattern, which is a meaningful behavioral change: if accessors fail to be configured during model building, GetValueAsObject/SetValueAsObject will silently fail via Debug.Assert (no-op in release). The conformance and integration tests provide indirect coverage, but given the centrality of property access to the entire VectorData stack, targeted unit tests for the new accessor setup would reduce regression risk.
✗ Design Approach
The PR introduces
IPropertyModel/IKeyPropertyModel/IDataPropertyModel/IVectorPropertyModelinterfaces as read-only consumer-facing views of the concrete property model classes, updating all connectors to depend on interfaces rather than concrete types. The direction is sound and correctly separates the builder-facing mutable API from the connector-facing read API. Two design issues stand out: (1)IPropertyModel.ProviderAnnotationsis typed as the mutableDictionary<string, object?>?on an interface described as a 'read-only view', which lets any code holding only anIPropertyModelreference corrupt model state by modifying annotations; it should beIReadOnlyDictionary<string, object?>?. (2) TheDebug.Assertmessage inSqlitePropertyMapping.cswas changed to"property is VectorStoreRecordIKeyPropertyModel"which is not a real type name and is misleading. Everything else — the interface surface, theFunc<object>factory replacement, theSerializedKeyNamerename, and theConfigurePocoAccessors/ConfigureDynamicAccessorssplit — is well-structured.Flagged Issues
- In PropertyModel.GetValueAsObject and SetValueAsObject, Debug.Assert is used to check that _getter/_setter are non-null, but Debug.Assert is compiled out in Release builds. If accessors are not configured (e.g. due to a bug in a custom model builder subclass or future refactoring), production code will throw an unhelpful NullReferenceException instead of a descriptive error. Replace with a runtime guard (e.g. throw InvalidOperationException).
- IPropertyModel.ProviderAnnotations is declared as mutable Dictionary<string, object?>? on an interface described as a 'read-only view'. Any connector holding only an IPropertyModel reference can freely mutate the annotations dictionary, corrupting shared model state. This should be IReadOnlyDictionary<string, object?>? on the interface; the concrete PropertyModel class can keep the mutable Dictionary for builder/extension-method use.
Suggestions
- Add unit tests for ConfigurePocoAccessors and ConfigureDynamicAccessors verifying that GetValueAsObject and SetValueAsObject work correctly after Build() configures the delegates, including the InvalidCastException path for type mismatches. This is the most significant behavioral refactoring in the PR and currently has no direct test coverage.
- The test class rename from PropertyModelTests to IPropertyModelTests is a naming oddity since the tests still exercise concrete DataPropertyModel/KeyPropertyModel types. Consider keeping the original name or expanding the tests to also cover accessor methods (GetValueAsObject/SetValueAsObject) with both POCO and dynamic accessors configured.
- The Debug.Assert message in SqlitePropertyMapping.cs was changed to "property is VectorStoreRecordIKeyPropertyModel" which is not a real type name. Use the actual interface name IKeyPropertyModel or a plain description like "Expected a key property".
- In CollectionModelBuilder.cs, the extracted ConfigureVectorPropertyEmbedding now uses vectorProperty.Type instead of definitionVectorProperty.Type, which is an improvement for CLR-backed properties. Consider adding a unit test covering the scenario where a VectorStoreCollectionDefinition omits Type for a CLR-backed property to confirm this fix.
- The factory lambda () => Activator.CreateInstance(recordType)! uses null-suppression. While the parameterless constructor is validated before this point, a defensive null check would be more robust.
- Consider adding a test in CollectionModelBuilderTests that asserts SerializedKeyName is populated correctly after building a model with a reserved key storage name, since the TemporaryStorageName → SerializedKeyName rename is otherwise only tested indirectly.
Automated review by roji's agents
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/PropertyModel.cs
Show resolved
Hide resolved
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IPropertyModel.cs
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR refactors the VectorData provider model surface to expose immutable, read only property model interfaces after build, while simplifying model construction and deduplicating embedding resolution logic across builder paths.
Changes:
- Introduces
IPropertyModelplus specialized read only interfaces (IKeyPropertyModel,IDataPropertyModel,IVectorPropertyModel) and updates connectors and tests to consume them. - Refactors
PropertyModelaccessors to use configured delegates and updatesCollectionModelrecord creation to use a factory delegate instead ofIRecordCreator. - Extracts
ConfigureVectorPropertyEmbedding()to centralize embedding generator and embedding type resolution during model building.
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| dotnet/test/VectorData/VectorData.UnitTests/PropertyModelTests.cs | Renames test class to reflect the new interface based surface. |
| dotnet/test/VectorData/SqliteVec.UnitTests/SqlitePropertyMappingTests.cs | Updates tests to use IPropertyModel collections. |
| dotnet/test/VectorData/Redis.UnitTests/RedisCollectionCreateMappingTests.cs | Updates tests to use IPropertyModel[]. |
| dotnet/test/VectorData/PgVector.UnitTests/PostgresPropertyMappingTests.cs | Updates tests to use List<IPropertyModel>. |
| dotnet/src/VectorData/Weaviate/WeaviateQueryBuilder.cs | Switches helper signatures to IVectorPropertyModel and IDataPropertyModel. |
| dotnet/src/VectorData/Weaviate/WeaviateMapper.cs | Switches vector population helper to IVectorPropertyModel. |
| dotnet/src/VectorData/Weaviate/WeaviateCollection.cs | Switches search vector generation signature to IVectorPropertyModel. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/VectorPropertyModel.cs | Implements IVectorPropertyModel and clarifies EmbeddingType contract. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/PropertyModel.cs | Implements IPropertyModel and moves to delegate based accessors. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/KeyPropertyModel.cs | Implements IKeyPropertyModel and adds SerializedKeyName. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IVectorPropertyModel.cs | Adds read only interface for vector property models. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IRecordCreator.cs | Removes the record creator abstraction. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IPropertyModel.cs | Adds read only interface for common property model behavior. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IKeyPropertyModel.cs | Adds read only interface for key property models. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/IDataPropertyModel.cs | Adds read only interface for data property models. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/Filter/FilterTranslatorBase.cs | Updates binding API to output IPropertyModel. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/DataPropertyModel.cs | Implements IDataPropertyModel. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModelBuilder.cs | Deduplicates embedding configuration and switches to factory delegate creation. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModel.cs | Exposes interface typed property lists and uses a record factory. |
| dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionJsonModelBuilder.cs | Uses SerializedKeyName for reserved key serializer name remapping. |
| dotnet/src/VectorData/SqliteVec/SqlitePropertyMapping.cs | Updates mapping logic to consume interface based property models. |
| dotnet/src/VectorData/SqliteVec/SqliteFilterTranslator.cs | Updates filter translator signature to IPropertyModel. |
| dotnet/src/VectorData/SqliteVec/SqliteCommandBuilder.cs | Updates command builder to use IVectorPropertyModel dictionary keys and interface lists. |
| dotnet/src/VectorData/SqliteVec/SqliteCollection.cs | Updates embedding generation bookkeeping to interface keyed dictionaries. |
| dotnet/src/VectorData/SqlServer/SqlServerMapper.cs | Updates mapper helper to take IPropertyModel. |
| dotnet/src/VectorData/SqlServer/SqlServerFilterTranslator.cs | Updates translator APIs to take IPropertyModel. |
| dotnet/src/VectorData/SqlServer/SqlServerCommandBuilder.cs | Updates command builder signatures and pattern matching to interface models. |
| dotnet/src/VectorData/SqlServer/SqlServerCollection.cs | Updates embedding generation bookkeeping to interface keyed dictionaries. |
| dotnet/src/VectorData/Redis/RedisJsonMapper.cs | Updates property concatenation to use IPropertyModel. |
| dotnet/src/VectorData/Redis/RedisCollectionSearchMapping.cs | Updates query build APIs to use IVectorPropertyModel. |
| dotnet/src/VectorData/Redis/RedisCollectionCreateMapping.cs | Updates schema mapping to consume IPropertyModel and derived interfaces. |
| dotnet/src/VectorData/Qdrant/QdrantMapper.cs | Updates mapping helpers to use IPropertyModel and IVectorPropertyModel. |
| dotnet/src/VectorData/Qdrant/QdrantCollectionCreateMapping.cs | Updates mapping APIs to consume IVectorPropertyModel. |
| dotnet/src/VectorData/Qdrant/QdrantCollection.cs | Updates search vector conversion signature to IVectorPropertyModel. |
| dotnet/src/VectorData/Pinecone/PineconeFilterTranslator.cs | Updates translation helper to take IPropertyModel. |
| dotnet/src/VectorData/Pinecone/PineconeCollection.cs | Updates distance function mapping to consume IVectorPropertyModel. |
| dotnet/src/VectorData/PgVector/PostgresSqlBuilder.cs | Updates SQL builder to use interface models and interface keyed embeddings map. |
| dotnet/src/VectorData/PgVector/PostgresPropertyMapping.cs | Updates type mapping and index info to consume interface models. |
| dotnet/src/VectorData/PgVector/PostgresPropertyExtensions.cs | Updates extensions to target interface models. |
| dotnet/src/VectorData/PgVector/PostgresMapper.cs | Updates property population helper to take IPropertyModel. |
| dotnet/src/VectorData/PgVector/PostgresFilterTranslator.cs | Updates translator signature to take IPropertyModel. |
| dotnet/src/VectorData/PgVector/PostgresCollection.cs | Updates embedding generation bookkeeping and search conversion to interface models. |
| dotnet/src/VectorData/MongoDB/MongoFilterTranslator.cs | Updates translation helper to take IPropertyModel. |
| dotnet/src/VectorData/MongoDB/MongoCollectionCreateMapping.cs | Updates index mapping APIs to consume interface typed property lists. |
| dotnet/src/VectorData/MongoDB/MongoCollection.cs | Updates search vector conversion signature to IVectorPropertyModel. |
| dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlMapper.cs | Updates key rename behavior to use SerializedKeyName. |
| dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlFilterTranslator.cs | Updates property access generation to take IPropertyModel. |
| dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlDynamicMapper.cs | Updates dynamic mapping switch patterns to interface models. |
| dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlCollectionQueryBuilder.cs | Updates projection building to filter on IVectorPropertyModel. |
| dotnet/src/VectorData/CosmosNoSql/CosmosNoSqlCollection.cs | Updates partition key property storage to IPropertyModel list. |
| dotnet/src/VectorData/CosmosMongoDB/CosmosMongoFilterTranslator.cs | Updates translation helper to take IPropertyModel. |
| dotnet/src/VectorData/CosmosMongoDB/CosmosMongoCollectionCreateMapping.cs | Updates index mapping APIs to consume interface typed property lists. |
| dotnet/src/VectorData/Common/SqlFilterTranslator.cs | Updates base SQL translation column generation and array binding signature to IPropertyModel. |
| dotnet/src/VectorData/AzureAISearch/AzureAISearchDynamicMapper.cs | Updates dynamic mapper switch patterns to interface models. |
| dotnet/src/VectorData/AzureAISearch/AzureAISearchCollectionCreateMapping.cs | Updates field mapping APIs to consume key, data, and vector interfaces. |
| dotnet/src/VectorData/AzureAISearch/AzureAISearchCollection.cs | Updates mapping loops and helpers to use interface models. |
| dotnet/src/InternalUtilities/connectors/Memory/MongoDB/MongoDynamicMapper.cs | Updates internal dynamic mapper to switch on interface models. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/PropertyModel.cs
Show resolved
Hide resolved
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/PropertyModel.cs
Show resolved
Hide resolved
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModel.cs
Show resolved
Hide resolved
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/KeyPropertyModel.cs
Outdated
Show resolved
Hide resolved
dotnet/test/VectorData/VectorData.UnitTests/PropertyModelTests.cs
Outdated
Show resolved
Hide resolved
adamsitnik
left a comment
There was a problem hiding this comment.
@roji I can see that all the new interfaces have only a single class that implements them. What is the benefit of introducing these interfaces?
They do come with a cost (more types to JIT and maintain), virtual method calls and casts that are not always obvious. I've been usually going in the opposite direction (example: dotnet/command-line-api#1538) and most of my attempts to introduce public interfaces to BCL were rejected (https://learn.microsoft.com/en-us/dotnet/standard/design-guidelines/interface)
| { | ||
| // The Key column in included in both Vector and Data tables. | ||
| Debug.Assert(property is KeyPropertyModel, "property is VectorStoreRecordKeyPropertyModel"); | ||
| Debug.Assert(property is IKeyPropertyModel, "property is not an IKeyPropertyModel"); |
There was a problem hiding this comment.
this is one of the disadvantages of using interfaces: with classes, you could introduce private internal ctor and nobody else would be able to create an instance of a type that inherits from the base class
There was a problem hiding this comment.
I don't think I necessarily have anything against a provider having its own subclass of KeyPropertyModel (or even custom implementation of IKeyPropertyModel). In practice, I very much doubt any provider will actually do this, but I'm not sure there's a good reason to prevent it...
| // to look that up and replace with the reserved name. | ||
| // So we store the policy-transformed name, as StorageName contains the reserved name. | ||
| property.TemporaryStorageName = storageName; | ||
| ((KeyPropertyModel)property).SerializedKeyName = storageName; |
There was a problem hiding this comment.
Why do you hardcast to concrete type rather than interface here?
There was a problem hiding this comment.
See the general explanation for the interfaces - the interfaces are read-only, whereas the concrete implementations are mutable.
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModel.cs
Show resolved
Hide resolved
dotnet/src/VectorData/VectorData.Abstractions/ProviderServices/CollectionModelBuilder.cs
Show resolved
Hide resolved
So the point here is that when the model is being built, the property model (the things that represent a property in MEVD and all its config) is mutable: this model may evolve as we're building the model. For example, a provider may want to override some configuration aspect on some property based on some database-specific constraint (e.g. change all property storage names to be lowercase because that's what the database requires). However, once the model is build, finished and finalized, it becomes immutable, including all of the property models it contains; it doesn't make sense for a provider to e.g. mutate some property after the model has already been built. In other words, we have a phase (model building) where we want property models to be mutable, but past that phase we want them to be immutable. Before this PR, we just left the mutable PropertyModels exposed on the model, so a provider could just mutate them. So the interfaces here serve one purpose - to expose an immutable view over the property models once model building has completed (FWIW we have a very similar thing in EF). An alternative would have been to have completely different types for the mutable and immutable phases, and to copy from the former to the latter at the end of model building; but that seemed like a slightly heavier option, so I opted for an interface-based "view" approach instead. I think that the performance aspects here are negligible - there's indeed more types, but the number is very low and limited (4 to be exact), and virtual dispatch is probably meaningless in the context of a library that does database operations. But I'm very open to discussing alternatives... The alternative of having both mutable and immutable concrete type hierarchies (instead of interfaces) is maybe OK, and would remove the need for virtual dispatch (but wouldn't reduce the number of types). |
I see the point of exposing data as read only, however these APIs will be used by very few users. Is it a problem for existing users or do we think it's may be in the future? Can the users still cast the interface to concrete implementation and mutate it? Other alternatives:
|
| <PackageVersion Include="Pinecone.Client" Version="3.1.0" /> | ||
| <PackageVersion Include="Prompty.Core" Version="0.2.3-beta" /> | ||
| <PackageVersion Include="Scriban" Version="6.6.0" /> | ||
| <PackageVersion Include="Scriban" Version="7.0.3" /> |
There was a problem hiding this comment.
Unrelated change: Scriban 6.6.0 has a security vulnerability logged against it.
/cc @westey-m
Changes
ConfigureVectorPropertyEmbedding(): Deduplicated embedding resolution logic that was repeated acrossProcessTypePropertiesandProcessRecordDefinition.IRecordCreatorwithFunc<object>: RemovedIRecordCreatorinterface and two implementing classes (ActivatorBasedRecordCreator,DynamicRecordCreator), replacing them with simple lambdas.TemporaryStorageName→SerializedKeyName: Moved fromPropertyModelbase toKeyPropertyModelwhere it belongs (only used by CosmosNoSql for key property JSON serializer name remapping).GetValueAsObject/SetValueAsObjectoverrides with delegate fields (_getter/_setter), configured viaConfigurePocoAccessors()/ConfigureDynamicAccessors(). Converted null-coalescing throws toDebug.Assert.VectorPropertyModel.EmbeddingTypeexplaining the[AllowNull]invariant.IPropertyModel,IKeyPropertyModel,IDataPropertyModel,IVectorPropertyModel):CollectionModelnow exposesIReadOnlyList<IVectorPropertyModel>etc., giving providers an immutable view post-build. All provider code updated to consume interface types.Note that the last is a non-trivial breaking change for providers (not users), which I think is fine (note that the provider-facing APIs are flagged as [Experimental]).