Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 16, 2025

  • Create MediaTypeMap polyfill in LegacySupport directory with System.Net.Mime namespace
  • Add static LoadFromAsync methods to DataContent class:
    • Stream overload that accepts a Stream (infers name and mime type from FileStream's path if applicable)
    • String path overload that infers name and mime type from the file path
    • Pre-initialize MemoryStream capacity when source stream is seekable for performance
  • Add instance SaveToAsync method to DataContent class:
    • Handles directory paths by inferring filename from Name property or using random name
    • Always infers extension from media type if path has none
    • Returns the actual path where data was saved
    • Fix path traversal security issue by using Path.GetFileName(Name)
  • Remove [Experimental] attribute and add methods to JSON API baseline as stable
  • Use GetBuffer() instead of ToArray() for MemoryStream efficiency
  • Add WriteAllBytesAsync(ReadOnlyMemory<byte>) polyfill to remove #ifdef from SaveToAsync
  • Change return types from Task<T> to ValueTask<T> for all new public async methods
  • Add C# 14 extension member polyfills for File.ReadAllBytesAsync and File.WriteAllBytesAsync in LegacySupport directory
  • Remove name parameter from LoadFromAsync(Stream) - Name can be assigned after the call
  • Add comprehensive unit tests for the new functionality
  • Fix net462 build: Add InjectFilePolyfillsOnLegacy to test project
  • Add test for SaveToAsync with empty string path
  • Simplify extension inference by removing unnecessary null check
  • Update MarkItDownMcpReader.cs to use DataContent.LoadFromAsync with optional media type parameter
  • Run code review and security checks
  • Switch components to use MediaTypeMap for extension/MIME type mappings:
    • ImageGeneratorExtensions.cs - Replace _extensionToMimeType dictionary with MediaTypeMap.GetMediaType
    • OpenAIClientExtensions.cs - Replace ImageUriToMediaType with MediaTypeMap.GetMediaType
    • OpenAIImageGenerator.cs - Replace _mimeTypeToExtension dictionary with MediaTypeMap.GetExtension
    • IngestionDocumentReader.cs - Remove custom lookup, only use MediaTypeMap.GetMediaType
Original prompt

This section details on the original issue you should resolve

<issue_title>We should have helper methods / extensions for DataContent</issue_title>
<issue_description>At a minimum we should have Load and Save.

Load should read from a path and preserve the file name loaded and try to decode a mime-type from that.

Save should save to a path, with an optional filename, using Name if a directory is provided in the path.

Perhaps we could consider a filename without extension to be provided and we infer the extension from the mime-type and return the saved path.

We should also have Stream based helpers.</issue_description>

<agent_instructions>Add static LoadFromAsync and instance SaveToAsync methods to DataContent. There should be a string path overload for loading that infers the name and mime type from the path (it should add a polyfill to common for https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Mail/src/System/Net/Mime/MediaTypeMap.cs and use that to infer the mime type). There should also be a stream based one that should accept a Stream. If it's actually a FileStream it should default to inferring name and mime type from the filestream's path. It should fall back to octet stream if unknown. SaveAsync should accept a required path to save to; it should accept an optional bool parameter defaulting to true for inferExtension that will add an extension to the provided path if it doesn't have one, based on the mime type and MediaTypemap. It should return the path to which the data was saved. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Microsoft Reviewers: Open in CodeFlow

Copilot AI and others added 2 commits December 16, 2025 16:12
…peMap

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title [WIP] Add helper methods for DataContent with Load and Save Add LoadFromAsync and SaveToAsync helper methods to DataContent Dec 16, 2025
Copilot AI requested a review from stephentoub December 16, 2025 16:20
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…ontent methods

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Dec 16, 2025

@copilot

All review comments addressed in 840f61c. Changes include: MediaTypeMap moved to LegacySupport with full mappings, methods are now stable (not experimental), DefaultMediaType moved to DataContent as private const, using GetBuffer() for MemoryStream, proper null handling for Name, SaveToAsync handles directory paths and always infers extensions, and NET9_0_OR_GREATER conditional for ReadOnlyMemory overload.

Copilot AI requested a review from stephentoub December 16, 2025 17:15
@stephentoub stephentoub marked this pull request as ready for review December 16, 2025 17:16
@stephentoub stephentoub requested review from a team as code owners December 16, 2025 17:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds helper methods to DataContent for loading from and saving to file system paths and streams, with automatic media type (MIME type) inference based on file extensions.

Key Changes

  • Adds static LoadFromAsync methods (path and stream overloads) that infer media types and names from file extensions
  • Adds instance SaveToAsync method that handles directory paths and extension inference
  • Implements MediaTypeMap polyfill for .NET 9 and earlier to enable file extension-to-MIME type mapping
  • Marks the new APIs as stable (non-experimental) in the JSON baseline

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataContent.cs Implements the new LoadFromAsync and SaveToAsync methods with media type inference logic
src/LegacySupport/MediaTypeMap/MediaTypeMap.cs Adds polyfill implementation of media type mapping for pre-.NET 10 frameworks
src/LegacySupport/MediaTypeMap/README.md Documents the MediaTypeMap polyfill purpose and origin
eng/MSBuild/LegacySupport.props Configures MSBuild to inject MediaTypeMap polyfill for legacy target frameworks
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.csproj Enables MediaTypeMap injection for the abstractions project
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.json Adds the new methods to the stable API baseline
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/DataContentTests.cs Adds comprehensive unit tests covering all new functionality and edge cases

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…Array in SaveToAsync

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…add empty path test

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@stephentoub
Copy link
Member

@copilot please address unresolved feedback

Copy link
Contributor Author

Copilot AI commented Dec 16, 2025

@copilot please address unresolved feedback

Both unresolved feedback items were already addressed in c89fff7:

  1. Path traversal fix (DataContent.cs:363) - Now using Path.GetFileName(Name) to extract only the filename component when combining with directory path
  2. Empty string path test (DataContentTests.cs:590-614) - Added SaveToAsync_WithEmptyPath_UsesCurrentDirectory test that verifies empty string paths work correctly

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@stephentoub
Copy link
Member

@copilot, are there any places in the repo where DataContent is constructed from a file or stream such that LoadFromAsync could now be used instead? Or similarly anywhere where it's written out to a file and SaveToAsync could be used instead?

Copy link
Contributor Author

Copilot AI commented Dec 17, 2025

@copilot, are there any places in the repo where DataContent is constructed from a file or stream such that LoadFromAsync could now be used instead? Or similarly anywhere where it's written out to a f...

Yes! Found one excellent candidate in MarkItDownMcpReader.cs which was manually reading file/stream content and creating DataContent with media type inference - exactly what LoadFromAsync now does. Updated both ReadAsync(FileInfo) and ReadAsync(Stream) overloads to use DataContent.LoadFromAsync in the next commit.

No candidates found for SaveToAsync - the existing DataContent usages are for API communication (like image generation) rather than file persistence.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…mAsync

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
mediaType ??= DefaultMediaType;

// Read the stream contents
using var memoryStream = new MemoryStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, if stream.CanSeek is true, this should pre-initialize the MemoryStream's capacity, e.g.

Suggested change
using var memoryStream = new MemoryStream();
using MemoryStream memoryStream = source.CanSeek ? new((int)Math.Min(source.Length, int.MaxValue)) : new();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ced6113. Now pre-initializing MemoryStream capacity when stream.CanSeek is true using stream.Length.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Comment on lines +280 to +292
_ = Throw.IfNullOrEmpty(path);

// Infer media type from the file extension if not provided
mediaType ??= System.Net.Mime.MediaTypeMap.GetMediaType(path) ?? DefaultMediaType;

// Read the file contents
byte[] data = await File.ReadAllBytesAsync(path, cancellationToken).ConfigureAwait(false);

string? name = Path.GetFileName(path);
return new DataContent(data, mediaType)
{
Name = string.IsNullOrEmpty(name) ? null : name
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we could just have this as an extension that uses the Stream.

Suggested change
_ = Throw.IfNullOrEmpty(path);
// Infer media type from the file extension if not provided
mediaType ??= System.Net.Mime.MediaTypeMap.GetMediaType(path) ?? DefaultMediaType;
// Read the file contents
byte[] data = await File.ReadAllBytesAsync(path, cancellationToken).ConfigureAwait(false);
string? name = Path.GetFileName(path);
return new DataContent(data, mediaType)
{
Name = string.IsNullOrEmpty(name) ? null : name
};
_ = Throw.IfNullOrEmpty(path);
using var fileStream = File.OpenRead(path);
return await LoadFromAsync(fileStream, mediaType, cancellationToken).ConfigureAwait(false);

/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests.</param>
/// <returns>A <see cref="DataContent"/> containing the stream data with the inferred or specified media type and name.</returns>
/// <exception cref="ArgumentNullException"><paramref name="stream"/> is <see langword="null"/>.</exception>
public static async ValueTask<DataContent> LoadFromAsync(Stream stream, string? mediaType = null, CancellationToken cancellationToken = default)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name could be an optional argument as well, when loading from streams that aren't FileStream, this would be useful for SaveFromAsync.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name is a settable property on DataContent.

Comment on lines +346 to +350
/// <param name="path">
/// The path to save the data to. If the path is an existing directory, the file name will be inferred
/// from the <see cref="Name"/> property or a random name will be used.
/// If the path does not have a file extension, an extension will be added based on the <see cref="MediaType"/>.
/// </param>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning what is the behavior when the Path is not absolute, what will be considered the root when it's relative, the executable path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current working directory. That's the case for all file-related APIs.

Comment on lines +71 to +72
var extension = MediaTypeMap.GetExtension(dataContent.MediaType);
if (extension is not null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DataContent when no name is present GUID but for images is image{extension} ?

I'm not very much familiarized with the code, but worrying that this approach may collide often and not be consistent.

By collide I mean that if you run this twice, the unidentified images may override when saved.

Is this intentional?

/// A service might use this name as part of citations or to help infer the type of data
/// being represented based on a file extension.
/// </remarks>
public string? Name { get; set; }
Copy link
Contributor

@rogerbarreto rogerbarreto Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might consider adding an extra para remark to the Name property explaining that the name might be used when Saving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

We should have helper methods / extensions for DataContent

3 participants