Skip to content

Conversation

@SamMorrowDrums
Copy link
Collaborator

Summary

Integrates the semantic diff engine into the existing tools that return file patches, so models get more token-efficient diffs across all diff-producing tools — not just compare_file_contents.

Tools updated

Tool Method Change
get_commit Adds patch field to each file in the response, with semantic diff applied
pull_request_read get_diff Splits multi-file raw diff by file, applies semantic diff per-file
pull_request_read get_files Applies semantic diff to each file's patch field

How it works

  1. For each file patch, checks the file extension against known formats
  2. Structured data (JSON, YAML, CSV, TOML): Reconstructs base/head from the patch hunks, produces path-based semantic diff
  3. Code files (Go, Python, JS, etc.): Reconstructs base/head, produces structural declaration-level diff with inline detail
  4. Other files: Keeps the original unified diff patch unchanged

Patch reconstruction

The reconstructFromPatch function extracts base and head content from unified diff hunks. This works well because:

  • Structured data files typically have the full content in the diff
  • Code patches include enough context for tree-sitter to parse declarations
  • If reconstruction fails or produces poor results, the original patch is preserved

Testing

  • 10 unit tests for diff integration (patch reconstruction, per-file splitting, semantic application)
  • All existing tests pass unchanged
  • Lint clean

Dependencies

Stacked on #1983 (symbol extraction) → #1982 (tree-sitter) → #1981 (semantic data diffs)


Part of #1973

Integrates the semantic diff engine into existing tools that return
file patches:

- get_commit: Adds Patch field to MinimalCommitFile and applies
  semantic diff for supported formats (JSON, YAML, CSV, TOML) and
  structural diff for code files
- pull_request_read (get_diff): Splits multi-file raw diff by file
  and applies semantic diff per-file where beneficial
- pull_request_read (get_files): Applies semantic diff to each
  file's Patch field before returning

Unsupported file types keep their original unified diff patches
unchanged. Patches are reconstructed from hunks to feed the semantic
diff engine — this works well for structured data where the full
content is typically in the diff.
@SamMorrowDrums SamMorrowDrums requested a review from a team as a code owner February 9, 2026 23:10
Copilot AI review requested due to automatic review settings February 9, 2026 23:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates the existing semantic/structural diff engine into the remaining diff-producing GitHub MCP tools so they return more token-efficient patches (not just compare_file_contents). It post-processes per-file patches and multi-file PR diffs to apply semantic diffs where supported, while falling back to the original unified diff when not.

Changes:

  • pull_request_read.get_diff: post-processes the raw multi-file diff, splitting by file and applying semantic/structural diffs per file when beneficial.
  • pull_request_read.get_files: applies semantic/structural diffs to each file’s patch field when possible.
  • get_commit: adds a patch field to MinimalCommitFile and applies semantic/structural diffing to commit file patches.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
pkg/github/pullrequests.go Applies semantic diff processing to PR raw diffs (get_diff) and file patches (get_files).
pkg/github/minimal_types.go Adds patch to MinimalCommitFile and applies semantic diffing during minimal commit conversion.
pkg/github/diff_integration.go Introduces helpers to reconstruct base/head from hunks, split multi-file diffs, and apply semantic diffs per file.
pkg/github/diff_integration_test.go Adds unit tests for patch reconstruction, diff splitting, and semantic application behavior.
Comments suppressed due to low confidence (1)

pkg/github/pullrequests.go:300

  • This mutates each file's Patch to a semantic/structural diff when possible, but current get_files tests don't assert anything about the patch field or semantic transformation. Add coverage that includes a realistic patch body for a supported format and assert the marshaled response has the transformed patch for that file (and leaves unsupported formats unchanged).
	// Apply semantic diffs to file patches where beneficial
	for _, file := range files {
		if file.Patch != nil && file.Filename != nil {
			semanticPatch := applySemanticDiffToUnifiedPatch(file.GetFilename(), file.GetPatch())
			file.Patch = &semanticPatch

}

semanticPatch := applySemanticDiffToUnifiedPatch(section.filename, section.patch)
if semanticPatch != section.patch {
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a per-file patch is converted to a semantic diff, the output drops the original per-file diff header (e.g., diff --git, index, new file mode, rename from/to, deleted file mode). This loses important metadata (notably renames and mode changes) and can make the result misleading. Preserve the original header information even when emitting a semantic/structural diff (or explicitly include any rename/mode metadata alongside the semantic diff).

Suggested change
if semanticPatch != section.patch {
if semanticPatch != section.patch {
// Preserve the original per-file header (including metadata such as renames/mode changes)
result.WriteString(section.header)
if section.header != "" {
result.WriteString("\n")
}

Copilot uses AI. Check for mistakes.
Comment on lines +223 to +224
// Return the raw response, with semantic diffs applied where beneficial
return utils.NewToolResultText(processMultiFileDiff(string(raw))), nil
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes get_diff output by post-processing the raw multi-file diff, but existing pull request tests only cover a Markdown diff (which is expected to remain unchanged). Add a test case that includes at least one supported semantic/structural file type (e.g., JSON or Go) and assert that the returned diff is transformed accordingly, so regressions in semantic application/splitting are caught.

This issue also appears on line 296 of the same file.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines 239 to +243
for _, file := range commit.Files {
patch := file.GetPatch()
if patch != "" {
patch = applySemanticDiffToUnifiedPatch(file.GetFilename(), patch)
}
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinimalCommitFile now includes a patch field and convertToMinimalCommit applies semantic/structural diffing to it. There doesn't appear to be test coverage asserting that get_commit returns the new patch field and that semantic diffing is applied (or that include_diff=false still omits diffs). Adding a focused test will help prevent accidental regressions in commit output.

Copilot generated this review using guidance from repository custom instructions.
Import declarations now show package names instead of opaque
_import_declaration_N identifiers. Go imports extract short package
names (e.g., 'fmt, http, context'). For large import blocks, shows
first two and count (e.g., 'fmt, http, ... (5 packages)').

Import blocks are keyed by position rather than name, so changing the
imported packages shows as 'modified' with inline diff instead of
separate remove+add entries.

Added declarations now show their signature (first line) to give
context about what was added, rather than just the word 'added'.
…ed files

Package clauses now extract the package name from the package_identifier
child node, showing 'package handler' instead of '_package_clause_0'.

New and deleted code files now show a declaration summary listing all
top-level symbols with their signatures, rather than just 'file added'
or 'file deleted'. This gives the model a table-of-contents view of
what was added/removed.
Multi-var/const blocks now extract all declared names from var_spec_list
children (e.g., 'globalMap, oldRegexp' instead of '_var_declaration_19').
For >3 names, shows first two plus count.

Var/const blocks are keyed by position (like imports) so adding/removing
a variable shows as 'modified' with inline diff rather than remove+add.

When name extraction returns empty, falls back to showing the first line
of the declaration text instead of opaque '_kind_N' indices. This ensures
no tree-sitter internal names appear in output.
When searching for a symbol like 'RegisterRoutes', if no exact match
is found, try suffix matching against receiver-qualified names like
'(*Handler).RegisterRoutes'. Returns the match only when unambiguous
(exactly one suffix match).

Error messages now suggest the closest match when a suffix match exists
but wasn't used (e.g., when there are multiple types with the same
method name).

Updated the symbol parameter description to document the receiver
prefix format and note that bare names work when unambiguous.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant