This document outlines the plan to port the Binary Ninja (BN) Diff tooling from the rust_diff/ repository into the smartdiff architecture via an MCP (Model Context Protocol) layer. The goal is to enable AI agents to compare binary functions using Binary Ninja's analysis capabilities while maintaining architecture compliance with smartdiff's design principles.
Location: /home/matteius/codediff/rust_diff/
Architecture:
- Rust Core (src/lib.rs): Binary diffing engine with C FFI exports
- Python Plugin (__init__.py): Binary Ninja plugin integration
- Binary Ninja Integration: Direct BinaryView API usage for binary analysis
- Output Formats: JSON, CSV, SQLite, HTML reports
- GUI: Optional Qt-based results viewer
Key Features:
- Multi-phase Function Matching:
  - Exact hash matching (CFG + call graph hashes)
  - Name-based matching
  - Structural similarity matching
  - Heuristic matching with parallel processing
- Binary-Specific Analysis:
  - Basic block extraction and analysis
  - Instruction-level comparison
  - Control Flow Graph (CFG) hashing
  - Call graph analysis
  - Cyclomatic complexity calculation
- Similarity Metrics:
  - CFG similarity (50% weight)
  - Basic block similarity (15% weight)
  - Instruction similarity (10% weight)
  - Edge similarity (25% weight)
  - Name similarity
  - Call similarity
- Confidence Scoring:
  - Size-based confidence boost
  - Complexity-based confidence boost
  - Basic block count boost
  - Name match boost
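The four weighted metrics above sum to 100%, so they can be combined as a plain weighted sum. The sketch below shows one way to do that in Rust. The type and function names are hypothetical, and the confidence boost sizes and cut-offs are made-up placeholders, since the plan lists the boosts but not their values. How name and call similarity enter the score is not stated, so they are left out, as is the complexity-based boost.

```rust
/// Per-component similarity scores, each expected in 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
pub struct SimilarityParts {
    pub cfg: f64,
    pub basic_block: f64,
    pub instruction: f64,
    pub edge: f64,
}

// Weights from the Similarity Metrics list; they sum to 1.0.
const W_CFG: f64 = 0.50;
const W_BASIC_BLOCK: f64 = 0.15;
const W_INSTRUCTION: f64 = 0.10;
const W_EDGE: f64 = 0.25;

/// Weighted sum of the four structural components, clamped to 0.0..=1.0.
pub fn weighted_similarity(p: SimilarityParts) -> f64 {
    let score = W_CFG * p.cfg
        + W_BASIC_BLOCK * p.basic_block
        + W_INSTRUCTION * p.instruction
        + W_EDGE * p.edge;
    score.clamp(0.0, 1.0)
}

/// Illustrative confidence: start from the similarity and add fixed boosts.
/// ASSUMPTION: the boost sizes (0.05, 0.05, 0.10) and the cut-offs
/// (100 bytes, 5 blocks) are placeholders, not values from rust_diff.
pub fn confidence(similarity: f64, size_bytes: u64, basic_blocks: usize, names_match: bool) -> f64 {
    let mut c = similarity;
    if size_bytes >= 100 {
        c += 0.05; // size-based boost
    }
    if basic_blocks >= 5 {
        c += 0.05; // basic block count boost
    }
    if names_match {
        c += 0.10; // name match boost
    }
    c.clamp(0.0, 1.0)
}
```

Clamping keeps both values inside the range that the similarity and confidence thresholds assume, even when several boosts stack.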
Location: /home/matteius/codediff/crates/mcp-server/
Architecture:
- MCP Server (src/server.rs): JSON-RPC 2.0 over stdio
- Comparison Manager (src/comparison/): Stateful comparison lifecycle
- Tool Handler (src/tools/): MCP tools for code analysis
- Resource Handler (src/resources/): Structured data access
- Source Code Focus: Tree-sitter based AST parsing for source code
Current MCP Tools:
- compare_locations - Compare files/directories
- list_changed_functions - List functions by change magnitude
- get_function_diff - Get detailed function diff
- get_comparison_summary - Get comparison overview
Supported Languages: Rust, Python, JavaScript, Java, C/C++ (source code only)
- Binary Analysis Capabilities:
  - Direct binary file parsing (via Binary Ninja)
  - Assembly instruction analysis
  - Binary-specific hashing (CFG, call graph)
  - Basic block level granularity
  - Cross-architecture support
- Binary Ninja Integration:
  - BinaryView API access
  - BNDB file format support
  - Binary Ninja's advanced analysis features
  - Decompilation integration potential
- Binary-Specific Matching Algorithms:
  - Hash-based exact matching for binaries
  - Structural matching optimized for compiled code
  - Instruction mnemonic hashing
- MCP Protocol Integration:
  - Standardized AI agent interface
  - JSON-RPC 2.0 communication
  - Resource-based data access
  - Stateful comparison management
- Source Code Analysis:
  - AST-based comparison
  - Tree edit distance algorithms
  - Semantic analysis
  - Refactoring detection
- Multi-file/Directory Support:
  - Recursive directory comparison
  - Cross-file tracking
  - File pattern filtering
Following smartdiff's architecture principles:
- Rust Backend Layer (New: crates/binary-ninja-bridge/)
  - Binary Ninja API integration
  - Binary function extraction
  - Binary-specific feature computation
  - Abstraction layer over BinaryView
- MCP Server Extension (Extend: crates/mcp-server/)
  - New MCP tools for binary comparison
  - Binary-specific resources
  - Unified interface for both source and binary analysis
- Comparison Engine Integration (Extend: crates/diff-engine/)
  - Binary function matching algorithms
  - Hybrid source/binary comparison support
  - Unified similarity scoring
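Under MCP, a client invokes a tool by sending a JSON-RPC 2.0 request with method tools/call, whose params carry the tool name and its arguments. The sketch below builds such a request for the compare_binaries tool named in this plan. It is dependency-free for illustration: the helper names are hypothetical, the JSON is assembled by hand, and a real implementation would more likely use a JSON library.

```rust
/// Encode a Rust string as a JSON string literal (quotes included).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a tools/call request for compare_binaries.
/// ASSUMPTION: only similarity_threshold is passed in options, and the
/// threshold is a finite number (NaN or infinity would not be valid JSON).
pub fn compare_binaries_request(id: u64, binary_a: &str, binary_b: &str, threshold: f64) -> String {
    let arguments = format!(
        "{{\"binary_a\":{},\"binary_b\":{},\"options\":{{\"similarity_threshold\":{}}}}}",
        json_string(binary_a),
        json_string(binary_b),
        threshold
    );
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"compare_binaries\",\"arguments\":{}}}}}",
        id, arguments
    )
}
```

Over the stdio transport the server would read one such message per line and reply with a response carrying the same id.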
Tool: compare_binaries
Purpose: Compare two binary files using Binary Ninja analysis
Input:
{
  "binary_a": "/path/to/binary1.bndb",
  "binary_b": "/path/to/binary2.bndb",
  "options": {
    "similarity_threshold": 0.6,
    "confidence_threshold": 0.5,
    "match_algorithms": ["exact_hash", "name", "structural", "heuristic"],
    "include_unmatched": true
  }
}
Output:
{
  "comparison_id": "uuid",
  "binary_a_name": "binary1.bndb",
  "binary_b_name": "binary2.bndb",
  "total_functions_a": 150,
  "total_functions_b": 148,
  "matched_count": 142,
  "similarity_score": 0.87,
  "analysis_time": 2.3
}
Tool: list_binary_function_matches
Purpose: List matched functions sorted by similarity/change magnitude
Input:
{
  "comparison_id": "uuid",
  "sort_by": "similarity_desc",
  "filter": {
    "min_similarity": 0.5,
    "max_similarity": 0.95,
    "match_type": ["structural", "heuristic"]
  },
  "limit": 50,
  "offset": 0
}
Output:
{
  "matches": [
    {
      "function_a": {
        "name": "process_data",
        "address": "0x1800",
        "size": 300,
        "basic_blocks": 6,
        "complexity": 8
      },
      "function_b": {
        "name": "process_data_v2",
        "address": "0x1850",
        "size": 320,
        "basic_blocks": 7,
        "complexity": 9
      },
      "similarity": 0.82,
      "confidence": 0.89,
      "match_type": "structural",
      "details": {
        "cfg_similarity": 0.85,
        "bb_similarity": 0.86,
        "instruction_similarity": 0.78,
        "edge_similarity": 0.88
      }
    }
  ],
  "total": 142,
  "has_more": true
}
Tool: get_binary_function_diff
Purpose: Get detailed diff for a specific binary function match
Input:
{
  "comparison_id": "uuid",
  "function_a_address": "0x1800",
  "function_b_address": "0x1850",
  "include_disassembly": true,
  "include_cfg": true,
  "include_decompilation": false
}
Output:
{
  "function_a": { /* detailed function info */ },
  "function_b": { /* detailed function info */ },
  "diff": {
    "basic_blocks_added": 1,
    "basic_blocks_removed": 0,
    "basic_blocks_modified": 3,
    "instructions_added": 12,
    "instructions_removed": 5,
    "cfg_changes": [ /* CFG edge changes */ ],
    "disassembly_diff": "...",
    "decompilation_diff": null
  },
  "similarity_breakdown": { /* detailed metrics */ }
}
Tool: load_binary_in_binja
Purpose: Load a binary file in Binary Ninja for analysis
Input:
{
  "binary_path": "/path/to/binary.exe",
  "analysis_options": {
    "auto_analyze": true,
    "load_debug_info": true,
    "architecture": "auto"
  }
}
Output:
{
  "binary_id": "uuid",
  "file_path": "/path/to/binary.exe",
  "architecture": "x86_64",
  "platform": "linux",
  "function_count": 150,
  "analysis_complete": true
}
Tool: list_binary_functions
Purpose: List all functions in a loaded binary
Input:
{
  "binary_id": "uuid",
  "filter": {
    "min_size": 10,
    "name_pattern": "process_*"
  },
  "sort_by": "size_desc",
  "limit": 100,
  "offset": 0
}
Output:
{
  "functions": [
    {
      "name": "process_data",
      "address": "0x1800",
      "size": 300,
      "basic_blocks": 6,
      "complexity": 8,
      "call_count": 2
    }
  ],
  "total": 150,
  "has_more": true
}
Goal: Create abstraction layer over Binary Ninja API
Tasks:
- Create crates/binary-ninja-bridge/ crate
- Implement Binary Ninja Rust API bindings
- Create BinaryLoader for loading BNDB files
- Create BinaryFunctionExtractor for extracting function info
- Implement binary-specific feature extraction
- Add comprehensive error handling
- Write unit tests with mock binaries
Deliverables:
- crates/binary-ninja-bridge/src/lib.rs
- crates/binary-ninja-bridge/src/loader.rs
- crates/binary-ninja-bridge/src/extractor.rs
- crates/binary-ninja-bridge/src/features.rs
- crates/binary-ninja-bridge/tests/
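As an illustration of the binary-specific feature extraction planned for features.rs, the sketch below computes two of the features this plan mentions: cyclomatic complexity and a CFG hash. The Cfg type and both function names are hypothetical and independent of the Binary Ninja API. The hashing scheme (degree multiset plus counts) is an assumption chosen for simplicity, not the scheme rust_diff uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal CFG: blocks are indices 0..block_count, edges are (from, to).
/// Precondition: every edge endpoint is < block_count.
pub struct Cfg {
    pub block_count: usize,
    pub edges: Vec<(usize, usize)>,
}

/// Cyclomatic complexity M = E - N + 2 for a single connected function.
pub fn cyclomatic_complexity(cfg: &Cfg) -> i64 {
    cfg.edges.len() as i64 - cfg.block_count as i64 + 2
}

/// Address-independent structural hash: block count, edge count, and the
/// sorted multiset of (in-degree, out-degree) pairs. Renumbering blocks
/// does not change the result.
/// NOTE: DefaultHasher output is not guaranteed stable across Rust
/// releases, so a persisted cache would need a fixed hash function.
pub fn cfg_hash(cfg: &Cfg) -> u64 {
    let mut degrees = vec![(0usize, 0usize); cfg.block_count];
    for &(from, to) in &cfg.edges {
        degrees[from].1 += 1; // out-degree of the source block
        degrees[to].0 += 1; // in-degree of the target block
    }
    degrees.sort_unstable();

    let mut hasher = DefaultHasher::new();
    cfg.block_count.hash(&mut hasher);
    cfg.edges.len().hash(&mut hasher);
    degrees.hash(&mut hasher);
    hasher.finish()
}
```

A degree-based hash is coarse: two different graphs with the same degree multiset collide, so a match on it is best treated as a candidate for the later matching phases.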
Goal: Port binary matching algorithms to smartdiff
Tasks:
- Extend crates/diff-engine/ with binary matching
- Port exact hash matching algorithm
- Port structural matching algorithm
- Port heuristic matching with parallelization
- Implement binary-specific similarity scoring
- Add confidence calculation for binary matches
- Write comprehensive tests
Deliverables:
- crates/diff-engine/src/binary_matcher.rs
- crates/diff-engine/src/binary_similarity.rs
- Updated crates/diff-engine/src/engine.rs
- Tests for binary matching
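The multi-phase matching to be ported can be pictured as a pipeline in which each phase only considers functions that earlier phases left unmatched. The sketch below covers three phases (exact hash, name, structural) under simplifying assumptions: all types and names are hypothetical, the structural score is supplied by the caller, and the heuristic phase and parallelization are omitted.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone)]
pub struct FnInfo {
    pub name: String,
    pub address: u64,
    pub cfg_hash: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MatchType {
    ExactHash,
    Name,
    Structural,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FnMatch {
    pub addr_a: u64,
    pub addr_b: u64,
    pub kind: MatchType,
}

/// Match functions of binary A against binary B in three phases.
pub fn match_functions(
    a: &[FnInfo],
    b: &[FnInfo],
    threshold: f64,
    similarity: impl Fn(&FnInfo, &FnInfo) -> f64,
) -> Vec<FnMatch> {
    let mut matches = Vec::new();
    let mut used_a: HashSet<u64> = HashSet::new();
    let mut used_b: HashSet<u64> = HashSet::new();

    // Phase 1: exact hash. Only hashes that are unique in B are trusted.
    let mut by_hash: HashMap<u64, Vec<&FnInfo>> = HashMap::new();
    for fb in b {
        by_hash.entry(fb.cfg_hash).or_default().push(fb);
    }
    for fa in a {
        if let Some(candidates) = by_hash.get(&fa.cfg_hash) {
            if candidates.len() == 1 && used_b.insert(candidates[0].address) {
                used_a.insert(fa.address);
                matches.push(FnMatch { addr_a: fa.address, addr_b: candidates[0].address, kind: MatchType::ExactHash });
            }
        }
    }

    // Phase 2: identical names among the functions still unmatched.
    let by_name: HashMap<&str, &FnInfo> = b
        .iter()
        .filter(|fb| !used_b.contains(&fb.address))
        .map(|fb| (fb.name.as_str(), fb))
        .collect();
    for fa in a {
        if used_a.contains(&fa.address) {
            continue;
        }
        if let Some(fb) = by_name.get(fa.name.as_str()) {
            if used_b.insert(fb.address) {
                used_a.insert(fa.address);
                matches.push(FnMatch { addr_a: fa.address, addr_b: fb.address, kind: MatchType::Name });
            }
        }
    }

    // Phase 3: greedy best structural candidate at or above the threshold.
    for fa in a {
        if used_a.contains(&fa.address) {
            continue;
        }
        let mut best: Option<(&FnInfo, f64)> = None;
        for fb in b {
            if used_b.contains(&fb.address) {
                continue;
            }
            let score = similarity(fa, fb);
            if score >= threshold && best.map_or(true, |(_, top)| score > top) {
                best = Some((fb, score));
            }
        }
        if let Some((fb, _)) = best {
            used_a.insert(fa.address);
            used_b.insert(fb.address);
            matches.push(FnMatch { addr_a: fa.address, addr_b: fb.address, kind: MatchType::Structural });
        }
    }
    matches
}
```

Running the cheap, high-precision phases first shrinks the candidate set before the quadratic structural phase, which is where most of the time goes.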
Goal: Add MCP tools for binary analysis
Tasks:
- Extend crates/mcp-server/ with binary tools
- Implement compare_binaries tool
- Implement list_binary_function_matches tool
- Implement get_binary_function_diff tool
- Implement load_binary_in_binja tool
- Implement list_binary_functions tool
- Add binary-specific resources
- Update MCP server documentation
Deliverables:
- crates/mcp-server/src/tools/binary_tools.rs
- crates/mcp-server/src/resources/binary_resources.rs
- Updated crates/mcp-server/src/server.rs
- Updated crates/mcp-server/README.md
- Updated crates/mcp-server/MCP_USAGE.md
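The list_binary_function_matches tool takes a similarity filter, a sort order, and limit/offset paging, and reports total and has_more. A minimal sketch of that logic follows, assuming the similarity_desc sort order and assuming that total counts rows after filtering; the plan does not say whether total is measured before or after the filter. MatchRow, Page, and page_matches are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct MatchRow {
    pub name_a: String,
    pub similarity: f64,
}

#[derive(Debug)]
pub struct Page<T> {
    pub items: Vec<T>,
    pub total: usize,
    pub has_more: bool,
}

/// Filter by similarity range, sort by similarity descending, then page.
pub fn page_matches(
    rows: &[MatchRow],
    min_similarity: f64,
    max_similarity: f64,
    limit: usize,
    offset: usize,
) -> Page<MatchRow> {
    let mut kept: Vec<MatchRow> = rows
        .iter()
        .filter(|r| r.similarity >= min_similarity && r.similarity <= max_similarity)
        .cloned()
        .collect();
    // total_cmp gives a total order on f64, so sorting cannot panic on NaN.
    kept.sort_by(|x, y| y.similarity.total_cmp(&x.similarity));

    let total = kept.len(); // ASSUMPTION: counted after filtering
    let items: Vec<MatchRow> = kept.into_iter().skip(offset).take(limit).collect();
    let has_more = offset.saturating_add(items.len()) < total;
    Page { items, total, has_more }
}
```

A client can keep requesting pages with offset increased by limit until has_more is false.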
Goal: End-to-end testing and documentation
Tasks:
- Integration tests with real binaries
- Performance benchmarking
- MCP client testing (Claude Desktop)
- Documentation updates
- Example workflows
- Error handling improvements
Deliverables:
- Integration test suite
- Performance benchmarks
- User documentation
- Example binaries and workflows
Goal: Advanced features and optimizations
Tasks:
- Decompilation diff support
- Cross-architecture comparison
- Incremental analysis caching
- Web UI integration for binary diffs
- Export formats (JSON, CSV, HTML)
- Advanced visualization
New Cargo Dependencies:
[dependencies]
# Binary Ninja API
binaryninja = { git = "https://github.com/Vector35/binaryninja-api", branch = "dev" }
# Existing smartdiff dependencies
smart-diff-parser = { path = "../parser" }
smart-diff-engine = { path = "../diff-engine" }
smart-diff-semantic = { path = "../semantic-analysis" }
- Requires Binary Ninja Commercial or Personal license
- MCP server should gracefully handle missing Binary Ninja
- Provide clear error messages for licensing issues
- Document Binary Ninja installation requirements
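One way to handle a missing Binary Ninja gracefully is to keep the backend behind a trait and hold it as an Option, so the server still starts and serves the source-code tools when no backend is present. The sketch below illustrates the idea. BinaryBackend, StubBackend, ToolRegistry, and BackendError are hypothetical names, and the trait is a stand-in for the planned bridge crate, not the real Binary Ninja API.

```rust
use std::fmt;

/// Reasons the binary backend cannot serve a request.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BackendError {
    NotInstalled,
    LicenseRejected(String),
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendError::NotInstalled => {
                write!(f, "Binary Ninja was not found; binary tools are disabled")
            }
            BackendError::LicenseRejected(why) => {
                write!(f, "Binary Ninja license check failed: {why}")
            }
        }
    }
}

impl std::error::Error for BackendError {}

/// Hypothetical abstraction over the Binary Ninja bridge.
pub trait BinaryBackend {
    fn function_count(&self, path: &str) -> Result<usize, BackendError>;
}

/// Test double standing in for a real Binary Ninja session.
pub struct StubBackend(pub usize);

impl BinaryBackend for StubBackend {
    fn function_count(&self, _path: &str) -> Result<usize, BackendError> {
        Ok(self.0)
    }
}

pub struct ToolRegistry {
    backend: Option<Box<dyn BinaryBackend>>,
}

impl ToolRegistry {
    pub fn new(backend: Option<Box<dyn BinaryBackend>>) -> Self {
        Self { backend }
    }

    /// Source tools are always listed; binary tools only with a backend.
    pub fn available_tools(&self) -> Vec<&'static str> {
        let mut tools = vec![
            "compare_locations",
            "list_changed_functions",
            "get_function_diff",
            "get_comparison_summary",
        ];
        if self.backend.is_some() {
            tools.extend([
                "compare_binaries",
                "list_binary_function_matches",
                "get_binary_function_diff",
                "load_binary_in_binja",
                "list_binary_functions",
            ]);
        }
        tools
    }

    /// Fails with a clear error instead of panicking when no backend exists.
    pub fn function_count(&self, path: &str) -> Result<usize, BackendError> {
        self.backend
            .as_ref()
            .ok_or(BackendError::NotInstalled)?
            .function_count(path)
    }
}
```

Keeping the backend behind a trait also lets unit tests run on machines without a Binary Ninja license, using a stub like the one above.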
- Binary Loading: BNDB files can be large, so implement lazy loading
- Parallel Processing: Use rayon for parallel function matching
- Caching: Cache extracted features to avoid re-analysis
- Memory Management: Stream large result sets instead of loading them all into memory
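To illustrate the parallel function matching item, the sketch below splits the functions of binary A into chunks and scores each chunk against binary B on its own thread. It uses std::thread::scope only to stay dependency-free; the plan names rayon, whose parallel iterators would express the same loop more compactly. The Features type and the ratio-based similarity are simplified assumptions, not the ported metrics.

```rust
use std::thread;

#[derive(Debug, Clone, Copy)]
pub struct Features {
    pub basic_blocks: u32,
    pub instructions: u32,
}

/// Ratio of the smaller to the larger value; 1.0 when both are zero.
fn ratio(x: u32, y: u32) -> f64 {
    let (lo, hi) = if x < y { (x, y) } else { (y, x) };
    if hi == 0 { 1.0 } else { f64::from(lo) / f64::from(hi) }
}

/// Toy similarity: average of block-count and instruction-count ratios.
pub fn similarity(a: &Features, b: &Features) -> f64 {
    0.5 * ratio(a.basic_blocks, b.basic_blocks) + 0.5 * ratio(a.instructions, b.instructions)
}

/// For each function in `a`, find the index and score of its best
/// candidate in `b`. Results keep the order of `a`.
pub fn best_candidates(a: &[Features], b: &[Features], workers: usize) -> Vec<Option<(usize, f64)>> {
    if a.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    let chunk_len = (a.len() + workers - 1) / workers; // ceiling division

    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; scoped threads may borrow a and b.
        let handles: Vec<_> = a
            .chunks(chunk_len)
            .map(|part| {
                scope.spawn(move || {
                    part.iter()
                        .map(|fa| {
                            b.iter()
                                .enumerate()
                                .map(|(i, fb)| (i, similarity(fa, fb)))
                                .max_by(|x, y| x.1.total_cmp(&y.1))
                        })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Join in spawn order so the output lines up with `a`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each worker only reads shared slices and writes its own output vector, no locking is needed.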
- Binary Ninja Not Available: Graceful degradation
- Invalid Binary Files: Clear error messages
- Analysis Failures: Partial results when possible
- MCP Protocol Errors: Standard JSON-RPC error codes
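The error cases above could be mapped onto JSON-RPC 2.0 error objects roughly as follows. The codes -32601, -32602, and -32603 are defined by the JSON-RPC 2.0 specification, which also reserves -32000 to -32099 for implementation-defined server errors. The two values picked from that range here, along with the ToolError type and function name, are illustrative assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToolError {
    MethodNotFound(String),
    InvalidParams(String),
    AnalysisFailed(String),
    BinaryNinjaUnavailable,
    InvalidBinary(String),
}

// Codes defined by the JSON-RPC 2.0 specification.
pub const METHOD_NOT_FOUND: i32 = -32601;
pub const INVALID_PARAMS: i32 = -32602;
pub const INTERNAL_ERROR: i32 = -32603;

// ASSUMPTION: values chosen from the reserved server-error range
// (-32000..=-32099); the plan does not assign specific codes.
pub const BINJA_UNAVAILABLE: i32 = -32001;
pub const INVALID_BINARY: i32 = -32002;

/// Convert a tool error into a (code, message) pair for the error object.
pub fn to_json_rpc_error(err: &ToolError) -> (i32, String) {
    match err {
        ToolError::MethodNotFound(name) => {
            (METHOD_NOT_FOUND, format!("Method not found: {name}"))
        }
        ToolError::InvalidParams(why) => (INVALID_PARAMS, format!("Invalid params: {why}")),
        ToolError::AnalysisFailed(why) => (INTERNAL_ERROR, format!("Analysis failed: {why}")),
        ToolError::BinaryNinjaUnavailable => (
            BINJA_UNAVAILABLE,
            "Binary Ninja is not available; binary tools are disabled".to_string(),
        ),
        ToolError::InvalidBinary(path) => {
            (INVALID_BINARY, format!("Not a loadable binary: {path}"))
        }
    }
}
```

Using distinct codes for the Binary Ninja cases lets a client tell a missing installation apart from a bad input file without parsing the message text.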
- ✅ AI agents can load and analyze binary files via MCP
- ✅ Binary function matching achieves similar accuracy to rust_diff
- ✅ MCP tools follow smartdiff architecture patterns
- ✅ Performance is acceptable (< 5s for typical binary comparison)
- ✅ Documentation is comprehensive and clear
- ✅ Integration tests pass with real binaries
- ✅ Works with Claude Desktop and other MCP clients
- Review and approve this plan
- Set up development environment with Binary Ninja
- Create crates/binary-ninja-bridge/ skeleton
- Begin Phase 1 implementation
- Regular progress reviews and adjustments