---
name: Duplicate Code Detector
description: Identifies duplicate code patterns across the codebase and suggests refactoring opportunities
on:
  workflow_dispatch:
  schedule: daily
permissions:
  contents: read
  issues: read
  pull-requests: read
safe-outputs:
  create-issue:
    expires: 2d
    title-prefix: "[duplicate-code] "
    labels: [code-quality, automated-analysis]
    assignees: copilot
    group: true
    max: 3
timeout-minutes: 15
---
# Duplicate Code Detection
Analyze code to identify duplicated patterns using semantic analysis. Report significant findings that require refactoring.
## Task
Detect and report code duplication by:
1. **Analyzing Recent Commits**: Review changes in the latest commits
2. **Detecting Duplicated Code**: Identify similar or duplicated code patterns using semantic analysis
3. **Reporting Findings**: Create a detailed issue if significant duplication is detected (threshold: >10 duplicated lines or 3+ instances of a similar pattern)
## Context
- **Repository**: ${{ github.repository }}
- **Commit ID**: ${{ github.event.head_commit.id }}
- **Triggered by**: @${{ github.actor }}
## Analysis Workflow
### 1. Changed Files Analysis
Identify and analyze modified files:
- Determine files changed in the recent commits using `git log` and `git diff`
- Focus on source code files
- **Exclude test files** from analysis (files matching patterns: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or located in directories named `test`, `tests`, `__tests__`, or `spec`)
- **Exclude generated files** and build artifacts
- **Exclude workflow files** from analysis (files under `.github/workflows/*`)
- Use code exploration tools to understand file structure
- Read modified file contents to examine changes
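The exclusion rules above can be sketched as a small path filter. This is a minimal illustration, not part of the workflow itself: the function names `is_excluded` and `filter_changed_files` are hypothetical, the input is assumed to be a list of repository-relative paths (for example, taken from `git diff --name-only`), and generated files are not handled because the rules above do not say how to recognize them.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Patterns copied from the exclusion rules above.
TEST_FILE_PATTERNS = ("*_test.*", "*.test.*", "*.spec.*", "test_*.*")
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}
WORKFLOW_PREFIX = ".github/workflows/"


def is_excluded(path: str) -> bool:
    """Return True if a repository-relative path should be skipped (hypothetical helper)."""
    p = PurePosixPath(path)
    # Workflow files are never analyzed.
    if path.startswith(WORKFLOW_PREFIX):
        return True
    # Any file located under a directory named like a test directory is skipped.
    if any(part in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    # Otherwise, skip files whose name matches a test-file pattern.
    return any(fnmatch(p.name, pattern) for pattern in TEST_FILE_PATTERNS)


def filter_changed_files(paths: list[str]) -> list[str]:
    """Keep only the paths that remain in scope for duplicate detection."""
    return [p for p in paths if not is_excluded(p)]
```

Matching is done on the file name for the glob patterns and on the parent directory names for the test directories, so a file such as `src/latest.py` stays in scope even though its name contains "test".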
### 2. Duplicate Detection
Apply pattern search and semantic analysis to find duplicates:
**Pattern Search**:
- Search for duplication indicators using grep and code search:
  - Similar function signatures
  - Repeated logic blocks
  - Similar variable naming patterns
  - Near-identical code blocks
- Look for functions with similar names across different files
- Identify structural similarities in code organization
**Semantic Analysis**:
- Compare code blocks for logical similarity beyond textual matching
- Identify different implementations of the same functionality
- Look for copy-paste patterns with minor variations
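One way to picture "copy-paste patterns with minor variations" is a normalized sliding-window comparison, sketched below. This is an assumption-laden illustration rather than the detection method this workflow uses: the helper names `normalize_line` and `find_duplicate_windows` are hypothetical, identifiers are told apart from keywords using Python's keyword list (so it suits Python-like input only), and it catches exact and structural duplication but not functional duplication, which still needs semantic judgment.

```python
import hashlib
import keyword
import re
from collections import defaultdict

# Identifiers, numbers, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def normalize_line(line: str) -> str:
    """Replace identifiers and numbers with placeholders so renamed copies match."""
    out = []
    for tok in TOKEN_RE.findall(line):
        if tok[0].isalpha() or tok[0] == "_":
            # Keep language keywords, collapse every other name to "ID".
            out.append(tok if keyword.iskeyword(tok) else "ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)


def find_duplicate_windows(files: dict[str, str], window: int = 5) -> dict[str, list[tuple[str, int]]]:
    """Map the hash of each repeated window of non-blank lines to its (path, start line) locations."""
    seen = defaultdict(list)
    for path, text in files.items():
        lines = [
            (number, normalize_line(raw))
            for number, raw in enumerate(text.splitlines(), 1)
            if raw.strip()
        ]
        for start in range(len(lines) - window + 1):
            chunk = "\n".join(norm for _, norm in lines[start:start + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((path, lines[start][0]))
    # Only windows that occur more than once are duplication candidates.
    return {digest: locs for digest, locs in seen.items() if len(locs) > 1}
```

Candidates found this way are only a starting point; each one still has to be read and judged against the evaluation criteria before it is reported.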
### 3. Duplication Evaluation
Assess findings to identify true code duplication:
**Duplication Types**:
- **Exact Duplication**: Identical code blocks in multiple locations
- **Structural Duplication**: Same logic with minor variations (different variable names, etc.)
- **Functional Duplication**: Different implementations of the same functionality
- **Copy-Paste Programming**: Similar code blocks that could be extracted into shared utilities
**Assessment Criteria**:
- **Severity**: Amount of duplicated code (lines of code, number of occurrences)
- **Impact**: Where duplication occurs (critical paths, frequently called code)
- **Maintainability**: How duplication affects code maintainability
- **Refactoring Opportunity**: Whether duplication can be easily refactored
### 4. Issue Reporting
Create separate issues for each distinct duplication pattern found (maximum 3 patterns per run). Each pattern should get its own issue to enable focused remediation.
**When to Create Issues**:
- Only create issues if significant duplication is found (threshold: >10 lines of duplicated code OR 3+ instances of similar patterns)
- **Create one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Use the `create_issue` tool from safe-outputs MCP **once for each pattern**
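The reporting rules above (the ">10 lines OR 3+ instances" threshold and the cap of 3 issues per run) can be sketched as follows. The `Pattern` type and the function names are hypothetical, and ranking by duplicated lines multiplied by occurrences is an assumption: the workflow only asks for the "most significant" patterns without defining a score.

```python
from dataclasses import dataclass

MIN_DUPLICATED_LINES = 10  # report when strictly more than 10 lines are duplicated
MIN_OCCURRENCES = 3        # or when a pattern occurs 3 or more times
MAX_ISSUES_PER_RUN = 3     # matches `max: 3` under `create-issue`


@dataclass
class Pattern:
    """A candidate duplication pattern (hypothetical structure)."""
    name: str
    duplicated_lines: int
    occurrences: int


def is_significant(p: Pattern) -> bool:
    """Apply the reporting threshold: >10 duplicated lines OR 3+ instances."""
    return p.duplicated_lines > MIN_DUPLICATED_LINES or p.occurrences >= MIN_OCCURRENCES


def select_patterns_to_report(patterns: list[Pattern]) -> list[Pattern]:
    """Return at most three significant patterns, one issue each."""
    significant = [p for p in patterns if is_significant(p)]
    # Assumed ranking: total duplicated volume, largest first.
    significant.sort(key=lambda p: p.duplicated_lines * p.occurrences, reverse=True)
    return significant[:MAX_ISSUES_PER_RUN]
```

Each pattern returned by `select_patterns_to_report` would then correspond to exactly one `create_issue` call.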
**Issue Contents for Each Pattern**:
- **Executive Summary**: Brief description of this specific duplication pattern
- **Duplication Details**: Specific locations and code blocks for this pattern only
- **Severity Assessment**: Impact and maintainability concerns for this pattern
- **Refactoring Recommendations**: Suggested approaches to eliminate this pattern
- **Code Examples**: Concrete examples with file paths and line numbers for this pattern
## Detection Scope
### Report These Issues
- Identical or nearly identical functions in different files
- Repeated code blocks that could be extracted to utilities
- Similar classes or modules with overlapping functionality
- Copy-pasted code with minor modifications
- Duplicated business logic across components
### Skip These Patterns
- Standard boilerplate code (imports, exports, package declarations)
- Test setup/teardown code (acceptable duplication in tests)
- **All test files** (files matching: `*_test.*`, `*.test.*`, `*.spec.*`, `test_*.*`, or in `test/`, `tests/`, `__tests__/`, `spec/` directories)
- **All workflow files** (files under `.github/workflows/*`)
- Configuration files with similar structure
- Idiomatic language constructs (constructors, getters/setters)
- Small code snippets (<5 lines) unless highly repetitive
- Generated code or vendored dependencies
### Analysis Depth
- **Primary Focus**: Files changed in recent commits (excluding test files and workflow files)
- **Secondary Analysis**: Check for duplication with existing codebase
- **Cross-Reference**: Look for patterns across the repository
- **Historical Context**: Consider if duplication is new or existing
## Issue Template
For each distinct duplication pattern found, create a separate issue using this structure:
````markdown
# 🔍 Duplicate Code Detected: [Pattern Name]
*Analysis of commit ${{ github.event.head_commit.id }}*
**Assignee**: @copilot
## Summary
[Brief overview of this specific duplication pattern]
## Duplication Details
### Pattern: [Description]
- **Severity**: High/Medium/Low
- **Occurrences**: [Number of instances]
- **Locations**:
  - `path/to/file1.ext` (lines X-Y)
  - `path/to/file2.ext` (lines A-B)
- **Code Sample**:
  ```[language]
  [Example of duplicated code]
  ```
## Impact Analysis
- **Maintainability**: [How this affects code maintenance]
- **Bug Risk**: [Potential for inconsistent fixes]
- **Code Bloat**: [Impact on codebase size]
## Refactoring Recommendations
1. **[Recommendation 1]**
   - Extract common functionality to: `suggested/path/utility.ext`
   - Estimated effort: [hours/complexity]
   - Benefits: [specific improvements]
2. **[Recommendation 2]**
   [... additional recommendations ...]
## Implementation Checklist
- [ ] Review duplication findings
- [ ] Prioritize refactoring tasks
- [ ] Create refactoring plan
- [ ] Implement changes
- [ ] Update tests
- [ ] Verify no functionality broken
## Analysis Metadata
- **Analyzed Files**: [count]
- **Detection Method**: Semantic code analysis
- **Commit**: ${{ github.event.head_commit.id }}
- **Analysis Date**: [timestamp]
````
## Operational Guidelines
### Security
- Never execute untrusted code or commands
- Only use read-only analysis tools
- Do not modify files during analysis
### Efficiency
- Focus on recently changed files first
- Use semantic analysis for meaningful duplication, not superficial matches
- Stay within the 15-minute timeout (balance thoroughness with execution time)
### Accuracy
- Verify findings before reporting
- Distinguish between acceptable patterns and true duplication
- Consider language-specific idioms and best practices
- Provide specific, actionable recommendations
### Issue Creation
- Create **one issue per distinct duplication pattern** - do NOT bundle multiple patterns in a single issue
- Limit to the top 3 most significant patterns if more are found
- Only create issues if significant duplication is found
- Include sufficient detail for coding agents to understand and act on findings
- Provide concrete examples with file paths and line numbers
- Suggest practical refactoring approaches
- Assign issue to @copilot for automated remediation
- Use descriptive titles that clearly identify the specific pattern (e.g., "Duplicate Code: Error Handling Pattern in Parser Module")
**Objective**: Improve code quality by identifying and reporting meaningful code duplication that impacts maintainability. Focus on actionable findings that enable automated or manual refactoring.