
Add automated duplicate for new issue detection and auto-close workflows #5276

Merged
LantaoJin merged 10 commits into opensearch-project:main from qianheng-aws:dedup-issue
Mar 31, 2026


@qianheng-aws
Collaborator

@qianheng-aws qianheng-aws commented Mar 27, 2026

Summary

  • Add GitHub Actions workflows that use claude-code-action with AWS Bedrock (OIDC) to automatically detect and close duplicate issues
  • claude-dedupe-issues.yml (LLM-invoked): triggers on new issues and uses Claude to search for potential duplicates
  • auto-close-duplicates.yml (no LLM): a daily cron job that closes flagged issues after a 7-day grace period (configurable via vars.DUPLICATE_GRACE_DAYS) if there is no objection
  • remove-duplicate-on-activity.yml (no LLM): removes the duplicate label when a human comments, preventing auto-closure

How it works

The screenshots below are from a test in my personal repo.

  1. Creating a new issue triggers claude-dedupe-issues.yml.
    It invokes Claude Code with the /dedupe command (see dedupe.md for details) to detect duplicate issues:
  • If no duplicates are detected, nothing happens.
  • Otherwise, a detailed comment and the duplicate label are added to the new issue (issue #14 in the example below).
[screenshot]
  2. Push back on the dedupe detection result.
  • Any comment on an issue labeled duplicate triggers the remove-duplicate-on-activity.yml workflow, which removes the label.
[screenshot]
  • A thumbs-down reaction from anyone on that comment also prevents auto-closure.
[screenshot]
  • Maintainers can also remove the duplicate label manually.
  3. auto-close-duplicates.yml is a daily cron job that closes issues that were flagged as duplicates more than 7 days ago.
[screenshot]
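The daily auto-close check described above reduces to a small decision: close only when the grace period has elapsed and nobody has objected. This is a minimal sketch, not the workflow's actual script; `shouldAutoClose`, `flaggedAt`, and `hasObjection` are hypothetical names standing in for the data the github-script step would gather from the detection comment, its reactions, and later comments.

```javascript
// Hypothetical sketch of the daily auto-close decision; names are illustrative.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * Decide whether an issue flagged as a duplicate should be closed now.
 * @param {Date} flaggedAt       when the duplicate-detection comment was posted
 * @param {Date} now             current time (passed in to keep the function pure)
 * @param {number} graceDays     grace period in days (the PR defaults to 7)
 * @param {boolean} hasObjection true if a human commented or reacted with a thumbs-down
 */
function shouldAutoClose(flaggedAt, now, graceDays, hasObjection) {
  if (hasObjection) return false; // any objection keeps the issue open
  const ageMs = now.getTime() - flaggedAt.getTime();
  return ageMs >= graceDays * MS_PER_DAY;
}

const flagged = new Date("2026-03-20T00:00:00Z");
console.log(shouldAutoClose(flagged, new Date("2026-03-28T00:00:00Z"), 7, false)); // true
```

Passing `now` in explicitly keeps the function deterministic, which helps when testing with a shortened grace period such as the 1-hour setting used in the test plan.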

Test plan

  • Created 3 pairs of duplicate issues in a fork repo to verify detection
  • Confirmed dedupe workflow triggers on new issue creation
  • Confirmed Claude correctly identifies and comments on duplicate issues
  • Verified duplicate label is applied to detected duplicates
  • Verified that leaving a comment removes the duplicate label
  • Verified that a thumbs-down reaction prevents auto-closure
  • Verified that the auto-close workflow closes an issue labeled duplicate when there is no thumbs-down reaction (tested by reducing the grace period to 1 hour in my personal repo)

Signed-off-by: Heng Qian <qianheng@amazon.com>

Implements a 3-workflow system using claude-code-action with Bedrock OIDC:
- claude-dedupe-issues.yml: detects duplicates on new issues via Claude
- auto-close-duplicates.yml: daily cron closes flagged issues after 3 days
- remove-autoclose-on-activity.yml: removes autoclose label on human comment

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Contributor

Failed to generate code suggestions for PR

@qianheng-aws qianheng-aws added the maintenance Improves code quality, but not the product label Mar 27, 2026
- Detected duplicates now get `duplicate` label instead of `autoclose`
- Auto-close workflow looks for `duplicate` label
- After closing, adds `autoclose` label
- Human comment removes `duplicate` label to prevent auto-closure
- Fix state_reason to `duplicate`
- Change grace period to 1 hour for testing

Signed-off-by: Heng Qian <qianheng@amazon.com>
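The closing step from this commit (close with `state_reason: duplicate`, then add the `autoclose` label) could look roughly like the following inside an actions/github-script step, which exposes an authenticated Octokit client as `github`. This is a sketch under that assumption: `closeAsDuplicate` is a hypothetical name, and the stub client only records calls so the example runs outside Actions.

```javascript
// Hypothetical helper; in a github-script step, `github` is an authenticated Octokit client.
async function closeAsDuplicate(github, owner, repo, issueNumber) {
  // Close the issue, recording "duplicate" as the close reason.
  await github.rest.issues.update({
    owner,
    repo,
    issue_number: issueNumber,
    state: "closed",
    state_reason: "duplicate",
  });
  // Mark the issue as closed by automation rather than by a maintainer.
  await github.rest.issues.addLabels({
    owner,
    repo,
    issue_number: issueNumber,
    labels: ["autoclose"],
  });
}

// Minimal stub that records calls instead of hitting the GitHub API.
const calls = [];
const stubGithub = {
  rest: {
    issues: {
      update: async (params) => calls.push(["update", params]),
      addLabels: async (params) => calls.push(["addLabels", params]),
    },
  },
};
```

Ordering the calls this way means a failure between them leaves a closed issue without the `autoclose` label, which is easier to spot than a labeled issue that is still open.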
- Add backfill-duplicate-comments.yml to scan historical issues for duplicates
- Add thumbs-down instruction to duplicate detection comment

Signed-off-by: Heng Qian <qianheng@amazon.com>
songkant-aws previously approved these changes Mar 27, 2026
@github-actions
Contributor

github-actions bot commented Mar 27, 2026

PR Code Analyzer :...

Addressed in commit 9743698


Signed-off-by: Heng Qian <qianheng@amazon.com>
- Remove unnecessary allowed_non_write_users from dedupe workflow
- Pass workflow inputs via env vars to prevent JS injection in backfill
- Use bash array for REPO_FLAG to prevent word splitting in shell script

Signed-off-by: Heng Qian <qianheng@amazon.com>
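Passing inputs through environment variables keeps `${{ inputs.* }}` text out of the script source, where a crafted value could otherwise be parsed as JavaScript. A minimal sketch of the receiving side, assuming the workflow maps an input to a hypothetical `ISSUE_NUMBER` environment variable; the script reads it as plain data and validates it before use.

```javascript
// Reads a workflow input passed as an environment variable
// (e.g. `env: ISSUE_NUMBER: ${{ inputs.issue_number }}` in the step),
// so the value is treated as data and never spliced into the script's source.
function readIssueNumber(env) {
  const raw = (env.ISSUE_NUMBER || "").trim();
  // Accept only a positive decimal integer; reject anything else outright.
  if (!/^[1-9]\d*$/.test(raw)) {
    throw new Error(`Invalid ISSUE_NUMBER: ${JSON.stringify(raw)}`);
  }
  return Number(raw);
}

// Inside github-script this would be called as readIssueNumber(process.env).
```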

Backfill workflow dispatches dedupe via API as github-actions[bot],
which requires explicit allowlisting in claude-code-action.

Signed-off-by: Heng Qian <qianheng@amazon.com>

…kflow

Signed-off-by: Heng Qian <qianheng@amazon.com>

yuancu previously approved these changes Mar 27, 2026
Read from repo variable DUPLICATE_GRACE_DAYS (default 7) instead of
hardcoded 3 days for both auto-close workflow and comment script.

Signed-off-by: Heng Qian <qianheng@amazon.com>
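When a repository variable is not defined, an expression such as `${{ vars.DUPLICATE_GRACE_DAYS }}` expands to an empty string, so the consuming script needs its own fallback to the default of 7. A minimal sketch, assuming the value reaches the script as a string (for example through the environment); `parseGraceDays` is a hypothetical name.

```javascript
const DEFAULT_GRACE_DAYS = 7;

// Parse the grace period, falling back to the default for empty or invalid values.
function parseGraceDays(raw) {
  const trimmed = (raw ?? "").trim();
  if (trimmed === "") return DEFAULT_GRACE_DAYS;
  const days = Number(trimmed);
  // Fractional values are accepted so short test periods are possible;
  // negative numbers and non-numeric strings fall back to the default.
  if (!Number.isFinite(days) || days < 0) return DEFAULT_GRACE_DAYS;
  return days;
}
```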
@github-actions
Contributor

github-actions bot commented Mar 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit adc7ddd)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: LLM-based duplicate issue detection workflow

Relevant files:

  • .github/workflows/claude-dedupe-issues.yml
  • .claude/commands/dedupe.md
  • scripts/comment-on-duplicates.sh

Sub-PR theme: Automated duplicate issue lifecycle management (auto-close and label removal)

Relevant files:

  • .github/workflows/auto-close-duplicates.yml
  • .github/workflows/remove-duplicate-on-activity.yml

⚡ Recommended focus areas for review

Incomplete Pagination

When fetching comments for each issue, github.rest.issues.listComments is used with per_page: 100 but without pagination. If an issue has more than 100 comments, the duplicate detection comment may be missed, causing incorrect behavior (e.g., closing an issue that has human comments after the detection comment).

const comments = await github.rest.issues.listComments({
  owner,
  repo,
  issue_number: issue.number,
  per_page: 100,
});
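The fix amounts to requesting further pages until none remain. `github.paginate` does this by following the `Link` response header; the sketch below uses a simpler stopping rule (stop on a short page) to show the idea in a self-contained way. `fetchAllPages` and the fake endpoint are hypothetical, standing in for an Octokit list call that accepts `page` and `per_page`.

```javascript
// Generic page loop: request pages until one comes back with fewer than perPage items.
async function fetchAllPages(listPage, perPage = 100) {
  const all = [];
  for (let page = 1; ; page++) {
    const { data } = await listPage({ page, per_page: perPage });
    all.push(...data);
    if (data.length < perPage) break; // a short (or empty) page means we are done
  }
  return all;
}

// Fake endpoint serving 250 comments, standing in for github.rest.issues.listComments.
const fakeComments = Array.from({ length: 250 }, (_, i) => ({ id: i + 1 }));
async function fakeListComments({ page, per_page }) {
  const start = (page - 1) * per_page;
  return { data: fakeComments.slice(start, start + per_page) };
}
```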
Reaction Check Scope

The thumbs-down reaction check only verifies if the issue author reacted. However, the comment in comment-on-duplicates.sh says "👎 this comment to prevent auto-closure" without restricting it to the author. Any user's thumbs-down reaction should arguably prevent auto-closure, but the current logic only checks the issue author's reaction.

const authorThumbsDown = reactions.data.some(r =>
  r.user.id === issue.user.id && r.content === '-1'
);
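The two readings of the thumbs-down rule differ only in whose reaction counts. A minimal sketch that makes the scope an explicit option; `hasThumbsDownObjection` and its `authorOnly` flag are hypothetical, and the reaction objects mirror the `content`, `user.id`, and `user.type` fields the snippet above relies on.

```javascript
// Returns true if the reactions contain a thumbs-down ("-1") that should block auto-closure.
// authorOnly = true reproduces the current behavior (issue author only);
// authorOnly = false counts a thumbs-down from any human user.
function hasThumbsDownObjection(reactions, issueAuthorId, { authorOnly }) {
  return reactions.some((r) => {
    if (r.content !== "-1") return false;
    if (r.user.type === "Bot") return false; // ignore automated reactions
    return authorOnly ? r.user.id === issueAuthorId : true;
  });
}
```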
Hardcoded Region

The AWS region is hardcoded as us-east-1. This may cause issues if the Bedrock model or the OIDC role is configured in a different region. Consider making this configurable via a repository variable or secret.

No Comment Validation

Any human comment on a duplicate-labeled issue removes the label, including trivial or off-topic comments (e.g., "+1", "me too"). This could allow the duplicate label to be removed unintentionally. Consider requiring a more explicit action or at least logging a warning.

if: |
  github.event.issue.state == 'open' &&
  contains(github.event.issue.labels.*.name, 'duplicate') &&
  github.event.comment.user.type != 'Bot'
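One way to act on this observation is to ignore low-signal comments before removing the label. This is a hypothetical heuristic, not part of the PR; the phrase list and the `isSubstantiveComment` name are illustrative assumptions.

```javascript
// Hypothetical heuristic: treat short "me too" style comments as non-objections.
const LOW_SIGNAL = new Set(["+1", "me too", "same", "same here", "👍"]);

function isSubstantiveComment(body) {
  // Normalize case, surrounding whitespace, and trailing "." or "!" before matching.
  const normalized = (body || "").trim().toLowerCase().replace(/[.!]+$/, "");
  if (normalized === "") return false;
  return !LOW_SIGNAL.has(normalized);
}
```

Any phrase list like this is a judgment call; keeping it small limits the risk of silently ignoring a genuine objection.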
Missing REPO Validation

If GITHUB_REPOSITORY is not set and the --repo flag is omitted, gh commands will attempt to infer the repository from the local git context. This could lead to unexpected behavior in CI environments. Consider adding an explicit check that REPO is set or that the script is running in a valid context.

REPO="${GITHUB_REPOSITORY:-}"
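A guard of this kind fails fast instead of letting gh infer the repository. The script itself is Bash; to stay in the JavaScript used by the other github-script steps, here is a hypothetical `requireRepo` helper that insists on an explicit `owner/repo` value and splits it. The character class is a deliberately loose assumption about valid names.

```javascript
// Hypothetical guard: require an explicit "owner/repo" instead of relying on inference.
function requireRepo(value) {
  const repo = (value || "").trim();
  const match = /^([A-Za-z0-9_.-]+)\/([A-Za-z0-9_.-]+)$/.exec(repo);
  if (!match) {
    throw new Error(
      `Expected GITHUB_REPOSITORY or --repo as "owner/repo", got ${JSON.stringify(repo)}`
    );
  }
  return { owner: match[1], repo: match[2] };
}
```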

@github-actions
Contributor

github-actions bot commented Mar 30, 2026

PR Code Suggestions ✨

Latest suggestions up to adc7ddd
Explore these optional code suggestions:

Category | Suggestion | Impact
Security
Restrict label removal to authorized users only

The workflow removes the duplicate label when any non-bot user comments, but it
should only trigger when the issue author or a maintainer comments — not any random
user. Otherwise, any external user can comment to remove the duplicate label and
prevent auto-closure. Consider restricting this to the issue author or users with
specific permissions.

.github/workflows/remove-duplicate-on-activity.yml [15]

-github.event.comment.user.type != 'Bot'
+github.event.comment.user.type != 'Bot' &&
+(github.event.comment.user.login == github.event.issue.user.login ||
+ github.event.comment.author_association == 'OWNER' ||
+ github.event.comment.author_association == 'MEMBER' ||
+ github.event.comment.author_association == 'COLLABORATOR')
Suggestion importance[1-10]: 7


Why: This is a valid security concern — any user commenting on an issue could remove the duplicate label and prevent auto-closure. Restricting to the issue author or maintainers is a reasonable access control improvement.

Medium
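The suggested `if:` condition can be mirrored as a JavaScript predicate, which makes the accepted cases easy to test. A sketch that assumes the standard `issue_comment` event payload fields (`issue.state`, `issue.labels`, `issue.user.login`, `comment.user`, `comment.author_association`); `shouldRemoveDuplicateLabel` is a hypothetical name.

```javascript
const MAINTAINER_ASSOCIATIONS = new Set(["OWNER", "MEMBER", "COLLABORATOR"]);

// Mirrors the suggested workflow condition for an issue_comment event payload.
function shouldRemoveDuplicateLabel(event) {
  const { issue, comment } = event;
  const isOpenDuplicate =
    issue.state === "open" && issue.labels.some((l) => l.name === "duplicate");
  if (!isOpenDuplicate || comment.user.type === "Bot") return false;
  // Only the issue author or a maintainer may dispute the duplicate flag.
  return (
    comment.user.login === issue.user.login ||
    MAINTAINER_ASSOCIATIONS.has(comment.author_association)
  );
}
```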
Possible issue
Paginate all comments to avoid missing entries

listComments with per_page: 100 only fetches the first 100 comments. If an issue has
more than 100 comments, the duplicate detection comment or subsequent human comments
may be missed, leading to incorrect auto-closure. Use github.paginate to retrieve
all comments.

.github/workflows/auto-close-duplicates.yml [44-49]

-const comments = await github.rest.issues.listComments({
+const allComments = await github.paginate(github.rest.issues.listComments, {
             owner,
             repo,
             issue_number: issue.number,
             per_page: 100,
           });
+          const comments = { data: allComments };
Suggestion importance[1-10]: 5


Why: Issues with more than 100 comments could have the duplicate detection comment missed, but this is an edge case unlikely to occur in practice. The fix is valid but the workaround with comments = { data: allComments } is a bit awkward.

Low
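Because `github.paginate` already returns a flat array, the `{ data: allComments }` wrapper can be dropped by working with the array directly. A sketch of the downstream check, assuming the detection comment can be recognized by a bot author plus a marker string; `DETECTION_MARKER`, `findLastDetection`, and `hasHumanCommentAfter` are hypothetical, since the real comment's wording is not shown here.

```javascript
// Usage inside github-script (assumed):
//   const comments = await github.paginate(github.rest.issues.listComments,
//     { owner, repo, issue_number: issue.number, per_page: 100 });

// Hypothetical hidden marker embedded in the detection comment.
const DETECTION_MARKER = "<!-- duplicate-detection -->";

// Works on the flat array returned by github.paginate (no { data } wrapper needed).
function findLastDetection(comments) {
  const hits = comments.filter(
    (c) => c.user.type === "Bot" && c.body.includes(DETECTION_MARKER)
  );
  return hits.length ? hits[hits.length - 1] : null;
}

// True if a human commented after the given detection comment.
function hasHumanCommentAfter(comments, detection) {
  const cutoff = new Date(detection.created_at).getTime();
  return comments.some(
    (c) => c.user.type !== "Bot" && new Date(c.created_at).getTime() > cutoff
  );
}
```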
Fix prompt issue number for manual dispatch trigger

When triggered via workflow_dispatch, github.event.issue.number is undefined, so the
expression github.event.issue.number || inputs.issue_number may not behave as
expected in all GitHub Actions expression contexts. It's safer to use a conditional
expression to explicitly handle both trigger types.

.github/workflows/claude-dedupe-issues.yml [42]

-prompt: "/dedupe ${{ github.repository }}/issues/${{ github.event.issue.number || inputs.issue_number }}"
+prompt: "/dedupe ${{ github.repository }}/issues/${{ github.event_name == 'workflow_dispatch' && inputs.issue_number || github.event.issue.number }}"
Suggestion importance[1-10]: 5


Why: When github.event.issue.number is undefined in workflow_dispatch, the || operator in GitHub Actions expressions may not fall back correctly. The suggested explicit conditional is safer, though in practice GitHub Actions expressions often handle undefined values gracefully.

Low
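The explicit form of the suggested expression corresponds to a ternary in JavaScript. Unlike the `cond && a || b` idiom in Actions expressions, a ternary does not fall through to the other branch when `a` happens to be empty. A sketch assuming the event name, payload, and dispatch inputs are passed in; `resolveIssueNumber` is a hypothetical name, and `Number()` is used because dispatch inputs may arrive as strings.

```javascript
// Picks the issue number for either trigger: an `issues` event or `workflow_dispatch`.
function resolveIssueNumber(eventName, payload, inputs) {
  const raw =
    eventName === "workflow_dispatch" ? inputs.issue_number : payload.issue?.number;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`No valid issue number for event "${eventName}"`);
  }
  return n;
}
```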
Paginate reactions to avoid missing author objection

Similar to comments, reactions are fetched with a hard limit of 100. If there are
more than 100 reactions, the author's thumbs-down reaction could be missed, causing
an issue to be incorrectly auto-closed despite the author's objection. Use
github.paginate here as well.

.github/workflows/auto-close-duplicates.yml [84-89]

-const reactions = await github.rest.reactions.listForIssueComment({
+const allReactions = await github.paginate(github.rest.reactions.listForIssueComment, {
             owner,
             repo,
             comment_id: lastDupeComment.id,
             per_page: 100,
           });
+          const reactions = { data: allReactions };
Suggestion importance[1-10]: 4


Why: Having more than 100 reactions on a single comment is extremely unlikely, making this a very low-risk edge case. The suggestion is technically correct but has minimal practical impact.

Low

Previous suggestions

Suggestions up to commit 3583887
Category | Suggestion | Impact
General
Restrict label removal to issue author only

The workflow removes the duplicate label when any non-bot user comments, but it
should only trigger when the issue author comments, not any random user. Otherwise,
any user commenting on the issue would remove the duplicate label, which could be
abused or cause unintended behavior.

.github/workflows/remove-duplicate-on-activity.yml [15]

-github.event.comment.user.type != 'Bot'
+github.event.comment.user.type != 'Bot' &&
+github.event.comment.user.login == github.event.issue.user.login
Suggestion importance[1-10]: 7


Why: The current logic allows any non-bot commenter to remove the duplicate label, which could be abused. Restricting it to the issue author aligns with the intended behavior described in the comment script (where the author can dispute the duplicate classification).

Medium
Possible issue
Paginate all comments to avoid missing data

The code fetches only the first 100 comments using listComments with per_page: 100,
but issues with more than 100 comments will have their later comments silently
ignored. This could cause the workflow to incorrectly close an issue that has human
comments after the duplicate detection comment. Use github.paginate to fetch all
comments.

.github/workflows/auto-close-duplicates.yml [44-49]

-const comments = await github.rest.issues.listComments({
+const allComments = await github.paginate(github.rest.issues.listComments, {
             owner,
             repo,
             issue_number: issue.number,
             per_page: 100,
           });
+          const comments = { data: allComments };
Suggestion importance[1-10]: 6


Why: Issues with more than 100 comments would have later comments silently ignored, potentially causing incorrect auto-closure. Using github.paginate ensures all comments are checked, though in practice most issues won't exceed 100 comments.

Low
Paginate reactions to prevent missed objections

Similar to the comments issue, reactions are fetched with a hard limit of 100 using
listForIssueComment. If there are more than 100 reactions, the author's thumbs-down
reaction could be missed, causing the issue to be incorrectly closed despite the
author's objection. Use github.paginate to retrieve all reactions.

.github/workflows/auto-close-duplicates.yml [84-89]

-const reactions = await github.rest.reactions.listForIssueComment({
+const allReactions = await github.paginate(github.rest.reactions.listForIssueComment, {
             owner,
             repo,
             comment_id: lastDupeComment.id,
             per_page: 100,
           });
+          const reactions = { data: allReactions };
Suggestion importance[1-10]: 5


Why: Missing the author's thumbs-down reaction due to the 100-reaction limit could cause incorrect auto-closure. However, having more than 100 reactions on a single comment is extremely unlikely in practice, making this a low-probability edge case.

Low

yuancu previously approved these changes Mar 31, 2026
@LantaoJin
Member

@qianheng-aws which issues will be auto-closed if there are 3 duplicated issues?

@qianheng-aws
Collaborator Author

@qianheng-aws which issues will be auto-closed if there are 3 duplicated issues?

It doesn't take 3 duplicate issues to close an issue as a duplicate; the comment just lists the top 3 most likely matches for reference.

Only consider issues with lower issue numbers as potential originals,
and exclude issues already labeled duplicate from search results.

Signed-off-by: Heng Qian <qianheng@amazon.com>
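The rules in this commit, together with the top-3 reference list mentioned in the reply above, can be sketched as a filter over search results. This is illustrative only: in the PR the ranking is done by Claude following dedupe.md, so the numeric `score` field and the `selectDuplicateCandidates` name are hypothetical.

```javascript
// Hypothetical post-processing of search results for a newly opened issue.
function selectDuplicateCandidates(newIssueNumber, results, limit = 3) {
  return results
    // Only older issues (lower numbers) can be the original.
    .filter((r) => r.number < newIssueNumber)
    // Skip issues that are themselves already flagged as duplicates.
    .filter((r) => !r.labels.some((l) => l.name === "duplicate"))
    // filter() returned a new array, so sorting here does not mutate the input.
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Excluding issues already labeled duplicate keeps the suggestions pointing at originals rather than at other copies.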
@qianheng-aws qianheng-aws changed the title Add automated duplicate issue detection and auto-close workflows Add automated duplicate for new issue detection and auto-close workflows Mar 31, 2026
Only run duplicate detection on newly created issues for now.

Signed-off-by: Heng Qian <qianheng@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit adc7ddd

@qianheng-aws
Collaborator Author

As discussed offline, I removed the backfill workflow. Backfilling should be done offline as a one-time job that generates a report of existing duplicate issues.

@qianheng-aws
Collaborator Author

The CI failure is caused by: Could not find org.opensearch.plugin:geospatial:3.7.0.0-SNAPSHOT.

@LantaoJin LantaoJin merged commit 61e9ecd into opensearch-project:main Mar 31, 2026
11 of 38 checks passed

Labels

maintenance Improves code quality, but not the product

Projects

None yet


4 participants