diff --git a/docs/README.hooks.md b/docs/README.hooks.md index bcb1a7c7b..e23200323 100644 --- a/docs/README.hooks.md +++ b/docs/README.hooks.md @@ -32,5 +32,6 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-hooks) for guidelines on how to | Name | Description | Events | Bundled Assets | | ---- | ----------- | ------ | -------------- | | [Governance Audit](../hooks/governance-audit/README.md) | Scans Copilot agent prompts for threat signals and logs governance events | sessionStart, sessionEnd, userPromptSubmitted | `audit-prompt.sh`
`audit-session-end.sh`
`audit-session-start.sh`
`hooks.json` | +| [Secrets Scanner](../hooks/secrets-scanner/README.md) | Scans files modified during a Copilot coding agent session for leaked secrets, credentials, and sensitive data | sessionEnd | `hooks.json`
`scan-secrets.sh` | | [Session Auto-Commit](../hooks/session-auto-commit/README.md) | Automatically commits and pushes changes when a Copilot coding agent session ends | sessionEnd | `auto-commit.sh`
`hooks.json` | | [Session Logger](../hooks/session-logger/README.md) | Logs all Copilot coding agent session activity for audit and analysis | sessionStart, sessionEnd, userPromptSubmitted | `hooks.json`
`log-prompt.sh`
`log-session-end.sh`
`log-session-start.sh` | diff --git a/hooks/secrets-scanner/README.md b/hooks/secrets-scanner/README.md new file mode 100644 index 000000000..cd5e21e09 --- /dev/null +++ b/hooks/secrets-scanner/README.md @@ -0,0 +1,202 @@ +--- +name: 'Secrets Scanner' +description: 'Scans files modified during a Copilot coding agent session for leaked secrets, credentials, and sensitive data' +tags: ['security', 'secrets', 'scanning', 'session-end'] +--- + +# Secrets Scanner Hook + +Scans files modified during a GitHub Copilot coding agent session for accidentally leaked secrets, credentials, API keys, and other sensitive data before they are committed. + +## Overview + +AI coding agents generate and modify code rapidly, which increases the risk of hardcoded secrets slipping into the codebase. This hook acts as a safety net by scanning all modified files at session end for 20+ categories of secret patterns, including: + +- **Cloud credentials**: AWS access keys, GCP service account keys, Azure client secrets +- **Platform tokens**: GitHub PATs, npm tokens, Slack tokens, Stripe keys +- **Private keys**: RSA, EC, OpenSSH, PGP, DSA private key blocks +- **Connection strings**: Database URIs (PostgreSQL, MongoDB, MySQL, Redis, MSSQL) +- **Generic secrets**: API keys, passwords, bearer tokens, JWTs +- **Internal infrastructure**: Private IP addresses with ports + +## Features + +- **Two scan modes**: `warn` (log only) or `block` (exit non-zero to prevent commit) +- **Two scan scopes**: `diff` (modified files vs HEAD) or `staged` (git-staged files only) +- **Smart filtering**: Skips binary files, lock files, and placeholder/example values +- **Allowlist support**: Exclude known false positives via `SECRETS_ALLOWLIST` +- **Structured logging**: JSON Lines output for integration with monitoring tools +- **Redacted output**: Findings are truncated in logs to avoid re-exposing secrets +- **Zero dependencies**: Uses only standard Unix tools (`grep`, `file`, `git`) + +## Installation + +1. 
Copy the hook folder to your repository: + + ```bash + cp -r hooks/secrets-scanner .github/hooks/ + ``` + +2. Ensure the script is executable: + + ```bash + chmod +x .github/hooks/secrets-scanner/scan-secrets.sh + ``` + +3. Create the logs directory and add it to `.gitignore`: + + ```bash + mkdir -p logs/copilot/secrets + echo "logs/" >> .gitignore + ``` + +4. Commit the hook configuration to your repository's default branch. + +## Configuration + +The hook is configured in `hooks.json` to run on the `sessionEnd` event: + +```json +{ + "version": 1, + "hooks": { + "sessionEnd": [ + { + "type": "command", + "bash": ".github/hooks/secrets-scanner/scan-secrets.sh", + "cwd": ".", + "env": { + "SCAN_MODE": "warn", + "SCAN_SCOPE": "diff" + }, + "timeoutSec": 30 + } + ] + } +} +``` + +### Environment Variables + +| Variable | Values | Default | Description | +|----------|--------|---------|-------------| +| `SCAN_MODE` | `warn`, `block` | `warn` | `warn` logs findings only; `block` exits non-zero to prevent auto-commit | +| `SCAN_SCOPE` | `diff`, `staged` | `diff` | `diff` scans uncommitted changes vs HEAD; `staged` scans only staged files | +| `SKIP_SECRETS_SCAN` | `true` | unset | Disable the scanner entirely | +| `SECRETS_LOG_DIR` | path | `logs/copilot/secrets` | Directory where scan logs are written | +| `SECRETS_ALLOWLIST` | comma-separated | unset | Patterns to ignore (e.g., `test_key_123,example.com`) | + +## How It Works + +1. When a Copilot coding agent session ends, the hook executes +2. Collects all modified files using `git diff` (respects the configured scope) +3. Filters out binary files and lock files +4. Scans each text file line-by-line against 20+ regex patterns for known secret formats +5. Skips matches that look like placeholders (e.g., values containing `example`, `changeme`, `your_`) +6. Checks matches against the allowlist if configured +7. Reports findings with file path, line number, pattern name, and severity +8. 
Writes a structured JSON log entry for audit purposes +9. In `block` mode, exits non-zero to signal the agent to stop before committing + +## Detected Secret Patterns + +| Pattern | Severity | Example Match | +|---------|----------|---------------| +| `AWS_ACCESS_KEY` | critical | `AKIAIOSFODNN7EXAMPLE` | +| `AWS_SECRET_KEY` | critical | `aws_secret_access_key = wJalr...` | +| `GCP_SERVICE_ACCOUNT` | critical | `"type": "service_account"` | +| `GCP_API_KEY` | high | `AIzaSyC...` | +| `AZURE_CLIENT_SECRET` | critical | `azure_client_secret = ...` | +| `GITHUB_PAT` | critical | `ghp_xxxxxxxxxxxx...` | +| `GITHUB_FINE_GRAINED_PAT` | critical | `github_pat_...` | +| `PRIVATE_KEY` | critical | `-----BEGIN RSA PRIVATE KEY-----` | +| `GENERIC_SECRET` | high | `api_key = "sk-..."` | +| `CONNECTION_STRING` | high | `postgresql://user:pass@host/db` | +| `SLACK_TOKEN` | high | `xoxb-...` | +| `STRIPE_SECRET_KEY` | critical | `sk_live_...` | +| `NPM_TOKEN` | high | `npm_...` | +| `JWT_TOKEN` | medium | `eyJhbGci...` | +| `INTERNAL_IP_PORT` | medium | `192.168.1.1:8080` | + +See the full list in `scan-secrets.sh`. + +## Example Output + +### Clean scan + +``` +🔍 Scanning 5 modified file(s) for secrets... +✅ No secrets detected in 5 scanned file(s) +``` + +### Findings detected (warn mode) + +``` +🔍 Scanning 3 modified file(s) for secrets... + +⚠️ Found 2 potential secret(s) in modified files: + + FILE LINE PATTERN SEVERITY + ---- ---- ------- -------- + src/config.ts 12 GITHUB_PAT critical + .env.local 3 CONNECTION_STRING high + +💡 Review the findings above. Set SCAN_MODE=block to prevent commits with secrets. +``` + +### Findings detected (block mode) + +``` +🔍 Scanning 3 modified file(s) for secrets... + +⚠️ Found 1 potential secret(s) in modified files: + + FILE LINE PATTERN SEVERITY + ---- ---- ------- -------- + lib/auth.py 45 AWS_ACCESS_KEY critical + +🚫 Session blocked: resolve the findings above before committing. 
+ Set SCAN_MODE=warn to log without blocking, or add patterns to SECRETS_ALLOWLIST. +``` + +## Log Format + +Scan events are written to `logs/copilot/secrets/scan.log` in JSON Lines format: + +```json +{"timestamp":"2026-03-13T10:30:00Z","event":"secrets_found","mode":"warn","scope":"diff","files_scanned":3,"finding_count":2,"findings":[{"file":"src/config.ts","line":12,"pattern":"GITHUB_PAT","severity":"critical","match":"ghp_...xyz1"}]} +``` + +```json +{"timestamp":"2026-03-13T10:30:00Z","event":"scan_complete","mode":"warn","scope":"diff","status":"clean","files_scanned":5} +``` + +## Pairing with Other Hooks + +This hook pairs well with the **Session Auto-Commit** hook. When both are installed, order them so that `secrets-scanner` runs first: + +1. Secrets scanner runs at `sessionEnd`, catches leaked secrets +2. Auto-commit runs at `sessionEnd`, only commits if all previous hooks pass + +Set `SCAN_MODE=block` to prevent auto-commit when secrets are detected. + +## Customization + +- **Add custom patterns**: Edit the `PATTERNS` array in `scan-secrets.sh` to add project-specific secret formats +- **Adjust sensitivity**: Change severity levels or remove patterns that generate false positives +- **Allowlist known values**: Use `SECRETS_ALLOWLIST` for test fixtures or known safe patterns +- **Change log location**: Set `SECRETS_LOG_DIR` to route logs to your preferred directory + +## Disabling + +To temporarily disable the scanner: + +- Set `SKIP_SECRETS_SCAN=true` in the hook environment +- Or remove the `sessionEnd` entry from `hooks.json` + +## Limitations + +- Pattern-based detection; does not perform entropy analysis or contextual validation +- May produce false positives for test fixtures or example code (use the allowlist to suppress these) +- Scans only text files; binary secrets (keystores, certificates in DER format) are not detected +- Requires `git` to be available in the execution environment diff --git a/hooks/secrets-scanner/hooks.json 
b/hooks/secrets-scanner/hooks.json new file mode 100644 index 000000000..1258880c8 --- /dev/null +++ b/hooks/secrets-scanner/hooks.json @@ -0,0 +1,17 @@ +{ + "version": 1, + "hooks": { + "sessionEnd": [ + { + "type": "command", + "bash": ".github/hooks/secrets-scanner/scan-secrets.sh", + "cwd": ".", + "env": { + "SCAN_MODE": "warn", + "SCAN_SCOPE": "diff" + }, + "timeoutSec": 30 + } + ] + } +} diff --git a/hooks/secrets-scanner/scan-secrets.sh b/hooks/secrets-scanner/scan-secrets.sh new file mode 100755 index 000000000..c8c77d8a1 --- /dev/null +++ b/hooks/secrets-scanner/scan-secrets.sh @@ -0,0 +1,269 @@ +#!/bin/bash + +# Secrets Scanner Hook +# Scans files modified during a Copilot coding agent session for accidentally +# leaked secrets, credentials, and sensitive data before they are committed. +# +# Environment variables: +# SCAN_MODE - "warn" (log only) or "block" (exit non-zero on findings) (default: warn) +# SCAN_SCOPE - "diff" (changed files only) or "staged" (staged files) (default: diff) +# SKIP_SECRETS_SCAN - "true" to disable scanning entirely (default: unset) +# SECRETS_LOG_DIR - Directory for scan logs (default: logs/copilot/secrets) +# SECRETS_ALLOWLIST - Comma-separated list of patterns to ignore (default: unset) + +set -euo pipefail + +if [[ "${SKIP_SECRETS_SCAN:-}" == "true" ]]; then + echo "⏭️ Secrets scan skipped (SKIP_SECRETS_SCAN=true)" + exit 0 +fi + +# Ensure we are in a git repository +if ! 
git rev-parse --is-inside-work-tree &>/dev/null; then + echo "⚠️ Not in a git repository, skipping secrets scan" + exit 0 +fi + +MODE="${SCAN_MODE:-warn}" +SCOPE="${SCAN_SCOPE:-diff}" +LOG_DIR="${SECRETS_LOG_DIR:-logs/copilot/secrets}" +TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ") +FINDING_COUNT=0 + +mkdir -p "$LOG_DIR" +LOG_FILE="$LOG_DIR/scan.log" + +# Collect files to scan based on scope +FILES=() +if [[ "$SCOPE" == "staged" ]]; then + while IFS= read -r f; do + [[ -n "$f" ]] && FILES+=("$f") + done < <(git diff --cached --name-only --diff-filter=ACMR 2>/dev/null) +else + while IFS= read -r f; do + [[ -n "$f" ]] && FILES+=("$f") + done < <(git diff --name-only --diff-filter=ACMR HEAD 2>/dev/null || git diff --name-only --diff-filter=ACMR 2>/dev/null) + # Also include untracked new files (created during the session, not yet in HEAD) + while IFS= read -r f; do + [[ -n "$f" ]] && FILES+=("$f") + done < <(git ls-files --others --exclude-standard 2>/dev/null) +fi + +if [[ ${#FILES[@]} -eq 0 ]]; then + echo "✨ No modified files to scan" + printf '{"timestamp":"%s","event":"scan_complete","mode":"%s","scope":"%s","status":"clean","files_scanned":0}\n' \ + "$TIMESTAMP" "$MODE" "$SCOPE" >> "$LOG_FILE" + exit 0 +fi + +# Parse allowlist into an array +ALLOWLIST=() +if [[ -n "${SECRETS_ALLOWLIST:-}" ]]; then + IFS=',' read -ra ALLOWLIST <<< "$SECRETS_ALLOWLIST" +fi + +is_allowlisted() { + local match="$1" + for pattern in "${ALLOWLIST[@]}"; do + pattern=$(printf '%s' "$pattern" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//') + [[ -z "$pattern" ]] && continue + if [[ "$match" == *"$pattern"* ]]; then + return 0 + fi + done + return 1 +} + +# Binary file detection: skip files that are not text +is_text_file() { + local filepath="$1" + [[ -f "$filepath" ]] && file --brief --mime-type "$filepath" 2>/dev/null | grep -q "^text/" && return 0 + # Fallback: check common text extensions + case "$filepath" in + *.md|*.txt|*.json|*.yaml|*.yml|*.xml|*.toml|*.ini|*.cfg|*.conf|\ + 
*.sh|*.bash|*.zsh|*.ps1|*.bat|*.cmd|\ + *.py|*.rb|*.js|*.ts|*.jsx|*.tsx|*.go|*.rs|*.java|*.kt|*.cs|*.cpp|*.c|*.h|\ + *.php|*.swift|*.scala|*.r|*.R|*.lua|*.pl|*.ex|*.exs|*.hs|*.ml|\ + *.html|*.css|*.scss|*.less|*.svg|\ + *.sql|*.graphql|*.proto|\ + *.env|*.env.*|*.properties|\ + Dockerfile*|Makefile*|Vagrantfile|Gemfile|Rakefile) + return 0 ;; + *) + return 1 ;; + esac +} + +# Secret detection patterns +# Each entry: "PATTERN_NAME|SEVERITY|REGEX" +# Severity levels: critical, high, medium +PATTERNS=( + # Cloud provider credentials + "AWS_ACCESS_KEY|critical|AKIA[0-9A-Z]{16}" + "AWS_SECRET_KEY|critical|aws_secret_access_key[[:space:]]*[:=][[:space:]]*['\"]?[A-Za-z0-9/+=]{40}" + "GCP_SERVICE_ACCOUNT|critical|\"type\"[[:space:]]*:[[:space:]]*\"service_account\"" + "GCP_API_KEY|high|AIza[0-9A-Za-z_-]{35}" + "AZURE_CLIENT_SECRET|critical|azure[_-]?client[_-]?secret[[:space:]]*[:=][[:space:]]*['\"]?[A-Za-z0-9_~.-]{34,}" + + # GitHub tokens + "GITHUB_PAT|critical|ghp_[0-9A-Za-z]{36}" + "GITHUB_OAUTH|critical|gho_[0-9A-Za-z]{36}" + "GITHUB_APP_TOKEN|critical|ghs_[0-9A-Za-z]{36}" + "GITHUB_REFRESH_TOKEN|critical|ghr_[0-9A-Za-z]{36}" + "GITHUB_FINE_GRAINED_PAT|critical|github_pat_[0-9A-Za-z_]{82}" + + # Private keys + "PRIVATE_KEY|critical|-----BEGIN (RSA |EC |OPENSSH |DSA |PGP )?PRIVATE KEY-----" + "PGP_PRIVATE_BLOCK|critical|-----BEGIN PGP PRIVATE KEY BLOCK-----" + + # Generic secrets and tokens + "GENERIC_SECRET|high|(secret|token|password|passwd|pwd|api[_-]?key|apikey|access[_-]?key|auth[_-]?token|client[_-]?secret)[[:space:]]*[:=][[:space:]]*['\"]?[A-Za-z0-9_/+=~.-]{8,}" + "CONNECTION_STRING|high|(mongodb(\\+srv)?|postgres(ql)?|mysql|redis|amqp|mssql)://[^[:space:]'\"]{10,}" + "BEARER_TOKEN|medium|[Bb]earer[[:space:]]+[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{20,}" + + # Messaging and SaaS tokens + "SLACK_TOKEN|high|xox[baprs]-[0-9]{10,}-[0-9A-Za-z-]+" + "SLACK_WEBHOOK|high|https://hooks\.slack\.com/services/T[0-9A-Z]{8,}/B[0-9A-Z]{8,}/[0-9A-Za-z]{24}" + 
"DISCORD_TOKEN|high|[MN][A-Za-z0-9]{23,}\.[A-Za-z0-9_-]{6}\.[A-Za-z0-9_-]{27,}" + "TWILIO_API_KEY|high|SK[0-9a-fA-F]{32}" + "SENDGRID_API_KEY|high|SG\.[0-9A-Za-z_-]{22}\.[0-9A-Za-z_-]{43}" + "STRIPE_SECRET_KEY|critical|sk_live_[0-9A-Za-z]{24,}" + "STRIPE_RESTRICTED_KEY|high|rk_live_[0-9A-Za-z]{24,}" + + # npm tokens + "NPM_TOKEN|high|npm_[0-9A-Za-z]{36}" + + # JWT (long, structured tokens) + "JWT_TOKEN|medium|eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}" + + # IP addresses with ports (possible internal services) + "INTERNAL_IP_PORT|medium|(^|[^.0-9])(10\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|172\.(1[6-9]|2[0-9]|3[01])\.[0-9]{1,3}\.[0-9]{1,3}|192\.168\.[0-9]{1,3}\.[0-9]{1,3}):[0-9]{2,5}([^0-9]|$)" +) + +# Escape a string value for safe embedding in a JSON string literal +json_escape() { + printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g' +} + +# Store findings as tab-separated records +FINDINGS=() + +scan_file() { + local filepath="$1" + # read_path: the actual file to scan; defaults to filepath (working tree) + # When SCOPE=staged, callers pass a temp file with the staged content instead + local read_path="${2:-$1}" + + # Skip if source does not exist (e.g., deleted) + [[ -f "$read_path" ]] || return 0 + + # Skip binary files (type detection uses the original path for MIME lookup) + if ! 
is_text_file "$filepath"; then + return 0 + fi + + # Skip common non-sensitive files + case "$filepath" in + *.lock|package-lock.json|yarn.lock|pnpm-lock.yaml|Cargo.lock|go.sum|*.sum) + return 0 ;; + esac + + for entry in "${PATTERNS[@]}"; do + IFS='|' read -r pattern_name severity regex <<< "$entry" + + while IFS=: read -r line_num matched_line; do + # Extract the matched fragment (-e keeps a regex that starts with "-" from being parsed as an option) + local match + match=$(printf '%s\n' "$matched_line" | grep -oE -e "$regex" 2>/dev/null | head -1) + [[ -z "$match" ]] && continue + + # Strip boundary characters from IP:port matches + if [[ "$pattern_name" == "INTERNAL_IP_PORT" ]]; then + match=$(printf '%s' "$match" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+') + [[ -z "$match" ]] && continue + fi + + # Check allowlist + if [[ ${#ALLOWLIST[@]} -gt 0 ]] && is_allowlisted "$match"; then + continue + fi + + # Skip if this looks like a placeholder or example + if printf '%s\n' "$match" | grep -qiE '(example|placeholder|your[_-]|xxx|changeme|TODO|FIXME|replace[_-]?me|dummy|fake|test[_-]?key|sample)'; then + continue + fi + + # Redact the match for safe logging: show first 4 and last 4 chars + local redacted + if [[ ${#match} -le 12 ]]; then + redacted="[REDACTED]" + else + redacted="${match:0:4}...${match: -4}" + fi + + FINDINGS+=("$filepath"$'\t'"$line_num"$'\t'"$pattern_name"$'\t'"$severity"$'\t'"$redacted") + FINDING_COUNT=$((FINDING_COUNT + 1)) + done < <(grep -nE -e "$regex" "$read_path" 2>/dev/null || true) + done +} + +echo "🔍 Scanning ${#FILES[@]} modified file(s) for secrets..." 
+ +for filepath in "${FILES[@]}"; do + if [[ "$SCOPE" == "staged" ]]; then + # Scan the staged (index) version to match what will actually be committed + _tmpfile=$(mktemp) + git show :"$filepath" > "$_tmpfile" 2>/dev/null || true + scan_file "$filepath" "$_tmpfile" + rm -f "$_tmpfile" + else + scan_file "$filepath" + fi +done + +# Log results +if [[ $FINDING_COUNT -gt 0 ]]; then + echo "" + echo "⚠️ Found $FINDING_COUNT potential secret(s) in modified files:" + echo "" + printf " %-40s %-6s %-28s %s\n" "FILE" "LINE" "PATTERN" "SEVERITY" + printf " %-40s %-6s %-28s %s\n" "----" "----" "-------" "--------" + + # Build JSON findings array and print table + FINDINGS_JSON="[" + FIRST=true + for finding in "${FINDINGS[@]}"; do + IFS=$'\t' read -r fpath fline pname psev redacted <<< "$finding" + + printf " %-40s %-6s %-28s %s\n" "$fpath" "$fline" "$pname" "$psev" + + if [[ "$FIRST" != "true" ]]; then + FINDINGS_JSON+="," + fi + FIRST=false + + # Build JSON safely without requiring jq; escape path and match values + FINDINGS_JSON+="{\"file\":\"$(json_escape "$fpath")\",\"line\":$fline,\"pattern\":\"$pname\",\"severity\":\"$psev\",\"match\":\"$(json_escape "$redacted")\"}" + done + FINDINGS_JSON+="]" + + echo "" + + # Write structured log entry + printf '{"timestamp":"%s","event":"secrets_found","mode":"%s","scope":"%s","files_scanned":%d,"finding_count":%d,"findings":%s}\n' \ + "$TIMESTAMP" "$MODE" "$SCOPE" "${#FILES[@]}" "$FINDING_COUNT" "$FINDINGS_JSON" >> "$LOG_FILE" + + if [[ "$MODE" == "block" ]]; then + echo "🚫 Session blocked: resolve the findings above before committing." + echo " Set SCAN_MODE=warn to log without blocking, or add patterns to SECRETS_ALLOWLIST." + exit 1 + else + echo "💡 Review the findings above. Set SCAN_MODE=block to prevent commits with secrets." 
+ fi +else + echo "✅ No secrets detected in ${#FILES[@]} scanned file(s)" + printf '{"timestamp":"%s","event":"scan_complete","mode":"%s","scope":"%s","status":"clean","files_scanned":%d}\n' \ + "$TIMESTAMP" "$MODE" "$SCOPE" "${#FILES[@]}" >> "$LOG_FILE" +fi + +exit 0
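
The redaction rule and the per-line regex check used inside `scan_file` can be tried in isolation. The block below is a minimal sketch, not part of the patch: `redact_match` and `line_matches` are hypothetical helper names introduced only for illustration, and the sample token is made up. It assumes bash and a `grep` that supports `-E`; passing the regex with `-e` is one way to keep a pattern that begins with `-` (such as the private key header) from being read as a grep option.

```bash
#!/bin/bash
# Illustrative sketch only. redact_match and line_matches are hypothetical
# helpers; they are not defined in scan-secrets.sh.

# Redact a matched value for logging: values of 12 characters or fewer are
# hidden entirely, longer values keep only their first 4 and last 4 characters.
redact_match() {
  local match="$1"
  if [[ ${#match} -le 12 ]]; then
    printf '%s\n' "[REDACTED]"
  else
    printf '%s\n' "${match:0:4}...${match: -4}"
  fi
}

# Return 0 when the given line matches the extended regex.
# -e marks the next argument as the pattern even if it starts with "-".
line_matches() {
  local regex="$1" line="$2"
  printf '%s\n' "$line" | grep -qE -e "$regex"
}

# Usage with made-up values
redact_match "ghp_abcdefghijklmnopqrstuvwxyz0123456789"
redact_match "short"
```

Keeping only the first and last four characters leaves enough of the value to locate it in the file while avoiding a second copy of the full secret in the log.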