Add continuous graph updates via Git webhook and poll watcher (#615)
Conversation
- Add `api/git_utils/incremental_update.py` with `incremental_update()`, `fetch_remote()`, `get_remote_head()`, and `repo_local_path()` helpers
- Export the new functions from `api/git_utils/__init__.py`
- Add `POST /api/webhook` endpoint with HMAC-SHA256 validation, branch filtering, and repo URL matching
- Add a background poll watcher via FastAPI lifespan (`_poll_loop`, `_poll_all_repos`, `_poll_repo`)
- Add `WEBHOOK_SECRET`, `TRACKED_BRANCH`, `POLL_INTERVAL` env vars
- Document the new env vars in `.env.template`
- Add `tests/test_webhook.py` with unit tests

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
```python
logger.exception(
    "Webhook: incremental update failed for '%s': %s", repo_name, exc
)
return JSONResponse({"status": "error", "detail": str(exc)}, status_code=500)
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium)

Copilot Autofix (AI, 2 days ago):
In general, the fix is to avoid sending raw exception details back to the client. Instead, log the full exception on the server (which is already being done via logger.exception) and return a generic, non-sensitive error message (and optionally a simple error code) in the HTTP response. This preserves current behavior in terms of status codes and success/error structure, but eliminates leakage of internal details.
Concretely in api/index.py, within the webhook function’s try/except block, we should change the JSONResponse returned in the except clause at line 679 so that it no longer contains str(exc) in the detail field. A suitable replacement is a constant, generic message such as "Internal server error during webhook processing". Because logger.exception already records the stack trace and exception message, no extra logging changes are needed. No new imports or helpers are required; we only adjust the literal string in the JSON payload.
```diff
@@ -676,7 +676,13 @@
         logger.exception(
             "Webhook: incremental update failed for '%s': %s", repo_name, exc
         )
-        return JSONResponse({"status": "error", "detail": str(exc)}, status_code=500)
+        return JSONResponse(
+            {
+                "status": "error",
+                "detail": "Internal server error during webhook processing",
+            },
+            status_code=500,
+        )

     return {"status": "success", **result}
```
Resolve PR #615 conflicts, preserve the continuous update configuration, and stabilize the webhook incremental-update tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@CodeRabbit review

✅ Actions performed: review triggered.
📝 Walkthrough

This pull request implements continuous incremental graph updates via webhooks and background polling. It adds configuration for webhook secrets, branch tracking, and polling intervals; introduces the incremental update engine for diff-based graph synchronization; exposes a POST /api/webhook endpoint with HMAC-SHA256 signature validation; and includes a background poll-watcher task with comprehensive test coverage.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client as Push Event/<br/>Polling Loop
    participant API as FastAPI Server<br/>/api/webhook
    participant Repo as Git Repository
    participant Analyzer as Source Analyzer
    participant Graph as FalkorDB<br/>Graph Database
    participant Redis as Redis<br/>(Bookmark)
    alt Webhook Path
        Client->>API: POST /api/webhook<br/>(with signature & payload)
        API->>API: Validate HMAC-SHA256<br/>Verify branch match
        API->>Repo: Extract repo from payload
    else Polling Path
        Client->>API: Background poll-watcher<br/>triggers periodically
        API->>Repo: Fetch remote & check HEAD
    end
    API->>Repo: Resolve from_sha, to_sha<br/>(current & latest)
    Repo->>API: Return commit objects
    API->>Repo: Compute file diff<br/>(added, modified, deleted)
    Repo->>API: Return file changeset
    API->>Repo: Checkout target commit
    Repo->>Repo: Update working tree
    Note over Graph,Analyzer: Process Changed Files
    par Remove Deleted/Modified
        API->>Graph: DELETE nodes & edges<br/>for removed/modified files
    and Analyze & Insert Changed
        API->>Analyzer: Analyze added/<br/>modified files
        Analyzer->>API: Return AST/symbols
        API->>Graph: INSERT new nodes/<br/>UPDATE existing edges
    end
    Graph->>Graph: Return change summary<br/>(added, modified, deleted counts)
    API->>Redis: Persist new commit SHA<br/>as bookmark
    Redis->>API: Acknowledge
    API->>Client: Return 200 with<br/>update summary
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
Pull request overview
This PR adds a continuous/incremental graph update mechanism to avoid full re-indexes on every repo change, integrating both a push-webhook trigger and a background poll-watcher into the FastAPI backend.
Changes:

- Added `api/git_utils/incremental_update.py` to compute file-level diffs between two SHAs and update the FalkorDB graph incrementally while persisting the new commit bookmark in Redis.
- Added `POST /api/webhook` plus URL-matching helpers and a FastAPI lifespan-managed poll-watcher to trigger incremental updates automatically.
- Documented the new environment variables and the new webhook endpoint in `.env.template` and `README.md`, plus added unit tests for the webhook and incremental-update helpers.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| `api/index.py` | Adds webhook endpoint, URL matching, poll-watcher loop, and lifespan wiring to trigger incremental updates. |
| `api/git_utils/incremental_update.py` | Implements diff-based incremental update flow (checkout, delete stale file nodes, re-analyze changed files, update Redis bookmark). |
| `api/git_utils/__init__.py` | Replaces wildcard export with explicit exports, including the new incremental update helpers. |
| `tests/test_webhook.py` | Adds unit tests for URL matching, webhook behavior (open/secured), and basic `incremental_update` edge cases. |
| `README.md` | Documents new env vars and the `/api/webhook` endpoint. |
| `.env.template` | Adds `WEBHOOK_SECRET`, `TRACKED_BRANCH`, and `POLL_INTERVAL` configuration. |
```python
logger.info(
    "Poll: new commits detected for '%s' (%s -> %s), updating …",
    repo_name, current_sha, remote_head,
)
try:
    result = incremental_update(repo_name, current_sha, remote_head)
    logger.info("Poll: '%s' updated — %s", repo_name, result)
except Exception as exc:
```
```python
def _update() -> dict:
    path = repo_local_path(repo_name)
    if path.exists():
        fetch_remote(path)
```
| """Receive a GitHub/GitLab push event and trigger an incremental graph update. | ||
|
|
||
| When ``WEBHOOK_SECRET`` is set the endpoint validates the | ||
| ``X-Hub-Signature-256`` header using HMAC-SHA256; requests with a missing | ||
| or invalid signature are rejected with **401 Unauthorized**. | ||
|
|
||
| Only pushes to the branch configured via ``TRACKED_BRANCH`` (default | ||
| ``main``) trigger an update; pushes to other branches are acknowledged | ||
| with a ``200 ignored`` response so that GitHub does not retry them. | ||
|
|
||
| The repository is identified by matching the ``repository.clone_url`` | ||
| field in the payload against the URLs stored for already-indexed | ||
| repositories. | ||
| """ | ||
| body = await request.body() | ||
|
|
||
| # Validate HMAC-SHA256 signature when a secret is configured | ||
| if WEBHOOK_SECRET: | ||
| sig_header = request.headers.get("X-Hub-Signature-256", "") | ||
| mac = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256) | ||
| expected_sig = "sha256=" + mac.hexdigest() | ||
| if not hmac.compare_digest(sig_header, expected_sig): | ||
| raise HTTPException(status_code=401, detail="Invalid webhook signature") | ||
|
|
||
| try: | ||
| payload = await request.json() | ||
| except Exception: | ||
| raise HTTPException(status_code=400, detail="Invalid JSON payload") | ||
|
|
||
| ref = payload.get("ref", "") | ||
| before = payload.get("before", "") | ||
| after = payload.get("after", "") | ||
| repo_url = payload.get("repository", {}).get("clone_url", "") | ||
|
|
||
| # Only process pushes to the configured tracked branch | ||
| expected_ref = f"refs/heads/{TRACKED_BRANCH}" | ||
| if ref != expected_ref: | ||
| logger.debug("Webhook: ignoring push to '%s' (tracking '%s')", ref, expected_ref) | ||
| return {"status": "ignored", "reason": f"Branch not tracked: {ref}"} | ||
|
|
||
| if not before or not after or not repo_url: | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="Payload missing required fields: ref, before, after, repository.clone_url", | ||
| ) |
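For local testing of a handler like the one above, the signature GitHub would send can be reproduced with the standard library. This is a sketch; the secret, URL, and payload are placeholders, not values from this PR:

```python
import hashlib
import hmac
import json

def sign_payload(secret: str, body: bytes) -> str:
    # Same construction the endpoint verifies: HMAC-SHA256 over the raw
    # request body, hex-encoded, with the "sha256=" prefix GitHub uses.
    mac = hmac.new(secret.encode(), body, hashlib.sha256)
    return "sha256=" + mac.hexdigest()

body = json.dumps(
    {"ref": "refs/heads/main", "before": "0" * 40, "after": "1" * 40}
).encode()
header = sign_payload("my-secret", body)  # send as X-Hub-Signature-256
print(header)
```

A test client would attach `header` as the `X-Hub-Signature-256` header when POSTing `body` to `/api/webhook`.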
```python
if from_sha == to_sha:
    logger.info(
        "incremental_update: from_sha == to_sha (%s); nothing to do", from_sha
    )
    return {
        "files_added": 0,
        "files_modified": 0,
        "files_deleted": 0,
        "commit": to_sha,
    }
```
```python
def _poll_repo(repo_name: str) -> None:
    """Fetch remote and apply incremental updates for *repo_name* if behind.

    This function is intentionally synchronous so it can be safely offloaded
    to ``asyncio``'s default ``ThreadPoolExecutor``.
    """
    path = repo_local_path(repo_name)
    if not path.exists():
        logger.debug("Poll: local clone not found for '%s', skipping", repo_name)
        return

    try:
        fetch_remote(path)
    except Exception as exc:
        logger.warning("Poll: git fetch failed for '%s': %s", repo_name, exc)
        return

    remote_head = get_remote_head(path, TRACKED_BRANCH)
    if not remote_head:
        return

    current_sha = get_repo_commit(repo_name)
    if not current_sha:
        logger.debug("Poll: no stored commit for '%s', skipping", repo_name)
        return

    # Handle comparison between short (7-char) and full (40-char) SHAs: a short
    # stored SHA is a valid prefix of a full remote SHA for the same commit.
    # We only apply prefix matching when the stored SHA is shorter.
    if len(current_sha) < len(remote_head):
        up_to_date = remote_head.startswith(current_sha)
    elif len(current_sha) > len(remote_head):
        up_to_date = current_sha.startswith(remote_head)
    else:
        up_to_date = current_sha == remote_head
    if up_to_date:
        logger.debug("Poll: '%s' is up-to-date at %s", repo_name, current_sha)
        return
```
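The short-vs-full SHA comparison above could also be factored into a small standalone helper. A sketch, not code from this PR:

```python
def shas_match(a: str, b: str) -> bool:
    """True when two SHAs plausibly name the same commit, treating the
    shorter one as an abbreviated prefix of the longer (e.g. a 7-char
    short id vs a full 40-char id). Equal lengths require exact equality."""
    if len(a) < len(b):
        return b.startswith(a)
    if len(a) > len(b):
        return a.startswith(b)
    return a == b

print(shas_match("d6702c5", "d6702c5" + "a" * 33))  # short stored vs full remote
```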
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/git_utils/incremental_update.py`:
- Around line 172-180: The current update deletes definitions for
deleted+modified files via g.delete_files and re-analyzes only added+modified
via analyzer.analyze_files, which leaves inbound callers/importers stale; after
computing files_to_add and files_to_remove, collect the transitive set of
dependent files that import or call those changed files (using the graph g's
reverse edges / dependency lookup), add those dependent filenames to
files_to_add (excluding already-deleted files), and then call
analyzer.analyze_files on this expanded files_to_add so
SourceAnalyzer.analyze_files reprocesses untouched callers and restores inbound
edges; ensure you use g's dependency/query methods (the graph instance g) to
find dependents before calling analyzer.analyze_files.
- Around line 166-185: Wrap the entire mutation sequence that performs
repo.checkout_tree/to_commit set_head_detached, Graph(repo_name) updates,
analyzer.analyze_files(...) and set_repo_commit(...) in an exclusive repo-scoped
lock (keyed on repo_name) so concurrent runs for the same repo cannot
interleave; acquire the lock before calling repo.checkout_tree and hold it until
after set_repo_commit (or until commit of all graph changes), use a
distributed-lock primitive if you have multiple workers, set a sensible timeout
and ensure the lock is always released in a finally/cleanup block and that
errors are logged/propagated while still releasing the lock.
In `@api/index.py`:
- Around line 497-505: The code calls incremental_update(repo_name, before,
after) using the webhook payload.before; instead, retrieve the stored bookmark
via get_repo_commit(repo_name) and use that as from_sha when calling
incremental_update (i.e., pass get_repo_commit(repo_name) as the first/from
argument), but detect mismatch: if the stored bookmark is missing or does not
equal the graph's current commit that lines up with the push history (or cannot
reach payload.before), fall back to performing a full reindex of the repo (call
the existing full reindex routine) rather than running a partial
incremental_update; update the _update closure (and its call site around
repo_local_path, fetch_remote, and loop.run_in_executor) to implement this
branching logic so the graph never advances past gaps.
- Around line 114-122: The webhook endpoint is anonymously writable because
WEBHOOK_SECRET defaults to empty; update startup and the /api/webhook handler so
webhook auth is mandatory: at startup (when reading WEBHOOK_SECRET) fail fast
with a clear error if it's empty in production mode, or modify the webhook route
handler to apply token_required when WEBHOOK_SECRET is not set (use the existing
token_required decorator) so the mutating endpoint is never unauthenticated;
ensure you update the webhook handler (the /api/webhook function) to prefer HMAC
verification when WEBHOOK_SECRET is set and fallback to token_required
otherwise, and add a clear log message indicating which auth mode is in effect.
- Around line 455-470: The webhook handler currently enforces GitHub-specific
headers and payload fields (it reads X-Hub-Signature-256 into sig_header, builds
expected_sig from WEBHOOK_SECRET, and reads repository.clone_url into repo_url),
which rejects GitLab webhooks; update the logic in the handler to detect GitLab
deliveries by checking for X-Gitlab-Token or X-Gitlab-Signature when
X-Hub-Signature-256 is absent, validate the token/signature using the configured
secret (respecting GitLab’s verification method), and when parsing payload fall
back to repository.git_http_url or project.git_http_url if repository.clone_url
is missing; alternatively, if you prefer to keep GitHub-only behavior, update
documentation to state the webhook supports GitHub only and explicitly fail with
a clear message when GitLab headers are present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cea312f7-19cf-400b-a510-c28520dca5e1
📒 Files selected for processing (6)

- `.env.template`
- `README.md`
- `api/git_utils/__init__.py`
- `api/git_utils/incremental_update.py`
- `api/index.py`
- `tests/test_webhook.py`
`api/git_utils/incremental_update.py` (outdated diff)
```python
repo.checkout_tree(to_commit.tree, strategy=CheckoutStrategy.FORCE)
repo.set_head_detached(to_commit.id)

# Apply graph changes
g = Graph(repo_name)

files_to_remove = deleted + modified
if files_to_remove:
    logger.info("Removing %d file(s) from graph", len(files_to_remove))
    g.delete_files(files_to_remove)

files_to_add = added + modified
if files_to_add:
    logger.info("Inserting/updating %d file(s) in graph", len(files_to_add))
    analyzer.analyze_files(files_to_add, repo_path, g)

# Persist the new commit bookmark using the short ID for consistency
# with the rest of the system (build_commit_graph, analyze_sources …)
new_commit_short = to_commit.short_id
set_repo_commit(repo_name, new_commit_short)
```
Guard the entire update with a repo-scoped lock.
This function force-checks out to_sha, mutates graph state, and then advances the Redis bookmark. api/index.py can reach it from both the poll watcher and /api/webhook, so overlapping runs for the same repo can interleave and leave the checkout, graph, and bookmark describing different commits. Wrap the whole mutation in an exclusive repo-scoped lock here (or a distributed lock if you run multiple workers).
`api/git_utils/incremental_update.py` (outdated diff)
```python
files_to_remove = deleted + modified
if files_to_remove:
    logger.info("Removing %d file(s) from graph", len(files_to_remove))
    g.delete_files(files_to_remove)

files_to_add = added + modified
if files_to_add:
    logger.info("Inserting/updating %d file(s) in graph", len(files_to_add))
    analyzer.analyze_files(files_to_add, repo_path, g)
```
Recompute reverse dependency edges, not just the changed files.
g.delete_files(deleted + modified) removes the old definitions and their relationships, but analyzer.analyze_files(files_to_add, repo_path, g) only revisits added + modified. From the provided SourceAnalyzer.analyze_files() snippet, untouched callers/importers never get reprocessed, so inbound cross-file edges to the changed files stay missing or stale after the update.
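The suggested widening step could be sketched as below. `get_dependent_files` is a hypothetical reverse-edge query, a stand-in for whatever lookup the real `Graph` class exposes:

```python
def expand_with_dependents(g, changed, deleted):
    """Return `changed` plus every file that (transitively) depends on one
    of them, excluding deleted files, so inbound edges get rebuilt.

    `g.get_dependent_files(path)` is hypothetical -- substitute the real
    reverse-dependency query of the Graph class."""
    deleted_set = set(deleted)
    result = set(changed)
    work = list(changed)
    while work:
        path = work.pop()
        for dep in g.get_dependent_files(path):
            if dep not in deleted_set and dep not in result:
                result.add(dep)
                work.append(dep)
    return sorted(result)

# Tiny stand-in for the real graph, for illustration only.
class FakeGraph:
    def __init__(self, reverse_edges):
        self.reverse_edges = reverse_edges
    def get_dependent_files(self, path):
        return self.reverse_edges.get(path, [])

g = FakeGraph({"a.py": ["caller.py"]})
print(expand_with_dependents(g, ["a.py"], []))  # ['a.py', 'caller.py']
```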
```python
# HMAC-SHA256 secret shared with GitHub/GitLab. Leave unset to skip
# signature validation (not recommended for production).
WEBHOOK_SECRET: str = os.getenv("WEBHOOK_SECRET", "")

# Branch whose pushes trigger incremental graph updates.
TRACKED_BRANCH: str = os.getenv("TRACKED_BRANCH", "main")

# Seconds between automatic poll checks (0 = disabled).
POLL_INTERVAL: int = int(os.getenv("POLL_INTERVAL", "60"))
```
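For reference, a matching `.env` fragment might look like this. The values are examples, not defaults from the repo (`change-me` in particular is a placeholder):

```shell
# Continuous update settings (example values)

# HMAC-SHA256 secret shared with the Git host; signature checks are skipped when unset
WEBHOOK_SECRET=change-me

# Branch whose pushes trigger incremental updates
TRACKED_BRANCH=main

# Seconds between poll-watcher checks; 0 disables the watcher
POLL_INTERVAL=60
```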
Do not leave /api/webhook anonymously writable by default.
WEBHOOK_SECRET defaults to empty, and this mutating route does not use token_required, so a stock deployment exposes an unauthenticated endpoint that can fetch repos and rewrite graph state. Make webhook auth mandatory at startup, or fall back to token_required when webhook signing is not configured. As per coding guidelines "api/index.py: Use token_required decorator for mutating endpoint authorization".
Also applies to: 436-460
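One way to implement the fail-fast/fallback behaviour described above. This is a sketch: the `production` flag, function name, and return values are illustrative, not from the PR:

```python
import logging

logger = logging.getLogger(__name__)

def validate_webhook_auth(webhook_secret: str, production: bool) -> str:
    """Decide the webhook auth mode at startup and fail fast when the
    endpoint would otherwise be unauthenticated in production.

    Returns "hmac" (signature validation) or "token" (meaning: wrap the
    route in the existing token_required decorator)."""
    if webhook_secret:
        logger.info("Webhook auth: HMAC-SHA256 signature validation")
        return "hmac"
    if production:
        raise RuntimeError("WEBHOOK_SECRET must be set in production")
    logger.warning("Webhook auth: no WEBHOOK_SECRET, falling back to token_required")
    return "token"
```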
`api/index.py` (outdated diff)
```python
if WEBHOOK_SECRET:
    sig_header = request.headers.get("X-Hub-Signature-256", "")
    mac = hmac.new(WEBHOOK_SECRET.encode(), body, hashlib.sha256)
    expected_sig = "sha256=" + mac.hexdigest()
    if not hmac.compare_digest(sig_header, expected_sig):
        raise HTTPException(status_code=401, detail="Invalid webhook signature")

try:
    payload = await request.json()
except Exception:
    raise HTTPException(status_code=400, detail="Invalid JSON payload")

ref = payload.get("ref", "")
before = payload.get("before", "")
after = payload.get("after", "")
repo_url = payload.get("repository", {}).get("clone_url", "")
```
🧩 Analysis chain
🌐 Web query: GitLab push webhook secret header name and repository URL fields in the push payload. Does GitLab send `X-Hub-Signature-256` and `repository.clone_url`, or does it use `X-Gitlab-Token` plus `repository.git_http_url`/`project.git_http_url`?
💡 Result:

- Secret/verification header: GitLab's configured secret token is sent in the `X-Gitlab-Token` HTTP header (not `X-Hub-Signature-256`, which is GitHub's). [1]
  - Separately, GitLab has introduced webhook signing headers `X-Gitlab-Signature` and `X-Gitlab-Signature-Timestamp` (HMAC-SHA256), but these are GitLab-specific headers — still not `X-Hub-Signature-256`. [2]
- Repository URL fields in the push payload: GitLab's push payload includes `repository.git_http_url`/`repository.git_ssh_url` and also a `project` object with `project.git_http_url`/`project.git_ssh_url` (the older `project.http_url` is deprecated). It does not use `repository.clone_url` (that is a GitHub-style field). [3]

Sources:

[1] GitLab Docs — Webhooks (secret token header `X-Gitlab-Token`) (docs.gitlab.com)
[2] GitLab issue — webhook signing headers `X-Gitlab-Signature` / `X-Gitlab-Signature-Timestamp` (gitlab.com)
[3] GitLab Docs — Webhook events (push event payload fields incl. `repository.git_http_url`, `project.git_http_url`, deprecations) (docs.gitlab.com)
Citations:
- 1: https://docs.gitlab.com/user/project/integrations/webhooks/
- 2: https://gitlab.com/gitlab-org/gitlab/-/issues/480633?utm_source=openai
- 3: https://docs.gitlab.com/user/project/integrations/webhook_events/
Webhook handler only supports GitHub, not GitLab as documented.
The code checks for the X-Hub-Signature-256 header and reads repository.clone_url, which are GitHub-specific. GitLab webhooks use X-Gitlab-Token or X-Gitlab-Signature headers and provide repository URLs in repository.git_http_url or project.git_http_url. This will cause GitLab deliveries to be rejected or mis-parsed. Either add GitLab-specific logic here or update documentation to reflect GitHub-only support.
🧰 Tools
🪛 Ruff (0.15.5)
[warning] 464-464: Do not catch blind exception: Exception
(BLE001)
[warning] 465-465: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
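A provider-tolerant URL extraction based on the payload fields above could be sketched as follows (the helper name is illustrative):

```python
def extract_repo_url(payload: dict) -> str:
    """Pull the clone URL from either a GitHub or a GitLab push payload.

    GitHub sends repository.clone_url; GitLab sends repository.git_http_url
    (with project.git_http_url as an alternative). Returns "" when none
    of these fields is present."""
    repo = payload.get("repository") or {}
    project = payload.get("project") or {}
    return (
        repo.get("clone_url")
        or repo.get("git_http_url")
        or project.get("git_http_url")
        or ""
    )
```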
```python
def _update() -> dict:
    path = repo_local_path(repo_name)
    if path.exists():
        fetch_remote(path)
    return incremental_update(repo_name, before, after)

loop = asyncio.get_running_loop()
try:
    result = await loop.run_in_executor(None, _update)
```
Use the stored repo bookmark as from_sha.
incremental_update() assumes from_sha matches the graph's current commit, but the webhook path always forwards payload.before. If the service missed an earlier delivery or restarted behind, the stored bookmark can lag behind before, and this update will skip the missing diff range while still moving the graph to after. Read get_repo_commit(repo_name) here and fall back to a full reindex when it does not line up with the push event.
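The base-selection logic suggested above could be sketched like this. Names are illustrative; returning `None` signals "fall back to a full reindex" rather than running a partial update:

```python
def choose_from_sha(stored_sha, payload_before: str):
    """Pick the diff base for incremental_update.

    Returns the stored bookmark when it lines up with the push event's
    `before` (allowing short-vs-full SHA prefixes); returns None when the
    bookmark is missing or has drifted, signalling a full reindex."""
    if not stored_sha:
        return None
    if payload_before.startswith(stored_sha) or stored_sha.startswith(payload_before):
        return stored_sha
    return None
```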
Reprocess dependent files during incremental updates, add repo-scoped update locking, harden webhook auth and provider handling, and fall back to full reindex when the stored bookmark no longer matches incoming history. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Full re-indexing on every change is too slow for large repos. This adds an incremental update engine that computes a diff between two commit SHAs and only touches affected files, plus two trigger modes: a GitHub/GitLab push webhook and a background poll watcher.
Core engine — `api/git_utils/incremental_update.py`

- `incremental_update(repo_name, from_sha, to_sha, ignore=[])` — resolves both SHAs via pygit2, classifies file changes (added/modified/deleted), checks out `to_sha`, deletes stale nodes/edges, re-analyses changed files, and persists the new commit bookmark in Redis. Idempotent (`from_sha == to_sha` is a no-op). Accepts abbreviated or full SHAs.
- `fetch_remote(repo_path)` — `git fetch origin` via subprocess
- `get_remote_head(repo_path, branch)` — returns the remote branch HEAD SHA
- `repo_local_path(repo_name)` — resolves the clone path; respects the `REPOSITORIES_DIR` env override

Webhook endpoint — `POST /api/webhook`

Accepts GitHub/GitLab push event payloads. When `WEBHOOK_SECRET` is set, validates `X-Hub-Signature-256` with `hmac.compare_digest` (timing-safe). Ignores pushes to untracked branches (200 response, no retry). Resolves the target repo by URL-matching against indexed repos (normalises the `.git` suffix and case).

Background poll watcher

Started via FastAPI lifespan on startup (cancelled cleanly on shutdown). At each `POLL_INTERVAL` tick, fetches all indexed repos, compares the stored commit SHA against `origin/<TRACKED_BRANCH>`, and calls `incremental_update` if behind. Handles short vs. full SHA comparison correctly (prefix match only when lengths differ).

Configuration — new env vars (documented in `.env.template`)

| Variable | Default | Notes |
|---|---|---|
| `WEBHOOK_SECRET` | (unset) | signature validation skipped when unset |
| `TRACKED_BRANCH` | `main` | branch whose pushes trigger updates |
| `POLL_INTERVAL` | `60` | `0` disables the watcher |