Skip to content

fix(node): enforce path-scoped visibility on the REST read API#52

Open
beardthelion wants to merge 3 commits into
Gitlawb:mainfrom
beardthelion:fix/rest-visibility-gate
Open

fix(node): enforce path-scoped visibility on the REST read API#52
beardthelion wants to merge 3 commits into
Gitlawb:mainfrom
beardthelion:fix/rest-visibility-gate

Conversation

@beardthelion

@beardthelion beardthelion commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

The git smart-HTTP and replication paths enforce path-scoped visibility, but the REST read endpoints never did. They look up the repo and read straight from disk with no visibility check, so a private repository, and a private subtree of a public repository, are served to anyone.

The clearest case:

GET /api/v1/repos/{owner}/{repo}/blob/secret/.env   ->  200, cleartext

get_pr_diff is the same story for diffs. This predates the visibility work (get_blob goes back to the initial release), so it is a live hole on main today, not something introduced by the in-flight subtree work. It does, however, make that work moot, which is why it is worth fixing on its own.

What this does

Adds a single authorize_repo_read gate (resolve repo, load rules, visibility_check, 404 on deny) and routes every repo-scoped read handler through it: blob, tree, commits, refs, changelog, pulls, PR diff, and repo metadata. The 404 matches what the git path returns on a denied read, so existence is not confirmed either.

  • get_blob gates at the requested file path, so a withheld subtree file and any file in a fully-private repo both return 404, while the owner and listed readers still get them.
  • get_pr_diff withholds the whole diff if it touches any path the caller cannot read, via a new store::branch_diff_names. Whole-diff denial rather than per-hunk redaction keeps it simple and safe; finer redaction can come later if wanted.
  • Listings (tree, commits, refs) are gated at the repo root only, which keeps them consistent with mode B: the git protocol already exposes tree and commit SHAs, only blob content is withheld.
  • The read routes get the optional_signature layer so an authorized reader can authenticate over REST, the same way the git read routes already do. Anonymous reads of public repos are unaffected.

Not covered here

list_repos / list_federated_repos enumerate repos and need per-row filtering, a different mechanism. They leak private-repo existence and should be a separate change.

Testing

cargo build and cargo test -p gitlawb-node --bin gitlawb-node pass. store::branch_diff_names has a unit test; the gate decision itself is covered by the existing visibility unit tests. There is no HTTP-level test harness in this crate, so endpoint-level integration tests would be a follow-up.

Closes #51

Summary by CodeRabbit

  • New Features

    • Optional HTTP-signature authentication for public repository read endpoints.
    • Branch-diff path detection to determine which file changes are visible.
  • Security

    • Repository read operations now enforce per-path visibility before returning data.
    • Pull request diffs and changelogs hide changes that include inaccessible paths; forking checks source repo readability.
  • Tests

    • Added unit test for branch-diff path detection.

The repo-scoped REST read endpoints (blob, tree, commits, refs, changelog,
pulls, pr diff, repo metadata) read straight from disk after a bare repo
lookup, with no visibility check, so private repositories and private
subtrees of public repositories were served to unauthenticated callers,
bypassing the withholding the git and replication paths enforce.

Add a shared authorize_repo_read gate (resolve repo, load rules,
visibility_check, 404 on deny) and call it from every repo-scoped read
handler. get_blob gates at the requested file path; get_pr_diff withholds
the whole diff when it touches a path the caller cannot read. Attach
optional_signature to the read routes so authorized readers authenticate.
Listings stay gated at the repo root, consistent with mode B exposing tree
and commit SHAs over git.

Closes Gitlawb#51
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 03569094-bd6a-4e8f-b507-b8bed9441940

📥 Commits

Reviewing files that changed from the base of the PR and between f3c5d33 and af8408f.

📒 Files selected for processing (1)
  • crates/gitlawb-node/src/git/store.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/gitlawb-node/src/git/store.rs

📝 Walkthrough

Walkthrough

Adds repository- and path-scoped read authorization across REST read endpoints: introduces authorize_repo_read, applies optional-signature middleware to read routes, wires optional-auth through handlers, enforces repo-level checks for metadata/blob/PR/changelog endpoints, and gates PR diffs per changed file path.

Changes

REST read authorization enforcement

Layer / File(s) Summary
Middleware and authorization helper foundation
crates/gitlawb-node/src/server.rs, crates/gitlawb-node/src/api/mod.rs
Router applies optional-signature middleware to read routes; new authorize_repo_read(state, owner, repo, caller, path) loads repo and visibility rules, runs visibility_check, and returns RepoNotFound when denied.
Repository metadata endpoints authorization
crates/gitlawb-node/src/api/repos.rs
get_repo, list_commits, get_tree_root, get_tree, list_refs, and fork_repo now call authorize_repo_read (path "/") to enforce repo-level read visibility before returning metadata or allowing forks.
Blob endpoint with path-scoped authorization
crates/gitlawb-node/src/api/repos.rs
get_blob computes a gate_path from the requested file path and calls authorize_repo_read with that path to enforce subtree-level visibility before serving file bytes.
Pull request endpoints with diff visibility
crates/gitlawb-node/src/api/pulls.rs
list_prs, get_pr, list_reviews, list_comments use optional auth and authorize_repo_read for repo-level checks. get_pr_diff computes touched branch diff paths and runs visibility_check per-path, returning NotFound if any path is denied.
Changelog endpoint authorization
crates/gitlawb-node/src/api/changelog.rs
get_changelog accepts optional authenticated DID, derives caller, and uses authorize_repo_read to enforce repo-level read authorization before building the changelog response.
Git diff paths utility
crates/gitlawb-node/src/git/store.rs
Adds branch_diff_names(repo_path, target_branch, source_branch) which shells out to git diff --name-only target...source and returns changed repo-relative paths; includes a unit test.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

  • #51: Matches the PR's objective to close the REST read API bypass by adding repo-level and path-level visibility checks across REST handlers.
  • #18: Similar visibility enforcement work; this PR extends path-scoped visibility wiring which aligns with objectives described in issue #18.

Possibly related PRs

  • Gitlawb/node#25: Introduced the visibility rules and visibility_check patterns that this PR reuses and extends across REST endpoints.

Suggested reviewers

  • kevincodex1

Poem

🐰 I hopped through routes both wide and narrow,
I sniffed for secrets hidden in a sparrow.
A helper stands guard and checks each trail,
Quiet bytes tucked safe beyond the veil.
🥕✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: enforcing path-scoped visibility on REST read API endpoints, which matches the core objective of the PR.
Linked Issues check ✅ Passed The PR comprehensively addresses all primary objectives from issue #51: repo-level read gating for all read endpoints, path-level gating for blob/diff endpoints, optional_signature middleware for authenticated access, and correct 404 behavior on denial.
Out of Scope Changes check ✅ Passed All changes directly implement the visibility enforcement requirements specified in issue #51. The new authorize_repo_read function, path-specific gating logic, branch_diff_names helper, and optional_signature middleware are all within scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/gitlawb-node/src/api/repos.rs (1)

896-906: ⚠️ Potential issue | 🔴 Critical

Enforce source-repo read visibility in fork_repo

fork_repo authenticates the caller, but it never calls authorize_repo_read/visibility_check for the source repo before RepoStore::acquire(...) and git clone --mirror. Since RepoStore::acquire only ensures the repo is present locally (and can download from Tigris) without visibility checks, an authenticated user can fork/clone a private repo they don’t have read access to. Add crate::api::authorize_repo_read(&state, &owner, &name, Some(&auth.0), "/").await? (and return the opaque RepoNotFound behavior on deny) before acquiring/cloning.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/repos.rs` around lines 896 - 906, In fork_repo,
before calling RepoStore::acquire or doing the git clone, call
crate::api::authorize_repo_read(&state, &owner, &name, Some(&auth.0),
"/").await? to enforce read visibility for the source repo; place this
immediately after resolving source via state.db.get_repo(...) and before any
RepoStore::acquire or clone logic so denied access returns the same
AppError::RepoNotFound behavior, ensuring only authorized readers can fork/clone
private repos.
crates/gitlawb-node/src/api/pulls.rs (1)

340-358: ⚠️ Potential issue | 🟠 Major

Gate repo visibility in list_reviews / list_comments to prevent private repo data leakage

GET /api/v1/repos/:owner/:repo/pulls/:number/reviews and /comments are routed under read_routes with auth::optional_signature, but the handlers don’t accept/use AuthenticatedDid and don’t call crate::api::authorize_repo_read. Since the DB queries for PRs/reviews/comments don’t perform visibility checks either, an unauthenticated caller can enumerate {owner}/{repo}/{number} and read private repo review/comment bodies.

Apply the same authorization pattern as list_prs/get_pr (add auth: Option<Extension<AuthenticatedDid>> and call authorize_repo_read before querying PR/reviews/comments; pass the appropriate gate path for these endpoints).

🔒 Suggested fix for list_reviews
 pub async fn list_reviews(
     State(state): State<AppState>,
     Path((owner, name, number)): Path<(String, String, i64)>,
+    auth: Option<Extension<AuthenticatedDid>>,
 ) -> Result<Json<serde_json::Value>> {
-    let record = state
-        .db
-        .get_repo(&owner, &name)
-        .await?
-        .ok_or_else(|| AppError::RepoNotFound(format!("{owner}/{name}")))?;
+    let caller = auth.as_ref().map(|e| e.0 .0.as_str());
+    let (record, _rules) =
+        crate::api::authorize_repo_read(&state, &owner, &name, caller, "/").await?;

     let pr = state
         .db
         .get_pr(&record.id, number)
         .await?
         .ok_or_else(|| AppError::NotFound(format!("PR #{number} not found")))?;

     let reviews = state.db.list_pr_reviews(&pr.id).await?;
     Ok(Json(serde_json::json!({ "reviews": reviews })))
 }
🔒 Suggested fix for list_comments
 pub async fn list_comments(
     State(state): State<AppState>,
     Path((owner, name, number)): Path<(String, String, i64)>,
+    auth: Option<Extension<AuthenticatedDid>>,
 ) -> Result<Json<serde_json::Value>> {
-    let record = state
-        .db
-        .get_repo(&owner, &name)
-        .await?
-        .ok_or_else(|| AppError::RepoNotFound(format!("{owner}/{name}")))?;
+    let caller = auth.as_ref().map(|e| e.0 .0.as_str());
+    let (record, _rules) =
+        crate::api::authorize_repo_read(&state, &owner, &name, caller, "/").await?;

     let pr = state
         .db
         .get_pr(&record.id, number)
         .await?
         .ok_or_else(|| AppError::NotFound(format!("PR #{number} not found")))?;

     let comments = state.db.list_pr_comments(&pr.id).await?;
     Ok(Json(serde_json::json!({ "comments": comments })))
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/api/pulls.rs` around lines 340 - 358, list_reviews
and list_comments lack repo visibility checks and must accept the optional auth
extension and call authorize_repo_read before DB access: add a new parameter
auth: Option<Extension<AuthenticatedDid>> to each handler signature, then before
calling state.db.get_pr / state.db.list_pr_reviews or list_pr_comments invoke
crate::api::authorize_repo_read(&state, auth.as_ref().map(|e| &**e), &record,
"<appropriate_gate_path>") following the same pattern used in list_prs/get_pr
(use the same gate-path constant used for pull read endpoints), and only then
perform the DB queries; this ensures unauthenticated callers cannot read private
PR reviews/comments.
🧹 Nitpick comments (1)
crates/gitlawb-node/src/git/store.rs (1)

317-339: 💤 Low value

Consider checking output.status.success() for consistency and explicit error handling.

Unlike list_refs, head_commit, and ls_tree which check the exit status and report failures, this function silently returns an empty Vec if the git command fails (e.g., invalid branch refs). While the caller's subsequent branch_diff call would also fail with the same inputs—mitigating practical impact—explicit status checking would provide clearer diagnostics and align with the pattern used elsewhere in this file.

♻️ Suggested fix for exit status check
 pub fn branch_diff_names(
     repo_path: &Path,
     target_branch: &str,
     source_branch: &str,
 ) -> Result<Vec<String>> {
     let output = Command::new("git")
         .args([
             "diff",
             "--name-only",
             &format!("{target_branch}...{source_branch}"),
         ])
         .current_dir(repo_path)
         .output()
         .context("failed to run git diff --name-only")?;
+
+    if !output.status.success() {
+        let stderr = String::from_utf8_lossy(&output.stderr);
+        bail!("git diff --name-only failed: {stderr}");
+    }
+
     Ok(String::from_utf8_lossy(&output.stdout)
         .lines()
         .filter(|l| !l.is_empty())
         .map(|l| l.to_string())
         .collect())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/store.rs` around lines 317 - 339, The
branch_diff_names function currently ignores the git exit status and can
silently return an empty Vec on failure; update branch_diff_names to check
output.status.success() after running the Command (using the existing output
variable) and return a contextual error (e.g., via .context or anyhow::bail)
including the git exit code and stderr if the command failed, so failures for
the given repo_path, target_branch, and source_branch are reported consistently
with list_refs/head_commit/ls_tree.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/gitlawb-node/src/api/changelog.rs`:
- Around line 7-8: Reorder the crate imports so they are alphabetized: move the
crate::auth::AuthenticatedDid import above crate::error::{AppError, Result}
(i.e., change the import block that currently contains crate::error::{AppError,
Result} and crate::auth::AuthenticatedDid so that crate::auth comes first) to
satisfy cargo fmt/CI.

---

Outside diff comments:
In `@crates/gitlawb-node/src/api/pulls.rs`:
- Around line 340-358: list_reviews and list_comments lack repo visibility
checks and must accept the optional auth extension and call authorize_repo_read
before DB access: add a new parameter auth: Option<Extension<AuthenticatedDid>>
to each handler signature, then before calling state.db.get_pr /
state.db.list_pr_reviews or list_pr_comments invoke
crate::api::authorize_repo_read(&state, auth.as_ref().map(|e| &**e), &record,
"<appropriate_gate_path>") following the same pattern used in list_prs/get_pr
(use the same gate-path constant used for pull read endpoints), and only then
perform the DB queries; this ensures unauthenticated callers cannot read private
PR reviews/comments.

In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 896-906: In fork_repo, before calling RepoStore::acquire or doing
the git clone, call crate::api::authorize_repo_read(&state, &owner, &name,
Some(&auth.0), "/").await? to enforce read visibility for the source repo; place
this immediately after resolving source via state.db.get_repo(...) and before
any RepoStore::acquire or clone logic so denied access returns the same
AppError::RepoNotFound behavior, ensuring only authorized readers can fork/clone
private repos.

---

Nitpick comments:
In `@crates/gitlawb-node/src/git/store.rs`:
- Around line 317-339: The branch_diff_names function currently ignores the git
exit status and can silently return an empty Vec on failure; update
branch_diff_names to check output.status.success() after running the Command
(using the existing output variable) and return a contextual error (e.g., via
.context or anyhow::bail) including the git exit code and stderr if the command
failed, so failures for the given repo_path, target_branch, and source_branch
are reported consistently with list_refs/head_commit/ls_tree.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b43a4f35-e0ba-4ea5-8b31-7b2bb7f056d7

📥 Commits

Reviewing files that changed from the base of the PR and between a2217bf and 86b6418.

📒 Files selected for processing (6)
  • crates/gitlawb-node/src/api/changelog.rs
  • crates/gitlawb-node/src/api/mod.rs
  • crates/gitlawb-node/src/api/pulls.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/store.rs
  • crates/gitlawb-node/src/server.rs

Comment thread crates/gitlawb-node/src/api/changelog.rs Outdated
The REST visibility gate added in this PR missed three read surfaces:
list_reviews and list_comments served a private repo's PR reviews and
comments to unauthenticated callers, and fork_repo cloned a full mirror of
the source without checking the caller could read it. Route all three through
authorize_repo_read at the repo root, matching the git-protocol gate.

Also make branch_diff_names fail closed: it ignored git's exit status and
returned an empty path list on error, which would let get_pr_diff serve a diff
it should have withheld. Fix changelog.rs import order to satisfy cargo fmt.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/git/store.rs (1)

326-342: ⚠️ Potential issue | 🔴 Critical

Use NUL-delimited git diff names for branch_diff_names path-gate enforcement

branch_diff_names parses git diff --name-only output with .lines() (repo-relative touched paths). For filenames containing newlines, git diff --name-only emits quoted/escaped text (e.g. "secret/line\\nbreak.txt"), which will not match the real repo path strings expected by get_pr_diff’s gate = format!("/{}", p.trim_start_matches('/')) against visibility_check globs like /secret/**. In a public repo where / is allowed but /secret/** is denied, this mismatch can cause the deny check to miss restricted touched paths and disclose the whole PR diff.

Switch to git diff --name-only -z and split on NUL to preserve exact path bytes.

Suggested patch
 pub fn branch_diff_names(
     repo_path: &Path,
     target_branch: &str,
     source_branch: &str,
 ) -> Result<Vec<String>> {
     let output = Command::new("git")
         .args([
             "diff",
             "--name-only",
+            "-z",
             &format!("{target_branch}...{source_branch}"),
         ])
         .current_dir(repo_path)
         .output()
         .context("failed to run git diff --name-only")?;
     if !output.status.success() {
         let stderr = String::from_utf8_lossy(&output.stderr);
         bail!("git diff --name-only failed: {stderr}");
     }
-    Ok(String::from_utf8_lossy(&output.stdout)
-        .lines()
-        .filter(|l| !l.is_empty())
-        .map(|l| l.to_string())
-        .collect())
+    Ok(output
+        .stdout
+        .split(|b| *b == b'\0')
+        .filter(|s| !s.is_empty())
+        .map(|s| String::from_utf8_lossy(s).into_owned())
+        .collect())
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/gitlawb-node/src/git/store.rs` around lines 326 - 342, The
branch_diff_names implementation currently parses git diff output with .lines()
which misparses filenames containing newlines; update branch_diff_names to call
git with args
["diff","--name-only","-z",&format!("{target_branch}...{source_branch}")] and
then split the output on '\0' (e.g.,
String::from_utf8_lossy(&output.stdout).split('\0')) filtering out empty entries
and collecting to Vec<String>, leaving the existing error handling for
output.status in place; this preserves exact repo-relative path bytes so the
gate construction in get_pr_diff (gate = format!("/{}",
p.trim_start_matches('/'))) and visibility_check globs (e.g., /secret/**) will
match correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@crates/gitlawb-node/src/git/store.rs`:
- Around line 326-342: The branch_diff_names implementation currently parses git
diff output with .lines() which misparses filenames containing newlines; update
branch_diff_names to call git with args
["diff","--name-only","-z",&format!("{target_branch}...{source_branch}")] and
then split the output on '\0' (e.g.,
String::from_utf8_lossy(&output.stdout).split('\0')) filtering out empty entries
and collecting to Vec<String>, leaving the existing error handling for
output.status in place; this preserves exact repo-relative path bytes so the
gate construction in get_pr_diff (gate = format!("/{}",
p.trim_start_matches('/'))) and visibility_check globs (e.g., /secret/**) will
match correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 3b10a787-781a-4f6a-82e0-c28be4ddaee8

📥 Commits

Reviewing files that changed from the base of the PR and between 86b6418 and f3c5d33.

📒 Files selected for processing (4)
  • crates/gitlawb-node/src/api/changelog.rs
  • crates/gitlawb-node/src/api/pulls.rs
  • crates/gitlawb-node/src/api/repos.rs
  • crates/gitlawb-node/src/git/store.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/gitlawb-node/src/api/changelog.rs
  • crates/gitlawb-node/src/api/pulls.rs

…t exact

git diff --name-only quotes and escapes paths containing newlines, so they
no longer match the visibility globs in get_pr_diff. On a public repo where /
is allowed but a subtree is denied, a touched restricted path could be missed
by the deny check and the full PR diff disclosed. Use -z and split on NUL to
preserve exact path bytes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

REST read API bypasses path-scoped visibility: private repo/subtree content served unauthenticated

1 participant