fix(node): enforce path-scoped visibility on the REST read API#52
fix(node): enforce path-scoped visibility on the REST read API#52beardthelion wants to merge 3 commits into
Conversation
The repo-scoped REST read endpoints (blob, tree, commits, refs, changelog, pulls, pr diff, repo metadata) read straight from disk after a bare repo lookup, with no visibility check, so private repositories and private subtrees of public repositories were served to unauthenticated callers, bypassing the withholding the git and replication paths enforce. Add a shared authorize_repo_read gate (resolve repo, load rules, visibility_check, 404 on deny) and call it from every repo-scoped read handler. get_blob gates at the requested file path; get_pr_diff withholds the whole diff when it touches a path the caller cannot read. Attach optional_signature to the read routes so authorized readers authenticate. Listings stay gated at the repo root, consistent with mode B exposing tree and commit SHAs over git. Closes Gitlawb#51
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds repository- and path-scoped read authorization across REST read endpoints: introduces authorize_repo_read, applies optional-signature middleware to read routes, wires optional-auth through handlers, enforces repo-level checks for metadata/blob/PR/changelog endpoints, and gates PR diffs per changed file path. ChangesREST read authorization enforcement
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/gitlawb-node/src/api/repos.rs (1)
896-906:⚠️ Potential issue | 🔴 CriticalEnforce source-repo read visibility in
fork_repo
fork_repoauthenticates the caller, but it never callsauthorize_repo_read/visibility_checkfor the source repo beforeRepoStore::acquire(...)andgit clone --mirror. SinceRepoStore::acquireonly ensures the repo is present locally (and can download from Tigris) without visibility checks, an authenticated user can fork/clone a private repo they don’t have read access to. Addcrate::api::authorize_repo_read(&state, &owner, &name, Some(&auth.0), "/").await?(and return the opaqueRepoNotFoundbehavior on deny) before acquiring/cloning.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/repos.rs` around lines 896 - 906, In fork_repo, before calling RepoStore::acquire or doing the git clone, call crate::api::authorize_repo_read(&state, &owner, &name, Some(&auth.0), "/").await? to enforce read visibility for the source repo; place this immediately after resolving source via state.db.get_repo(...) and before any RepoStore::acquire or clone logic so denied access returns the same AppError::RepoNotFound behavior, ensuring only authorized readers can fork/clone private repos.crates/gitlawb-node/src/api/pulls.rs (1)
340-358:⚠️ Potential issue | 🟠 MajorGate repo visibility in
list_reviews/list_commentsto prevent private repo data leakage
GET /api/v1/repos/:owner/:repo/pulls/:number/reviewsand/commentsare routed underread_routeswithauth::optional_signature, but the handlers don’t accept/useAuthenticatedDidand don’t callcrate::api::authorize_repo_read. Since the DB queries for PRs/reviews/comments don’t perform visibility checks either, an unauthenticated caller can enumerate{owner}/{repo}/{number}and read private repo review/comment bodies.Apply the same authorization pattern as
list_prs/get_pr(addauth: Option<Extension<AuthenticatedDid>>and callauthorize_repo_readbefore querying PR/reviews/comments; pass the appropriate gate path for these endpoints).🔒 Suggested fix for list_reviews
pub async fn list_reviews( State(state): State<AppState>, Path((owner, name, number)): Path<(String, String, i64)>, + auth: Option<Extension<AuthenticatedDid>>, ) -> Result<Json<serde_json::Value>> { - let record = state - .db - .get_repo(&owner, &name) - .await? - .ok_or_else(|| AppError::RepoNotFound(format!("{owner}/{name}")))?; + let caller = auth.as_ref().map(|e| e.0 .0.as_str()); + let (record, _rules) = + crate::api::authorize_repo_read(&state, &owner, &name, caller, "/").await?; let pr = state .db .get_pr(&record.id, number) .await? .ok_or_else(|| AppError::NotFound(format!("PR #{number} not found")))?; let reviews = state.db.list_pr_reviews(&pr.id).await?; Ok(Json(serde_json::json!({ "reviews": reviews }))) }🔒 Suggested fix for list_comments
pub async fn list_comments( State(state): State<AppState>, Path((owner, name, number)): Path<(String, String, i64)>, + auth: Option<Extension<AuthenticatedDid>>, ) -> Result<Json<serde_json::Value>> { - let record = state - .db - .get_repo(&owner, &name) - .await? - .ok_or_else(|| AppError::RepoNotFound(format!("{owner}/{name}")))?; + let caller = auth.as_ref().map(|e| e.0 .0.as_str()); + let (record, _rules) = + crate::api::authorize_repo_read(&state, &owner, &name, caller, "/").await?; let pr = state .db .get_pr(&record.id, number) .await? .ok_or_else(|| AppError::NotFound(format!("PR #{number} not found")))?; let comments = state.db.list_pr_comments(&pr.id).await?; Ok(Json(serde_json::json!({ "comments": comments }))) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/api/pulls.rs` around lines 340 - 358, list_reviews and list_comments lack repo visibility checks and must accept the optional auth extension and call authorize_repo_read before DB access: add a new parameter auth: Option<Extension<AuthenticatedDid>> to each handler signature, then before calling state.db.get_pr / state.db.list_pr_reviews or list_pr_comments invoke crate::api::authorize_repo_read(&state, auth.as_ref().map(|e| &**e), &record, "<appropriate_gate_path>") following the same pattern used in list_prs/get_pr (use the same gate-path constant used for pull read endpoints), and only then perform the DB queries; this ensures unauthenticated callers cannot read private PR reviews/comments.
🧹 Nitpick comments (1)
crates/gitlawb-node/src/git/store.rs (1)
317-339: 💤 Low valueConsider checking
output.status.success()for consistency and explicit error handling.Unlike
list_refs,head_commit, andls_treewhich check the exit status and report failures, this function silently returns an emptyVecif the git command fails (e.g., invalid branch refs). While the caller's subsequentbranch_diffcall would also fail with the same inputs—mitigating practical impact—explicit status checking would provide clearer diagnostics and align with the pattern used elsewhere in this file.♻️ Suggested fix for exit status check
pub fn branch_diff_names( repo_path: &Path, target_branch: &str, source_branch: &str, ) -> Result<Vec<String>> { let output = Command::new("git") .args([ "diff", "--name-only", &format!("{target_branch}...{source_branch}"), ]) .current_dir(repo_path) .output() .context("failed to run git diff --name-only")?; + + if !output.status.success() { + let stderr = String::from_utf8_lossy(&output.stderr); + bail!("git diff --name-only failed: {stderr}"); + } + Ok(String::from_utf8_lossy(&output.stdout) .lines() .filter(|l| !l.is_empty()) .map(|l| l.to_string()) .collect()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/store.rs` around lines 317 - 339, The branch_diff_names function currently ignores the git exit status and can silently return an empty Vec on failure; update branch_diff_names to check output.status.success() after running the Command (using the existing output variable) and return a contextual error (e.g., via .context or anyhow::bail) including the git exit code and stderr if the command failed, so failures for the given repo_path, target_branch, and source_branch are reported consistently with list_refs/head_commit/ls_tree.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/gitlawb-node/src/api/changelog.rs`:
- Around line 7-8: Reorder the crate imports so they are alphabetized: move the
crate::auth::AuthenticatedDid import above crate::error::{AppError, Result}
(i.e., change the import block that currently contains crate::error::{AppError,
Result} and crate::auth::AuthenticatedDid so that crate::auth comes first) to
satisfy cargo fmt/CI.
---
Outside diff comments:
In `@crates/gitlawb-node/src/api/pulls.rs`:
- Around line 340-358: list_reviews and list_comments lack repo visibility
checks and must accept the optional auth extension and call authorize_repo_read
before DB access: add a new parameter auth: Option<Extension<AuthenticatedDid>>
to each handler signature, then before calling state.db.get_pr /
state.db.list_pr_reviews or list_pr_comments invoke
crate::api::authorize_repo_read(&state, auth.as_ref().map(|e| &**e), &record,
"<appropriate_gate_path>") following the same pattern used in list_prs/get_pr
(use the same gate-path constant used for pull read endpoints), and only then
perform the DB queries; this ensures unauthenticated callers cannot read private
PR reviews/comments.
In `@crates/gitlawb-node/src/api/repos.rs`:
- Around line 896-906: In fork_repo, before calling RepoStore::acquire or doing
the git clone, call crate::api::authorize_repo_read(&state, &owner, &name,
Some(&auth.0), "/").await? to enforce read visibility for the source repo; place
this immediately after resolving source via state.db.get_repo(...) and before
any RepoStore::acquire or clone logic so denied access returns the same
AppError::RepoNotFound behavior, ensuring only authorized readers can fork/clone
private repos.
---
Nitpick comments:
In `@crates/gitlawb-node/src/git/store.rs`:
- Around line 317-339: The branch_diff_names function currently ignores the git
exit status and can silently return an empty Vec on failure; update
branch_diff_names to check output.status.success() after running the Command
(using the existing output variable) and return a contextual error (e.g., via
.context or anyhow::bail) including the git exit code and stderr if the command
failed, so failures for the given repo_path, target_branch, and source_branch
are reported consistently with list_refs/head_commit/ls_tree.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b43a4f35-e0ba-4ea5-8b31-7b2bb7f056d7
📒 Files selected for processing (6)
crates/gitlawb-node/src/api/changelog.rscrates/gitlawb-node/src/api/mod.rscrates/gitlawb-node/src/api/pulls.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/store.rscrates/gitlawb-node/src/server.rs
The REST visibility gate added in this PR missed three read surfaces: list_reviews and list_comments served a private repo's PR reviews and comments to unauthenticated callers, and fork_repo cloned a full mirror of the source without checking the caller could read it. Route all three through authorize_repo_read at the repo root, matching the git-protocol gate. Also make branch_diff_names fail closed: it ignored git's exit status and returned an empty path list on error, which would let get_pr_diff serve a diff it should have withheld. Fix changelog.rs import order to satisfy cargo fmt.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/gitlawb-node/src/git/store.rs (1)
326-342:⚠️ Potential issue | 🔴 CriticalUse NUL-delimited
git diffnames forbranch_diff_namespath-gate enforcement
branch_diff_namesparsesgit diff --name-onlyoutput with.lines()(repo-relative touched paths). For filenames containing newlines,git diff --name-onlyemits quoted/escaped text (e.g."secret/line\\nbreak.txt"), which will not match the real repo path strings expected byget_pr_diff’sgate = format!("/{}", p.trim_start_matches('/'))againstvisibility_checkglobs like/secret/**. In a public repo where/is allowed but/secret/**is denied, this mismatch can cause the deny check to miss restricted touched paths and disclose the whole PR diff.Switch to
git diff --name-only -zand split on NUL to preserve exact path bytes.Suggested patch
pub fn branch_diff_names( repo_path: &Path, target_branch: &str, source_branch: &str, ) -> Result<Vec<String>> { let output = Command::new("git") .args([ "diff", "--name-only", + "-z", &format!("{target_branch}...{source_branch}"), ]) .current_dir(repo_path) .output() .context("failed to run git diff --name-only")?; if !output.status.success() { let stderr = String::from_utf8_lossy(&output.stderr); bail!("git diff --name-only failed: {stderr}"); } - Ok(String::from_utf8_lossy(&output.stdout) - .lines() - .filter(|l| !l.is_empty()) - .map(|l| l.to_string()) - .collect()) + Ok(output + .stdout + .split(|b| *b == b'\0') + .filter(|s| !s.is_empty()) + .map(|s| String::from_utf8_lossy(s).into_owned()) + .collect()) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/gitlawb-node/src/git/store.rs` around lines 326 - 342, The branch_diff_names implementation currently parses git diff output with .lines() which misparses filenames containing newlines; update branch_diff_names to call git with args ["diff","--name-only","-z",&format!("{target_branch}...{source_branch}")] and then split the output on '\0' (e.g., String::from_utf8_lossy(&output.stdout).split('\0')) filtering out empty entries and collecting to Vec<String>, leaving the existing error handling for output.status in place; this preserves exact repo-relative path bytes so the gate construction in get_pr_diff (gate = format!("/{}", p.trim_start_matches('/'))) and visibility_check globs (e.g., /secret/**) will match correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/gitlawb-node/src/git/store.rs`:
- Around line 326-342: The branch_diff_names implementation currently parses git
diff output with .lines() which misparses filenames containing newlines; update
branch_diff_names to call git with args
["diff","--name-only","-z",&format!("{target_branch}...{source_branch}")] and
then split the output on '\0' (e.g.,
String::from_utf8_lossy(&output.stdout).split('\0')) filtering out empty entries
and collecting to Vec<String>, leaving the existing error handling for
output.status in place; this preserves exact repo-relative path bytes so the
gate construction in get_pr_diff (gate = format!("/{}",
p.trim_start_matches('/'))) and visibility_check globs (e.g., /secret/**) will
match correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 3b10a787-781a-4f6a-82e0-c28be4ddaee8
📒 Files selected for processing (4)
crates/gitlawb-node/src/api/changelog.rscrates/gitlawb-node/src/api/pulls.rscrates/gitlawb-node/src/api/repos.rscrates/gitlawb-node/src/git/store.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/gitlawb-node/src/api/changelog.rs
- crates/gitlawb-node/src/api/pulls.rs
…t exact git diff --name-only quotes and escapes paths containing newlines, so they no longer match the visibility globs in get_pr_diff. On a public repo where / is allowed but a subtree is denied, a touched restricted path could be missed by the deny check and the full PR diff disclosed. Use -z and split on NUL to preserve exact path bytes.
The git smart-HTTP and replication paths enforce path-scoped visibility, but the REST read endpoints never did. They look up the repo and read straight from disk with no visibility check, so a private repository, and a private subtree of a public repository, are served to anyone.
The clearest case:
get_pr_diffis the same story for diffs. This predates the visibility work (get_blobgoes back to the initial release), so it is a live hole onmaintoday, not something introduced by the in-flight subtree work. It does, however, make that work moot, which is why it is worth fixing on its own.What this does
Adds a single
authorize_repo_readgate (resolve repo, load rules,visibility_check, 404 on deny) and routes every repo-scoped read handler through it: blob, tree, commits, refs, changelog, pulls, PR diff, and repo metadata. The 404 matches what the git path returns on a denied read, so existence is not confirmed either.get_blobgates at the requested file path, so a withheld subtree file and any file in a fully-private repo both return 404, while the owner and listed readers still get them.get_pr_diffwithholds the whole diff if it touches any path the caller cannot read, via a newstore::branch_diff_names. Whole-diff denial rather than per-hunk redaction keeps it simple and safe; finer redaction can come later if wanted.optional_signaturelayer so an authorized reader can authenticate over REST, the same way the git read routes already do. Anonymous reads of public repos are unaffected.Not covered here
list_repos/list_federated_reposenumerate repos and need per-row filtering, a different mechanism. They leak private-repo existence and should be a separate change.Testing
cargo buildandcargo test -p gitlawb-node --bin gitlawb-nodepass.store::branch_diff_nameshas a unit test; the gate decision itself is covered by the existingvisibilityunit tests. There is no HTTP-level test harness in this crate, so endpoint-level integration tests would be a follow-up.Closes #51
Summary by CodeRabbit
New Features
Security
Tests