Skip to content

Feature/delete account endpoint#265

Open
Jayant-kernel wants to merge 14 commits intoopenml:mainfrom
Jayant-kernel:feature/delete-account-endpoint
Open

Feature/delete account endpoint#265
Jayant-kernel wants to merge 14 commits intoopenml:mainfrom
Jayant-kernel:feature/delete-account-endpoint

Conversation

@Jayant-kernel
Copy link

Closes #194.

This PR implements Phase 1 of the request to allow account deletion via the API.

Changes

  • New Endpoint: DELETE /v1/users/{user_id}
  • Database Layer: Added delete_user() (removes from users, users_groups) and get_user_resource_count() to measure dataset, flow, and run ownership
  • Authorization: Only the account owner or an ADMIN can delete the account
  • Resource Protection: If the user has any resources (datasets, flows, runs), the deletion is blocked and a 409 Conflict is returned
  • Errors: Added UserError enum to core/errors.py
  • Tests: Added full coverage for self-delete, admin-delete, unauthenticated, non-owner, and non-existent user scenarios

Phase 2 (anonymization/archiving resources) is left for a future issue to keep this PR small and focused.

Jayant Kernel and others added 5 commits March 7, 2026 22:57
… flow names

The GET /flows/exists/{name}/{external_version} endpoint broke for flows
with URI-unsafe characters in their names (e.g. sklearn flows with
parentheses). Replaced with POST /flows/exists accepting name and
external_version in the request body, resolving issue openml#166.

Closes openml#166
…cated wrapper

- Move FlowExistsBody to schemas/flows.py for reusability
- Keep GET /flows/exists/{name}/{version} as a deprecated thin wrapper
  around POST /flows/exists for backward compatibility
- Update test imports accordingly
- Add UserError enum to core/errors.py

- Add get_user_resource_count() and delete_user() to database/users.py

- New src/routers/openml/users.py: DELETE /users/{user_id}

  Returns 401/403/404/409 appropriately; 200 on success

- Register users_router in main.py

- Tests: self-delete, admin-delete, no-auth, non-owner, not-found

Phase 2 (resource anonymization) left for follow-up.
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 7, 2026

Walkthrough

Adds account-deletion support and related utilities. Introduces a new UserError enum. Adds database functions get_user_resource_count and delete_user. Adds a /users APIRouter exposing DELETE /users/{user_id} that enforces authentication and owner/admin authorization, locks the target row FOR UPDATE, invalidates the session_hash, checks resource counts (conflict if any), and deletes the user. Registers the router in the app. Refactors flow-existence to accept a FlowExistsBody via POST with a deprecated GET wrapper and adds the corresponding schema and tests.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Changes to flows.py (FlowExistsBody schema and POST endpoint) appear out-of-scope relative to issue #194 which focuses solely on account deletion; this modification is unrelated to user deletion. Review whether the flows.py changes belong in this PR or should be separated into a different feature branch, as they do not address the account deletion objective.
Docstring Coverage ⚠️ Warning Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Feature/delete account endpoint' clearly summarizes the main change—implementing a new endpoint for account deletion as the primary focus of the PR.
Description check ✅ Passed The description is directly related to the changeset, detailing the new DELETE endpoint, database functions, authorization logic, resource protection, and test coverage for account deletion.
Linked Issues check ✅ Passed The PR implements an API endpoint for account deletion (#194) with authorization checks and resource protection, meeting the primary objective of replacing the manual account deletion process.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • In routers/openml/users.py, the inline import of get_user_id_for inside delete_account is unused and can be removed, and the direct SELECT for user existence could either be extracted to a helper or reuse an existing helper for consistency with other user DB access patterns.
  • The delete_user function issues two independent DELETE statements without an explicit transaction; consider wrapping these operations in a transaction (or documenting reliance on implicit transactions) to avoid leaving users_groups in an inconsistent state if the second DELETE fails.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `routers/openml/users.py`, the inline import of `get_user_id_for` inside `delete_account` is unused and can be removed, and the direct `SELECT` for user existence could either be extracted to a helper or reuse an existing helper for consistency with other user DB access patterns.
- The `delete_user` function issues two independent DELETE statements without an explicit transaction; consider wrapping these operations in a transaction (or documenting reliance on implicit transactions) to avoid leaving `users_groups` in an inconsistent state if the second DELETE fails.

## Individual Comments

### Comment 1
<location path="src/routers/openml/users.py" line_range="62-71" />
<code_context>
+            detail={"code": str(int(UserError.NOT_FOUND)), "message": "User not found"},
+        )
+
+    resource_count = get_user_resource_count(user_id=user_id, expdb=expdb)
+    if resource_count > 0:
+        raise HTTPException(
+            status_code=HTTPStatus.CONFLICT,
+            detail={
+                "code": str(int(UserError.HAS_RESOURCES)),
+                "message": (
+                    f"User has {resource_count} resource(s). "
+                    "Remove or transfer resources before deleting the account."
+                ),
+            },
+        )
+
+    delete_user(user_id=user_id, connection=user_db)
+    return {"user_id": user_id, "deleted": True}
</code_context>
<issue_to_address>
**issue (bug_risk):** There’s a race window between counting resources and deleting the user that could lead to inconsistent state.

Between `get_user_resource_count(...)` and `delete_user(...)`, new datasets/flows/runs could be created, especially when an admin is deleting another user. Because the count and delete happen in different databases, they can’t be made atomic with a single transaction, so the “no resources” invariant isn’t guaranteed. If this consistency is important, consider enforcing it at the DB layer (e.g., foreign keys or a soft-delete plus constraint), or otherwise mitigating/documenting the race (e.g., lock the account or block new uploads while deletion is in progress).
</issue_to_address>

### Comment 2
<location path="src/routers/openml/users.py" line_range="51-56" />
<code_context>
+    # Check user exists by querying for them directly
+    from sqlalchemy import text
+
+    existing = user_db.execute(
+        text("SELECT id FROM users WHERE id = :user_id"),
+        parameters={"user_id": user_id},
+    ).one_or_none()
+
+    if existing is None:
+        raise HTTPException(
+            status_code=HTTPStatus.NOT_FOUND,
</code_context>
<issue_to_address>
**suggestion:** Reuse existing user access helpers instead of inlining SQL for existence checks.

Since this module already exposes user helpers (and you import `get_user_id_for` above), prefer using a shared helper for this existence check instead of inlining raw SQL. That keeps schema knowledge centralized and avoids this query drifting out of sync if the users table or lookup logic changes.

Suggested implementation:

```python
    # Verify the target user exists using shared helper to keep schema knowledge centralized
    existing_user_id = get_user_id_for(user_id=user_id, user_db=user_db)

    if existing_user_id is None:
        raise HTTPException(

```

The exact call to `get_user_id_for` may need to be adjusted to match its actual signature and return semantics in `database.users`, for example:
- Renaming `user_db` argument to `db`, `session`, or similar if that’s what the helper expects.
- If `get_user_id_for` raises an exception (e.g. `UserNotFoundError`) instead of returning `None`, replace the `if existing_user_id is None` check with a `try/except` that maps that exception to the `HTTPException` with `NOT_FOUND`.
- If `get_user_id_for` is already imported at the module level, you can also safely remove any now-unused inline import of it elsewhere in this file.
</issue_to_address>

### Comment 3
<location path="tests/routers/openml/users_test.py" line_range="10-19" />
<code_context>
+@pytest.mark.mut
</code_context>
<issue_to_address>
**issue (testing):** Add a test case for deletion being blocked when the user owns resources (datasets/flows/runs).

Currently there’s no test covering the 409 path when `get_user_resource_count` > 0. Please add an integration-style test that creates at least one dataset/flow/run owned by the user, attempts deletion (as self and/or admin), and asserts that:
- the response status is 409
- the error code is `UserError.HAS_RESOURCES`
- the user record remains in the `users` table.

This helps prevent regressions in the endpoint’s safety guarantees.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +10 to +19
@pytest.mark.mut
def test_delete_user_self(py_api: TestClient, user_test: Connection) -> None:
"""A user without resources can delete their own account."""
# Insert a fresh disposable user
user_test.execute(
text(
"INSERT INTO users (session_hash, email, first_name, last_name, password)"
" VALUES ('aaaabbbbccccddddaaaabbbbccccdddd', 'del@test.com', 'Del', 'User', 'x')",
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Add a test case for deletion being blocked when the user owns resources (datasets/flows/runs).

Currently there’s no test covering the 409 path when get_user_resource_count > 0. Please add an integration-style test that creates at least one dataset/flow/run owned by the user, attempts deletion (as self and/or admin), and asserts that:

  • the response status is 409
  • the error code is UserError.HAS_RESOURCES
  • the user record remains in the users table.

This helps prevent regressions in the endpoint’s safety guarantees.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/routers/openml/flows_test.py (1)

65-76: Add a regression test for the deprecated GET alias.

These updates only exercise POST /flows/exists, but the PR also adds GET /flows/exists/{name}/{external_version} as a compatibility shim in src/routers/openml/flows.py:34-41. One small smoke test for the GET path would keep that backward-compatibility promise from drifting.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/flows_test.py` around lines 65 - 76, Add a small
regression test that exercises the deprecated GET alias for the exists endpoint:
create a new test (e.g., test_flow_exists_get_alias) in
tests/routers/openml/flows_test.py that uses the TestClient (py_api) to call GET
"/flows/exists/{name}/{external_version}" with a real Flow fixture and asserts
HTTPStatus.OK and response.json() == {"flow_id": flow.id}; also add a
complementary GET call with non-existent name/version asserting
HTTPStatus.NOT_FOUND and detail "Flow not found." to mirror the existing POST
tests and ensure the compatibility shim implemented in the flows router is
exercised.
tests/routers/openml/users_test.py (2)

10-24: Assert the DB side effects in the happy paths.

Both success cases stop at the response payload, so a handler that returns {"deleted": true} without actually removing the user would still pass. Since this endpoint is supposed to clean up both users and users_groups, add post-delete assertions against the database state and seed at least one users_groups row for the inserted user.

Also applies to: 27-40

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 10 - 24, Update the
test_delete_user_self (and the similar test at lines 27-40) to assert DB
side-effects: after inserting the disposable user via user_test.execute and
performing py_api.delete, insert at least one users_groups row for that new user
before calling the API, then after the delete call query the database using
user_test (e.g., SELECT COUNT(*) FROM users WHERE id = new_id and SELECT
COUNT(*) FROM users_groups WHERE user_id = new_id) and assert both counts are
zero; keep the existing response assertions but add these post-delete assertions
so the handler actually removed the user and its users_groups rows.

10-60: Add the missing 409 coverage for users who still own resources.

The main safety check of this endpoint is the conflict path, but this file never exercises it. A regression in get_user_resource_count() or the HTTP 409 mapping would slip through here. There is already a DATASET_130_OWNER fixture in tests/users.py:1-17 that looks reusable for this scenario.

🧪 Suggested test skeleton
+def test_delete_user_has_resources(py_api: TestClient) -> None:
+    response = py_api.delete(f"/users/16?api_key={ApiKey.DATASET_130_OWNER}")
+    assert response.status_code == HTTPStatus.CONFLICT
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 10 - 60, Add a test that
exercises the conflict path when deleting a user who still owns resources:
create a new test (e.g. test_delete_user_with_resources) that uses the existing
DATASET_130_OWNER fixture from tests/users.py to get a user id that owns
resources, call the DELETE /users/{user_id}?api_key={ApiKey.ADMIN} (or
appropriate api key), and assert the response.status_code is HTTPStatus.CONFLICT
and response.json()["detail"]["code"] == "130" to cover
get_user_resource_count() and the 409 mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/database/users.py`:
- Around line 77-100: get_user_resource_count currently only counts dataset,
implementation, and run, but misses other user-owned references (study.creator,
task_study.uploader, run_study.uploader, dataset_tag.uploader) which can cause
FK integrity errors or orphaned rows; update get_user_resource_count to also
query and include counts for study (creator), task_study (uploader), run_study
(uploader), and dataset_tag (uploader) using the same expdb.execute(text(...),
parameters={"user_id": user_id}).scalar() pattern and sum those results before
returning, or alternatively ensure any delete flow (e.g., the user deletion
routine that calls get_user_resource_count) explicitly cleans those tables
first—refer to the function name get_user_resource_count and the table names
study, task_study, run_study, dataset_tag to locate and change the code.

In `@src/routers/openml/users.py`:
- Around line 59-72: The current flow calling get_user_resource_count(user_id,
expdb) then delete_user(user_id, connection=user_db) is racy across expdb and
user_db; change it to first mark the user as deletion-pending/disabled in the
user DB (e.g., add or call a function like mark_user_deletion_pending /
update_user_status to set a "deletion_pending" flag), then call
get_user_resource_count(user_id, expdb); if count > 0 raise the same
HTTPException and clear or keep the pending flag per policy, and only call
delete_user(user_id, connection=user_db) once the pending flag is set and the
count is zero; additionally ensure write paths that create resources consult
this deletion_pending flag and reject creations for users with that flag so no
new resources can be created between the check and delete.

---

Nitpick comments:
In `@tests/routers/openml/flows_test.py`:
- Around line 65-76: Add a small regression test that exercises the deprecated
GET alias for the exists endpoint: create a new test (e.g.,
test_flow_exists_get_alias) in tests/routers/openml/flows_test.py that uses the
TestClient (py_api) to call GET "/flows/exists/{name}/{external_version}" with a
real Flow fixture and asserts HTTPStatus.OK and response.json() == {"flow_id":
flow.id}; also add a complementary GET call with non-existent name/version
asserting HTTPStatus.NOT_FOUND and detail "Flow not found." to mirror the
existing POST tests and ensure the compatibility shim implemented in the flows
router is exercised.

In `@tests/routers/openml/users_test.py`:
- Around line 10-24: Update the test_delete_user_self (and the similar test at
lines 27-40) to assert DB side-effects: after inserting the disposable user via
user_test.execute and performing py_api.delete, insert at least one users_groups
row for that new user before calling the API, then after the delete call query
the database using user_test (e.g., SELECT COUNT(*) FROM users WHERE id = new_id
and SELECT COUNT(*) FROM users_groups WHERE user_id = new_id) and assert both
counts are zero; keep the existing response assertions but add these post-delete
assertions so the handler actually removed the user and its users_groups rows.
- Around line 10-60: Add a test that exercises the conflict path when deleting a
user who still owns resources: create a new test (e.g.
test_delete_user_with_resources) that uses the existing DATASET_130_OWNER
fixture from tests/users.py to get a user id that owns resources, call the
DELETE /users/{user_id}?api_key={ApiKey.ADMIN} (or appropriate api key), and
assert the response.status_code is HTTPStatus.CONFLICT and
response.json()["detail"]["code"] == "130" to cover get_user_resource_count()
and the 409 mapping.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 753537d2-f2cc-40a8-af41-1f4a7757f7f7

📥 Commits

Reviewing files that changed from the base of the PR and between 72989df and 1577f83.

📒 Files selected for processing (9)
  • src/core/errors.py
  • src/database/users.py
  • src/main.py
  • src/routers/openml/flows.py
  • src/routers/openml/users.py
  • src/schemas/flows.py
  • tests/routers/openml/flows_test.py
  • tests/routers/openml/migration/flows_migration_test.py
  • tests/routers/openml/users_test.py

- database/users: add study, task_study, run_study, dataset_tag checks

- database/users: wrap user & group deletion in an explicit transaction

- routers/users: reuse direct query existence check without unused imports

- tests/users_test: add coverage for 409 conflict when resources exist

- tests/users_test: add db-level side-effect assertions for successful deletion
@Jayant-kernel Jayant-kernel force-pushed the feature/delete-account-endpoint branch from 1577f83 to be4980e Compare March 7, 2026 19:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/routers/openml/users.py (1)

57-70: ⚠️ Potential issue | 🟠 Major

Resource check and delete are still racy across expdb and user_db.

This is still split across two independent connections, so a concurrent write can create a dataset/flow/run/study after Line 57 returns 0 but before Line 70 deletes the user. That leaves resources owned by a deleted account, which breaks the endpoint’s “no owned resources” guarantee. Please gate this with a user state change first (for example, deletion-pending/disabled) and make resource-creation paths reject that state before running the count.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/users.py` around lines 57 - 70, The current flow calls
get_user_resource_count(...) on expdb then calls delete_user(...) on user_db,
which is racy; fix by first atomically setting the user's state to a
deletion-pending/disabled value in the user DB (e.g. add or use an
update_user_state/mark_user_deletion_pending function) so resource-creation
paths will reject new resources for that state, then re-run
get_user_resource_count(...) against expdb and only call delete_user(...) if the
count is still zero; if count > 0, revert the state or return the appropriate
error. Ensure resource-creation code paths check the user's state and fail for
deletion-pending before creating datasets/flows/runs/studies.
🧹 Nitpick comments (1)
tests/routers/openml/users_test.py (1)

94-109: Add regression coverage for the newly counted ownership tables.

This only proves the 409 path for dataset ownership. get_user_resource_count() now also blocks deletion for study.creator, task_study.uploader, run_study.uploader, and dataset_tag.uploader, so a regression in any of those queries would still pass this suite. A small parametrized test covering one row in each table would lock that behavior down.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 94 - 109, Extend the
existing test_delete_user_has_resources coverage by adding a parametrized test
that calls the same DELETE endpoint for a user who is the owner/uploader in each
of the newly counted tables (study.creator, task_study.uploader,
run_study.uploader, dataset_tag.uploader, plus the existing dataset owner case),
using appropriate ApiKey values for each target row; for each case assert
response.status_code == HTTPStatus.CONFLICT, response.json()["detail"]["code"]
== "122", the message contains "resource(s)", and verify the user record still
exists via the same user_test SELECT COUNT(*) query — use the existing test name
(or a new parametrized variant) and the same get_user_resource_count-related
semantics so regressions in those ownership checks are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/routers/openml/users.py`:
- Around line 57-70: The current flow calls get_user_resource_count(...) on
expdb then calls delete_user(...) on user_db, which is racy; fix by first
atomically setting the user's state to a deletion-pending/disabled value in the
user DB (e.g. add or use an update_user_state/mark_user_deletion_pending
function) so resource-creation paths will reject new resources for that state,
then re-run get_user_resource_count(...) against expdb and only call
delete_user(...) if the count is still zero; if count > 0, revert the state or
return the appropriate error. Ensure resource-creation code paths check the
user's state and fail for deletion-pending before creating
datasets/flows/runs/studies.

---

Nitpick comments:
In `@tests/routers/openml/users_test.py`:
- Around line 94-109: Extend the existing test_delete_user_has_resources
coverage by adding a parametrized test that calls the same DELETE endpoint for a
user who is the owner/uploader in each of the newly counted tables
(study.creator, task_study.uploader, run_study.uploader, dataset_tag.uploader,
plus the existing dataset owner case), using appropriate ApiKey values for each
target row; for each case assert response.status_code == HTTPStatus.CONFLICT,
response.json()["detail"]["code"] == "122", the message contains "resource(s)",
and verify the user record still exists via the same user_test SELECT COUNT(*)
query — use the existing test name (or a new parametrized variant) and the same
get_user_resource_count-related semantics so regressions in those ownership
checks are caught.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7d7c999b-e729-49a8-ade8-af6403af961f

📥 Commits

Reviewing files that changed from the base of the PR and between 1577f83 and 6449a2e.

📒 Files selected for processing (3)
  • src/database/users.py
  • src/routers/openml/users.py
  • tests/routers/openml/users_test.py

Jayant-kernel and others added 6 commits March 8, 2026 01:14
- routers/users: invalidate session_hash to lock user state and prevent concurrent resource creation during deletion checks

- tests/users_test: add parametrized test asserting resource ownership checks for datasets, runs, implementations, studies, and dataset_tags

- tests/flows_test: add regression test for the deprecated GET flows/exists endpoint
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/routers/openml/users_test.py (2)

106-147: Use table_name / column_name or remove them from the parametrization.

Only insert_sql affects this test today, so the extra tuple members just add lint noise and maintenance overhead. If you want the resource names for readability, feed table_name into ids=; otherwise trim the parametrization down to the SQL string.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 106 - 147, The test
test_delete_user_has_resources_parametrized currently parametrizes tuples as
(table_name, column_name, insert_sql) but only uses insert_sql; remove the
unused table_name and column_name from the parametrization and from the test
signature so the param list is just the SQL string (and update the
`@pytest.mark.parametrize` call accordingly), or alternatively keep the tuples but
pass table_name into the parametrize ids= argument for readability and leave the
test signature using only insert_sql; adjust the function signature in
test_delete_user_has_resources_parametrized to accept only the parameters you
actually use (e.g., insert_sql).

88-101: Assert that the 409 path restores session_hash.

This currently proves only that the user row survives the conflict. Since the route temporarily rewrites session_hash, add an assertion that the original API key is back after the 409 so lockout regressions are caught.

Possible assertion
     user_count = user_test.execute(
         text("SELECT COUNT(*) FROM users WHERE id = :id"),
         parameters={"id": target_id},
     ).scalar()
     assert user_count == 1
+    session_hash = user_test.execute(
+        text("SELECT session_hash FROM users WHERE id = :id"),
+        parameters={"id": target_id},
+    ).scalar()
+    assert session_hash == ApiKey.DATASET_130_OWNER
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 88 - 101, The test
test_delete_user_has_resources checks that deleting a user with resources
returns 409 and leaves the row intact but misses asserting that the route
restores the user's session_hash; after calling
py_api.delete(f"/users/{target_id}?api_key={ApiKey.DATASET_130_OWNER}") and
verifying the conflict, query the users table for the session_hash for
id=target_id (using the existing user_test.execute/text call pattern) and assert
it equals the original API key value (ApiKey.DATASET_130_OWNER) to ensure the
temporary rewrite was reverted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/users.py`:
- Around line 17-20: Update the OpenAPI description for the user deletion
endpoint (the description argument on the delete user route in
src/routers/openml/users.py, e.g., the delete_user/router.delete handler) to
reflect the full set of owned resources that block deletion instead of only
"datasets, flows, or runs"; either replace that phrase with “owned resources” or
enumerate: datasets, flows, runs, implementations, studies, task_studies,
run_studies, and dataset_tags (or the exact names used in the codebase) so the
docstring and clients accurately reflect the blocker set.
- Around line 64-86: The code currently restores session_hash only in the 409
branch; wrap the block that calls get_user_resource_count, delete_user and the
final user_db.commit() in a try/finally so original_session_hash is always
restored if deletion does not complete; specifically, after setting the
temporary session_hash you should run get_user_resource_count(...), check the
409 path as-is, then attempt delete_user(user_id=user_id, connection=user_db)
and user_db.commit(); set a local flag like deletion_succeeded = True after the
delete+commit and in the finally restore session_hash via
user_db.execute(text("UPDATE users SET session_hash = :hash WHERE id = :id"),
parameters={"hash": original_session_hash, "id": user_id}) and user_db.commit()
only when deletion_succeeded is False (or when an exception occurred) so the
original_session_hash is always put back on non-delete failures.
- Around line 57-64: The current update sets session_hash = 'DELETION_PENDING'
which can be used as a valid api_key; change the invalidation to use an
unguessable tombstone or a dedicated flag that the auth path will reject: in the
block that writes session_hash (the code using original_session_hash and
user_db.execute(... "UPDATE users SET session_hash = 'DELETION_PENDING' ...")),
replace the literal with a securely generated unique value (e.g. a UUID or
cryptographic random token) or instead set a new deletion_pending boolean/enum
column and update that field; also update the authentication check in the auth
function in users.py (the code that queries WHERE session_hash = :api_key) to
explicitly reject tombstone tokens or to check deletion_pending = false so the
temporary marker cannot be used to authenticate.

---

Nitpick comments:
In `@tests/routers/openml/users_test.py`:
- Around line 106-147: The test test_delete_user_has_resources_parametrized
currently parametrizes tuples as (table_name, column_name, insert_sql) but only
uses insert_sql; remove the unused table_name and column_name from the
parametrization and from the test signature so the param list is just the SQL
string (and update the `@pytest.mark.parametrize` call accordingly), or
alternatively keep the tuples but pass table_name into the parametrize ids=
argument for readability and leave the test signature using only insert_sql;
adjust the function signature in test_delete_user_has_resources_parametrized to
accept only the parameters you actually use (e.g., insert_sql).
- Around line 88-101: The test test_delete_user_has_resources checks that
deleting a user with resources returns 409 and leaves the row intact but misses
asserting that the route restores the user's session_hash; after calling
py_api.delete(f"/users/{target_id}?api_key={ApiKey.DATASET_130_OWNER}") and
verifying the conflict, query the users table for the session_hash for
id=target_id (using the existing user_test.execute/text call pattern) and assert
it equals the original API key value (ApiKey.DATASET_130_OWNER) to ensure the
temporary rewrite was reverted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0ab66871-1f71-4393-9aed-08b3dbc70124

📥 Commits

Reviewing files that changed from the base of the PR and between 6449a2e and 11e3c99.

📒 Files selected for processing (3)
  • src/routers/openml/users.py
  • tests/routers/openml/flows_test.py
  • tests/routers/openml/users_test.py

Comment on lines +57 to +64
# Invalidate session immediately to prevent concurrent resource creation
# This serves as a 'deletion_pending' lock as suggested in code review
original_session_hash = original[0]
user_db.execute(
text("UPDATE users SET session_hash = 'DELETION_PENDING' WHERE id = :id"),
parameters={"id": user_id},
)
user_db.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

DELETION_PENDING becomes a working credential here.

src/database/users.py:29-40 still authenticates with WHERE session_hash = :api_key, so after Line 61 any request that sends api_key=DELETION_PENDING can impersonate the target user until this flow restores or deletes the row. Use an unguessable tombstone value here, or better, a dedicated pending flag that the auth path rejects.

Possible minimal fix
 from http import HTTPStatus
+import secrets
 from typing import Annotated, Any
-    user_db.execute(
-        text("UPDATE users SET session_hash = 'DELETION_PENDING' WHERE id = :id"),
-        parameters={"id": user_id},
-    )
+    pending_session_hash = secrets.token_hex(32)
+    user_db.execute(
+        text("UPDATE users SET session_hash = :session_hash WHERE id = :id"),
+        parameters={"session_hash": pending_session_hash, "id": user_id},
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/users.py` around lines 57 - 64, The current update sets
session_hash = 'DELETION_PENDING' which can be used as a valid api_key; change
the invalidation to use an unguessable tombstone or a dedicated flag that the
auth path will reject: in the block that writes session_hash (the code using
original_session_hash and user_db.execute(... "UPDATE users SET session_hash =
'DELETION_PENDING' ...")), replace the literal with a securely generated unique
value (e.g. a UUID or cryptographic random token) or instead set a new
deletion_pending boolean/enum column and update that field; also update the
authentication check in the auth function in users.py (the code that queries
WHERE session_hash = :api_key) to explicitly reject tombstone tokens or to check
deletion_pending = false so the temporary marker cannot be used to authenticate.

- openapi: broaden description to match full blocked resource matrix

- auth: use unguessable uuid for pending deletion lock to prevent impersonation

- database: wrap deletion in try/finally to ensure session token restoration on unexpected failures
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/routers/openml/users.py (1)

59-84: ⚠️ Potential issue | 🟠 Major

The zero-resource guarantee is still racy across requests.

The UUID tombstone on Lines 61-66 only blocks new authentications. A request that authenticated before Line 66 can still write to expdb, so Lines 70-84 can observe 0 and then delete the user while a concurrent create sneaks in. If the contract is “no owned resources”, resource-creating write paths still need to reject a durable pending-deletion state before this count runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/users.py` around lines 59 - 84, The current approach only
invalidates new authentications but leaves a race where an already-authenticated
request can create resources while you check get_user_resource_count and call
delete_user; fix by making the pending-deletion state durable and checked by
creators and by performing a transactional lock: atomically mark the user as
pending deletion (e.g., set a pending_deletion boolean or tombstone column)
using SELECT ... FOR UPDATE on the user row before calling
get_user_resource_count and delete_user, then commit the deletion; additionally,
update all resource-creation paths (the functions that create resources — search
for your create_* functions and any write paths that call expdb inserts) to
explicitly check the pending_deletion flag and reject writes if set, so no new
resources can sneak in during the delete flow.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/users.py`:
- Around line 48-66: The current logic snapshots and later restores
users.session_hash, which can clobber another request's temporary lock; instead
add and use an explicit deletion-in-progress flag on the row (e.g., a boolean
column deletion_in_progress) and update the flow in the SELECT FOR UPDATE block
and the restore block: after selecting the row in the code around the SELECT ...
FOR UPDATE (where original_session_hash and temp_lock_hash are set) set
deletion_in_progress = true in the same transaction, and when cleaning up (the
section currently restoring session_hash around lines 87-94) only clear
deletion_in_progress = false (and avoid blindly restoring session_hash), or
restore the original session_hash only if the current session_hash still equals
the temp_lock_hash; update any rollback/cleanup to clear deletion_in_progress to
false. Locate changes to the SELECT ... FOR UPDATE block and the corresponding
restore/cleanup block in users.py to implement this.

In `@tests/routers/openml/users_test.py`:
- Around line 151-165: The test is committing shared test DB state by calling
expdb_test.commit(), which breaks rollback isolation; instead avoid committing
the outer transaction: wrap the orphan row insertion (the SET FOREIGN_KEY_CHECKS
toggles and expdb_test.execute(text(insert_sql), parameters={"id": new_id})) in
a nested/ savepoint-scoped transaction (e.g., expdb_test.begin_nested() / a
SAVEPOINT) or use the testing harness' scoped connection helpers so the INSERT
remains rollback-only, and remove the explicit expdb_test.commit() so that
user_test/expdb_test remain within the automatic_rollback scope when
py_api.delete(`/users/{new_id}`) runs. Ensure you still disable/enable
FOREIGN_KEY_CHECKS only inside that nested scope and reference the existing
variables insert_sql, new_id, expdb_test, and user_test.

---

Duplicate comments:
In `@src/routers/openml/users.py`:
- Around line 59-84: The current approach only invalidates new authentications
but leaves a race where an already-authenticated request can create resources
while you check get_user_resource_count and call delete_user; fix by making the
pending-deletion state durable and checked by creators and by performing a
transactional lock: atomically mark the user as pending deletion (e.g., set a
pending_deletion boolean or tombstone column) using SELECT ... FOR UPDATE on the
user row before calling get_user_resource_count and delete_user, then commit the
deletion; additionally, update all resource-creation paths (the functions that
create resources — search for your create_* functions and any write paths that
call expdb inserts) to explicitly check the pending_deletion flag and reject
writes if set, so no new resources can sneak in during the delete flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2aafd616-0690-407c-ad5e-94dc50c3b81b

📥 Commits

Reviewing files that changed from the base of the PR and between 11e3c99 and f0999e5.

📒 Files selected for processing (2)
  • src/routers/openml/users.py
  • tests/routers/openml/users_test.py

Comment on lines +48 to +66
original = user_db.execute(
text("SELECT session_hash FROM users WHERE id = :id FOR UPDATE"),
parameters={"id": user_id},
).fetchone()

if original is None:
raise HTTPException(
status_code=HTTPStatus.NOT_FOUND,
detail={"code": str(int(UserError.NOT_FOUND)), "message": "User not found"},
)

# Invalidate session immediately to prevent concurrent resource creation
original_session_hash = original[0]
temp_lock_hash = uuid.uuid4().hex
user_db.execute(
text("UPDATE users SET session_hash = :lock_hash WHERE id = :id"),
parameters={"lock_hash": temp_lock_hash, "id": user_id},
)
user_db.commit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Concurrent delete attempts can restore the wrong session_hash.

Lines 48-66 snapshot whatever value is currently in users.session_hash. If a second request enters after another delete attempt has already written its UUID lock, original_session_hash becomes that temporary value. When both requests unwind through Lines 87-94, the last restore wins and can leave the account stuck with a random hash even though deletion was blocked. This needs an explicit “deletion in progress” state instead of restoring whatever happened to be in session_hash.

Also applies to: 87-94

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/users.py` around lines 48 - 66, The current logic
snapshots and later restores users.session_hash, which can clobber another
request's temporary lock; instead add and use an explicit deletion-in-progress
flag on the row (e.g., a boolean column deletion_in_progress) and update the
flow in the SELECT FOR UPDATE block and the restore block: after selecting the
row in the code around the SELECT ... FOR UPDATE (where original_session_hash
and temp_lock_hash are set) set deletion_in_progress = true in the same
transaction, and when cleaning up (the section currently restoring session_hash
around lines 87-94) only clear deletion_in_progress = false (and avoid blindly
restoring session_hash), or restore the original session_hash only if the
current session_hash still equals the temp_lock_hash; update any
rollback/cleanup to clear deletion_in_progress to false. Locate changes to the
SELECT ... FOR UPDATE block and the corresponding restore/cleanup block in
users.py to implement this.

Comment on lines +151 to +165
user_test.execute(
text(
"INSERT INTO users (session_hash, email, first_name, last_name, password)"
" VALUES ('eeeeffffccccddddaaaabbbbccccdddd', 'res@test.com', 'Del', 'User', 'x')",
),
)
(new_id,) = user_test.execute(text("SELECT LAST_INSERT_ID()")).one()

# Disable constraints temporarily to inject simple orphaned rows for testing 409
expdb_test.execute(text("SET FOREIGN_KEY_CHECKS=0"))
expdb_test.execute(text(insert_sql), parameters={"id": new_id})
expdb_test.execute(text("SET FOREIGN_KEY_CHECKS=1"))
expdb_test.commit()

response = py_api.delete(f"/users/{new_id}?api_key=eeeeffffccccddddaaaabbbbccccdddd")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

This parametrized test breaks its rollback isolation.

py_api injects these exact user_test/expdb_test connections into the app. Calling expdb_test.commit() on Line 163, then hitting a handler on Line 165 that commits user_db on the shared user_test connection, closes the outer automatic_rollback transactions and can leak the inserted user/resource into later cases. These staged rows should stay inside rollback-scoped connections or nested transactions instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/users_test.py` around lines 151 - 165, The test is
committing shared test DB state by calling expdb_test.commit(), which breaks
rollback isolation; instead avoid committing the outer transaction: wrap the
orphan row insertion (the SET FOREIGN_KEY_CHECKS toggles and
expdb_test.execute(text(insert_sql), parameters={"id": new_id})) in a nested/
savepoint-scoped transaction (e.g., expdb_test.begin_nested() / a SAVEPOINT) or
use the testing harness' scoped connection helpers so the INSERT remains
rollback-only, and remove the explicit expdb_test.commit() so that
user_test/expdb_test remain within the automatic_rollback scope when
py_api.delete(`/users/{new_id}`) runs. Ensure you still disable/enable
FOREIGN_KEY_CHECKS only inside that nested scope and reference the existing
variables insert_sql, new_id, expdb_test, and user_test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Endpoint to delete an account

1 participant