Skip to content

Fix nnUNet test directory leakage into current working directory#8887

Open
aymuos15 wants to merge 8 commits into
Project-MONAI:devfrom
aymuos15:fix-8886-nnunet-cwd
Open

Fix nnUNet test directory leakage into current working directory#8887
aymuos15 wants to merge 8 commits into
Project-MONAI:devfrom
aymuos15:fix-8886-nnunet-cwd

Conversation

@aymuos15
Copy link
Copy Markdown
Contributor

Fixes #8886

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).

ericspod and others added 8 commits April 11, 2026 00:28
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <17726042+ericspod@users.noreply.github.com>
)

The TestnnUNetV2RunnerSecurity YAMLs used cwd-relative paths
(./work, ./nnUNet_raw, ./nnUNet_preprocessed, ./nnUNet_results).
nnUNetV2Runner.__init__ creates the raw/preprocessed/results dirs
eagerly (before dataset-name validation), so instantiating the runner
in these tests wrote into the repo cwd and tearDown could not clean it
up. Root all generated paths in self.test_dir so the temp dir owns
them, including the injection payload's redirect target.

Signed-off-by: Soumya Snigdha Kundu <soumya_snigdha.kundu@kcl.ac.uk>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

📝 Walkthrough

Walkthrough

The PR hardens dataset name handling in nnU-Net runner by introducing regex-based validation. During initialization, dataset_name_or_id is validated against DATASET_ID_FORMAT and raises ValueError for invalid formats. The validated name is stored in typed self.dataset_name. Execution methods (train_parallel_cmd, train_parallel, predict_ensemble_postprocessing) now guard against unset dataset names. The training command builder refactors argument construction to use explicit list assembly and adjusts CLI flag prefixing. A security test suite validates both valid dataset name formats and rejects injection-like values.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Description check ⚠️ Warning Description only references issue #8886 and marks as non-breaking change but omits description of what was actually fixed. Add a description section explaining the changes: dataset validation regex, typed dataset_name attribute, ValueError guards, and path updates in test YAMLs.
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive Title describes fixing a directory leakage issue, but changes also add dataset validation, regex enforcement, and new security tests—not reflected in title. Clarify if title should focus solely on directory leakage or broaden to cover dataset validation; current title is incomplete for full changeset scope.
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Changes successfully validate dataset_name via DATASET_ID_FORMAT regex, add ValueError guards preventing eager directory creation before validation, and update test YAMLs to use self.test_dir for all paths.
Out of Scope Changes check ✅ Passed All changes directly address #8886: dataset validation prevents premature directory creation; ValueError guards delay creation until validation passes; YAML path updates route generated directories to test temp folder.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/apps/nnunet/nnunetv2_runner.py (1)

209-218: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix dataset_name resolution for DatasetXXX inputs in nnUNetV2Runner

  • DATASET_ID_FORMAT allows Dataset[0-9]{3}, but init casts via int(self.dataset_name_or_id) at line 212; for inputs like "Dataset003" this throws, gets swallowed, and self.dataset_name stays None.
  • With run_convert_dataset=False, train_parallel_cmd/train_parallel/predict_ensemble_postprocessing will then raise ValueError because self.dataset_name is None.
  • convert_dataset line 230 also does int(self.dataset_name_or_id), so "DatasetXXX" inputs would still break even when conversion is enabled; either tighten the regex to numeric IDs only or make the int(...) logic conditional.
🔧 Proposed fix
-            self.dataset_name = maybe_convert_to_dataset_name(int(self.dataset_name_or_id))
+            self.dataset_name = maybe_convert_to_dataset_name(self.dataset_name_or_id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/apps/nnunet/nnunetv2_runner.py` around lines 209 - 218, The current
int(self.dataset_name_or_id) cast in nnUNetV2Runner init prevents valid
DatasetXXX strings (e.g., "Dataset003") from resolving because
maybe_convert_to_dataset_name is only called with an int and the exception
swallows the failure leaving self.dataset_name None; update the logic in the
__init__/dataset resolution and in convert_dataset so you first detect whether
self.dataset_name_or_id matches a pure-numeric ID (using DATASET_ID_FORMAT or a
digits-only check) and only cast to int in that case, otherwise pass the raw
string into maybe_convert_to_dataset_name (or call maybe_convert_to_dataset_name
without int conversion); ensure convert_dataset likewise uses the same
conditional casting so run_convert_dataset=False and conversion paths both
resolve DatasetXXX names correctly and do not leave self.dataset_name as None.
🧹 Nitpick comments (3)
tests/integration/test_integration_nnunetv2_runner.py (3)

33-33: ⚡ Quick win

Scope the logger override to the test and restore it.

Setting the runner logger level at import time leaks global state across the whole test session and can hide warnings from unrelated tests. Patch it inside the relevant test/class setup and restore the previous level in cleanup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_integration_nnunetv2_runner.py` at line 33, The
module-level call
monai.apps.nnunet.nnunetv2_runner.logger.setLevel(logging.ERROR) is changing
global logging state at import time; move this into the specific test's setup
(or a pytest fixture) and restore the original level in teardown (or after
fixture yield). Concretely, capture the previous level via prev =
monai.apps.nnunet.nnunetv2_runner.logger.level, set to logging.ERROR for the
duration of the test (or fixture), and then reset
monai.apps.nnunet.nnunetv2_runner.logger.setLevel(prev) in a finally/teardown
block so the override does not leak to other tests.

158-163: ⚡ Quick win

Assert that the injection payload produced no side effect.

assertRaises(ValueError) still passes if the constructor creates test.txt before failing. Add a negative assertion so this stays a real security regression test.

Suggested patch
     def test_nnunetv2runner_bad_dataset_name(self) -> None:
         """
         Test the dataset name given must conform to the nnUNet requirement of being an int or "Dataset###".
         """
         with self.assertRaises(ValueError):
             nnUNetV2Runner(input_config=self.inject_yml, trainer_class_name="nnUNetTrainer")
+        self.assertFalse(os.path.exists(os.path.join(self.test_dir.name, "test.txt")))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_integration_nnunetv2_runner.py` around lines 158 -
163, The test test_nnunetv2runner_bad_dataset_name currently only checks that
nnUNetV2Runner(input_config=self.inject_yml, trainer_class_name="nnUNetTrainer")
raises ValueError but doesn't assert there were no side‑effects; update the test
to, after the assertRaises block, explicitly assert that the injection artifact
(e.g. the file created by the payload such as "test.txt" or any files under the
same temp dir used by self.inject_yml) does NOT exist. Locate the test method
test_nnunetv2runner_bad_dataset_name and add a negative assertion referencing
the specific artifact path (or verify the directory referenced by
self.inject_yml is unchanged/empty) to ensure the constructor failed without
creating files. Ensure the extra assertion runs after the context manager that
expects ValueError so the test still fails if the artifact was created.

101-166: ⚡ Quick win

Add Google-style docstrings to the new definitions.

The new class, setUp, and tearDown are undocumented, and the two test-method docstrings are not Google-style docstrings with return/exception sections.

As per coding guidelines, "Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/integration/test_integration_nnunetv2_runner.py` around lines 101 -
166, Add Google-style docstrings to the TestnnUNetV2RunnerSecurity class and its
methods: setUp, tearDown, test_nnunetv2runner_good_dataset_name, and
test_nnunetv2runner_bad_dataset_name. For each definition include a short
description, Args sections for any parameters (e.g., self), Returns if
applicable (use None for setUp/tearDown/test methods), and Raises for expected
exceptions (e.g., ValueError in test_nnunetv2runner_bad_dataset_name). Update
the docstrings in the class TestnnUNetV2RunnerSecurity and the methods setUp,
tearDown, test_nnunetv2runner_good_dataset_name, and
test_nnunetv2runner_bad_dataset_name to follow Google-style formatting.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 585-605: The argv builder in the kwargs loop currently always
appends str(_value), causing store_true options to appear as "--flag True"
instead of a bare "--flag" and including false/empty pretrained_weights; update
the loop in nnunetv2_runner.py where cmd is assembled so that for boolean flags
(e.g., keys "c", "val", "use_compressed", "disable_checkpointing" and any other
store_true options) you only append the flag name (prefix + _key) when _value is
True and skip it when False, and for "pretrained_weights" (or "-p") only append
the option and its value when _value is truthy (a non-empty path); keep the
existing prefix selection logic for short vs long options and continue
converting items to strings later.

---

Outside diff comments:
In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 209-218: The current int(self.dataset_name_or_id) cast in
nnUNetV2Runner init prevents valid DatasetXXX strings (e.g., "Dataset003") from
resolving because maybe_convert_to_dataset_name is only called with an int and
the exception swallows the failure leaving self.dataset_name None; update the
logic in the __init__/dataset resolution and in convert_dataset so you first
detect whether self.dataset_name_or_id matches a pure-numeric ID (using
DATASET_ID_FORMAT or a digits-only check) and only cast to int in that case,
otherwise pass the raw string into maybe_convert_to_dataset_name (or call
maybe_convert_to_dataset_name without int conversion); ensure convert_dataset
likewise uses the same conditional casting so run_convert_dataset=False and
conversion paths both resolve DatasetXXX names correctly and do not leave
self.dataset_name as None.

---

Nitpick comments:
In `@tests/integration/test_integration_nnunetv2_runner.py`:
- Line 33: The module-level call
monai.apps.nnunet.nnunetv2_runner.logger.setLevel(logging.ERROR) is changing
global logging state at import time; move this into the specific test's setup
(or a pytest fixture) and restore the original level in teardown (or after
fixture yield). Concretely, capture the previous level via prev =
monai.apps.nnunet.nnunetv2_runner.logger.level, set to logging.ERROR for the
duration of the test (or fixture), and then reset
monai.apps.nnunet.nnunetv2_runner.logger.setLevel(prev) in a finally/teardown
block so the override does not leak to other tests.
- Around line 158-163: The test test_nnunetv2runner_bad_dataset_name currently
only checks that nnUNetV2Runner(input_config=self.inject_yml,
trainer_class_name="nnUNetTrainer") raises ValueError but doesn't assert there
were no side‑effects; update the test to, after the assertRaises block,
explicitly assert that the injection artifact (e.g. the file created by the
payload such as "test.txt" or any files under the same temp dir used by
self.inject_yml) does NOT exist. Locate the test method
test_nnunetv2runner_bad_dataset_name and add a negative assertion referencing
the specific artifact path (or verify the directory referenced by
self.inject_yml is unchanged/empty) to ensure the constructor failed without
creating files. Ensure the extra assertion runs after the context manager that
expects ValueError so the test still fails if the artifact was created.
- Around line 101-166: Add Google-style docstrings to the
TestnnUNetV2RunnerSecurity class and its methods: setUp, tearDown,
test_nnunetv2runner_good_dataset_name, and test_nnunetv2runner_bad_dataset_name.
For each definition include a short description, Args sections for any
parameters (e.g., self), Returns if applicable (use None for setUp/tearDown/test
methods), and Raises for expected exceptions (e.g., ValueError in
test_nnunetv2runner_bad_dataset_name). Update the docstrings in the class
TestnnUNetV2RunnerSecurity and the methods setUp, tearDown,
test_nnunetv2runner_good_dataset_name, and test_nnunetv2runner_bad_dataset_name
to follow Google-style formatting.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 631f3e0e-2273-4606-a386-197be2c23c11

📥 Commits

Reviewing files that changed from the base of the PR and between dc0f59c and a4e5465.

📒 Files selected for processing (2)
  • monai/apps/nnunet/nnunetv2_runner.py
  • tests/integration/test_integration_nnunetv2_runner.py

Comment on lines 585 to +605
cmd = [
"nnUNetv2_train",
f"{self.dataset_name_or_id}",
f"{config}",
f"{fold}",
self.dataset_name_or_id,
config,
fold,
"-tr",
f"{self.trainer_class_name}",
self.trainer_class_name,
"-num_gpus",
f"{num_gpus}",
num_gpus,
]

if self.export_validation_probabilities:
cmd.append("--npz")

for _key, _value in kwargs.items():
if _key == "p" or _key == "pretrained_weights":
cmd.extend([f"-{_key}", f"{_value}"])
else:
cmd.extend([f"--{_key}", f"{_value}"])
return cmd, env
prefix = "-" if _key in {"p", "pretrained_weights"} else "--"
cmd += [f"{prefix}{_key}", str(_value)]

cmd_str: list[str] = [str(c) for c in cmd]

return cmd_str, env
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

nnUNetv2_train argparse flags store_true --c --val --use_compressed --disable_checkpointing

💡 Result:

In nnU-Net v2, the command line utility nnUNetv2_train utilizes several argparse flags to control the training and validation process. These flags are implemented as store_true options, meaning they do not require an associated value and serve as toggles [1][2]. The requested flags perform the following functions: --c (Continue training): This flag instructs the trainer to resume training from the most recent checkpoint. It is essential if a training run was interrupted or if you wish to extend a previously stopped training session [2]. --val (Validation only): This flag sets the trainer to skip the training process and instead run only the final validation on the specified fold [2][3]. This is often used after a training run has completed or to re-run validation without re-training [4][3]. --use_compressed: This flag directs the system to use compressed data during training. By default, nnU-Net may decompress data; enabling this flag prevents decompression. It is noted that reading compressed data is significantly more CPU and potentially RAM intensive, and it should only be used when necessary [2]. --disable_checkpointing: This flag disables the saving of model checkpoints during training. This is useful in scenarios where you are testing a configuration or running quick experiments and do not want to consume disk space with multiple large model checkpoint files [2]. In addition to these, the --npz flag is frequently used alongside these commands. Although not explicitly in your list, it is critical to note that --npz is required if you intend to use nnUNetv2_find_best_configuration later, as it forces the model to save the softmax probabilities from the final validation, which are necessary for ensemble creation and model selection [4][3].

Citations:


Fix argv construction for store_true kwargs (e.g., --c, --val) in monai/apps/nnunet/nnunetv2_runner.py (lines 585-605): the kwargs loop always appends str(_value), so documented boolean flags (c, val, use_compressed, disable_checkpointing) become --c True / --val True instead of being emitted as bare flags (the same pattern as the special-case --npz). Update the builder to emit --<flag> only when the value is True and omit the flag when False (and similarly only include pretrained_weights when it’s a real path, not False).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/apps/nnunet/nnunetv2_runner.py` around lines 585 - 605, The argv
builder in the kwargs loop currently always appends str(_value), causing
store_true options to appear as "--flag True" instead of a bare "--flag" and
including false/empty pretrained_weights; update the loop in nnunetv2_runner.py
where cmd is assembled so that for boolean flags (e.g., keys "c", "val",
"use_compressed", "disable_checkpointing" and any other store_true options) you
only append the flag name (prefix + _key) when _value is True and skip it when
False, and for "pretrained_weights" (or "-p") only append the option and its
value when _value is truthy (a non-empty path); keep the existing prefix
selection logic for short vs long options and continue converting items to
strings later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nnUNet Writes Into Current Directory By Default

2 participants