Add generate_OAT_SA_design() for sensitivity analysis input design #3729
32 commits merged into PecanProject:develop from
Conversation
mdietze left a comment
A good overall test that this function is working correctly would be to verify that the run.write.configs module runs correctly with the OAT design as an input and continues to generate output that's readable by the SA postprocessing and graphing functions. If it doesn't, then this function is generating inputs that are not serving any useful purpose. Along the way, we may want/get to simplify the run write configs logic
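A hypothetical sketch of what such an integration test could look like (everything beyond `generate_OAT_SA_design` here — the `run.write.configs` call, the design's column layout — is an assumption about the API, not confirmed code):

```r
# Sketch only: verifies the OAT design can drive config writing end-to-end.
# run.write.configs() usage and the design column names are assumptions.
test_that("OAT design feeds run.write.configs and SA postprocessing", {
  design <- generate_OAT_SA_design(settings)

  # configs should be writable from the OAT design without error
  expect_no_error(PEcAn.workflow::run.write.configs(settings, write = FALSE))

  # the design must at least expose the run-indexing column the SA
  # postprocessing and graphing functions key on
  expect_true("param" %in% names(design$X))
})
```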
```r
#' all inputs vary together.
#'
#' @param settings PEcAn settings object
#' @param sa_samples Optional. Pre-loaded SA samples from samples.Rdata.
```
This argument suggests a fundamental design misunderstanding. sa_samples should be generated by this function, not read by it. I'm OK with taking this as an optional input if one wants to reuse a previous sa_samples, but if it's not provided it needs to be generated
The parameter documentation now clearly indicates it's optional, for reusing existing samples, not the expected input.
```r
if (is.null(sa_samples)) {
  samples_file <- file.path(settings$outdir, "samples.Rdata")

  # generate samples if they don't exist (safety fallback)
```
This should be the default, not the safety fallback
Removed the "safety fallback" comment.
```r
  PEcAn.uncertainty::get.parameter.samples(
    settings,
    ensemble.size = 1, # SA doesn't need ensemble samples
```
not needed as this is the default
```r
    settings,
    ensemble.size = 1, # SA doesn't need ensemble samples
    posterior.files,
    ens.sample.method
```
also not needed as this isn't an ensemble sample
```r
#' @export
#' @author Akash B V
#' @importFrom rlang %||%
generate_OAT_SA_design <- function(settings, sa_samples = NULL) {
```
It might be too much at this PR, but it would be good to move away from settings as a dependency unless the underlying number of pieces of information is too large to meaningfully pass to the function. But in that case it would be good to document exactly what part of the settings is required. Here I think it might just be settings$outdir, settings$pfts, settings$sensitivity.analysis, and settings$ensemble (i.e. I wonder if you could get away with a function that has outdir, pfts, sensitivity.analysis, ensemble, and sa_samples as arguments?). Would also be good to better document semi-hidden dependencies (e.g. what does the pft$posterior.files need to point to for the function to actually sample parameters correctly)
I agree with reducing the dependency on settings for the input design by passing specific arguments, but after analyzing both functions to keep the architecture consistent, I found that the parameter requirements are not the same: generate_OAT_SA_design needs outdir, pfts, samplingspace, and sensitivity.analysis, while generate_joint_ensemble_design additionally needs run$inputs (via input.ens.gen(), which samples from settings$run$inputs[[input]]$path).
However, there is a deeper blocker -- both functions call get.parameter.samples(settings, ...), which itself uses many settings fields (database$bety, host$name, sensitivity.analysis, etc.). So even with explicit parameters in the design functions, we'd still pass settings through to get.parameter.samples. And that in turn involves refactoring the SDA and Sobol callers.
Anyway, I have documented which settings it uses and the semi-hidden dependencies in both design functions. Happy to hear your thoughts.
I think it's fine if the parameters requirements are not the same. I also think it's fine to push the refactor of the generate design functions and get.parameter.samples to a future PR
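A minimal sketch of what the narrower signature could look like once refactored (the argument set comes from this thread; the function name and body are assumptions, and the refactor itself is deferred to the future PR):

```r
# Hypothetical refactor target: explicit arguments instead of a full
# settings object. get.parameter.samples() still needs the whole settings
# (database$bety, host$name, ...), so it is threaded through for now.
generate_OAT_SA_design_explicit <- function(outdir,
                                            pfts,
                                            sensitivity.analysis,
                                            sa_samples = NULL,
                                            settings = NULL) {
  if (is.null(sa_samples)) {
    PEcAn.uncertainty::get.parameter.samples(settings)
    samples_env <- new.env()
    load(file.path(outdir, "samples.Rdata"), envir = samples_env)
    sa_samples <- samples_env$sa.samples
  }
  sa_samples
}
```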
── Failed tests ────────────────────────────────────────────────────────────────
we are passing
```r
#' one-at-a-time across quantiles. This differs from ensemble design where
#' all inputs vary together.
#'
#' @param settings PEcAn settings object. This function directly uses:
```
Consider "see details" and moving the itemized list down? I find that such a long @param entry makes it hard to skim all the options.
```r
make_mock_sa_samples <- function() {
  list(
    pft1 = structure(
      matrix(1:9, nrow = 3, ncol = 3),
      dimnames = list(c("25", "50", "75"), c("trait1", "trait2", "trait3"))
    )
  )
}
```
Returns a constant -- why is this a function and not just a list?
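One way to act on this, reusing the exact structure from the helper above (assuming no test needs to mutate the shared object — safe in R given copy-on-modify semantics):

```r
# Plain constant instead of a zero-argument function returning one.
mock_sa_samples <- list(
  pft1 = structure(
    matrix(1:9, nrow = 3, ncol = 3),
    dimnames = list(c("25", "50", "75"), c("trait1", "trait2", "trait3"))
  )
)
```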
@@ -0,0 +1,194 @@
```r
make_sa_test_settings <- function() {
  list(
    outdir = withr::local_tempdir(),
```
This path looks to me as if it will stop existing when make_sa_test_settings returns! Is the outdir used for anything during testing or could this just be "/fake/output/path/" and skip the withr usage?
Thanks, yes, it's never accessed; when I checked the settings parameter requirements for generate_OAT_SA_design, I switched to "/fake/output/path/" and added a comment.
```r
expect_true(all(result$X[[col]] == 1),
            info = paste("Column", col, "should be constant 1 for SA"))
```
?expect_true discourages using info in new code, and the expectation seems pretty clear without it to me. Could unquote for slightly better diagnostic messages:
```diff
-expect_true(all(result$X[[col]] == 1),
-            info = paste("Column", col, "should be constant 1 for SA"))
+# all columns but `param` should be a constant 1
+# (`!!` to get the column name into the failure message)
+expect_true(all(result$X[[!!col]] == 1))
```
```r
})

test_that("generate_OAT_SA_design param column is sequential", {
  settings <- make_sa_test_settings()
  sa_samples <- make_mock_sa_samples()

  result <- generate_OAT_SA_design(settings, sa_samples = sa_samples)
```
Style nit: Since generating the design takes several lines of setup and is identical for both these tests, multiple expectations in the same test block feels cleaner to me.
```r
})
test_that("generate_OAT_SA_design param column is sequential", {
  settings <- make_sa_test_settings()
  sa_samples <- make_mock_sa_samples()
  result <- generate_OAT_SA_design(settings, sa_samples = sa_samples)
```
(If you accept this, probably want to edit line 38 to something like "...keeps param column sequential and others constant at 1")
Yup, unified into one test: "generate_OAT_SA_design keeps param sequential and non-param constant at 1".
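A sketch of what the unified test could look like (the sequential-param expectation is an assumption about the design's contract; the constant-at-1 loop follows the reviewer's `!!` suggestion from earlier in this thread):

```r
test_that("generate_OAT_SA_design keeps param sequential and non-param constant at 1", {
  settings <- make_sa_test_settings()
  sa_samples <- make_mock_sa_samples()
  result <- generate_OAT_SA_design(settings, sa_samples = sa_samples)

  # param indexes runs sequentially (assumed contract)
  expect_equal(result$X$param, seq_len(nrow(result$X)))

  # every other column is pinned to 1 for OAT
  for (col in setdiff(names(result$X), "param")) {
    expect_true(all(result$X[[!!col]] == 1))
  }
})
```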
@@ -0,0 +1,194 @@
```r
make_sa_test_settings <- function() {
```
Please match test file names to R file names by naming this file test-generate_OAT_SA_design.R, to avoid needing to wonder which test file to look in to go along with the code you're writing (or vise versa).
The file tests both generate_OAT_SA_design and generate_joint_ensemble_design (plus their comparison, etc.); naming it test-generate_OAT_SA_design.R would suggest it only tests the SA design, so I kept test.input_design.R to reflect that it covers both design types.
Then please move the tests of generate_joint_ensemble_design to their own file, and if there are shared helpers then put them in a file whose name starts with helpers.
See https://r-pkgs.org/testing-basics.html for more on recommended organization, but know that the reason I'm insisting on this is that unit testing is built on boring consistency, and inconsistent test file names will confuse future maintainers (including me).
Thanks for pointing this out, agreed!
```r
sa_non_param <- setdiff(names(sa_result$X), "param")
for (col in sa_non_param) {
  expect_equal(length(unique(sa_result$X[[col]])), 1,
               info = "SA design: non-param columns must be constant")
}
```
Isn't this the same condition you already tested around line 45?
Yeah, this is intentional: it re-checks the condition within the comparison test of the ensemble and SA designs (now simplified, with clearer intent).
```r
               info = "SA design: non-param columns must be constant")
}

## ensemble design - non-param can vary (mocked to show variation)
```
I don't follow what "to show variation" means here and why it needs all the mocks
The mock returns varied indices c(1, 2, 3, 1, 2) to demonstrate that the ensemble design structure passes through whatever input.ens.gen returns, unlike the OAT design, which forces all non-param columns to a constant. I've also made the comment much clearer now.
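A sketch of the mocking pattern being described (the binding name and the return shape of `input.ens.gen` are assumptions here; adjust to the real signature):

```r
# Stub input.ens.gen to return known, varied indices, then assert the
# ensemble design passes them through unchanged -- the contrast with the
# OAT design, which would pin these columns to a constant.
testthat::local_mocked_bindings(
  input.ens.gen = function(...) list(samples = c(1, 2, 3, 1, 2)),
  .package = "PEcAn.uncertainty"
)
result <- generate_joint_ensemble_design(settings)
expect_equal(result$X$met, c(1, 2, 3, 1, 2))
```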
```r
for (run_id in run_ids) {
  expect_true(dir.exists(file.path(rundir, run_id)))
}
})
```

```r
#------------------ tests: OAT design with write.sa.configs -------------------
# verifies design produces output compatible with SA postprocessing

test_that("OAT design integrates with write.sa.configs for SA postprocessing", {
```
The degree of complication needed to set up these tests feels like an indicator that we could do better at designing these functions for testability, but I think that can be a future project.
@infotroph please take a look; once you approve, I will merge this branch with #3708
Heads-up: leaving this PR open until we merge
dab62a6
originally discussed -- comment
Description
Add generate_OAT_SA_design() function to create input design matrices for OAT sensitivity analysis. SA requires isolating the effect of each parameter by holding all other inputs (met, IC, soil) constant. The existing generate_joint_ensemble_design() randomizes these inputs, which is correct for ensemble runs but invalidates SA variance decomposition.
Added tests in test.input_design.R to validate the OAT sensitivity design, including checks:
Motivation and Context
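To make the OAT idea concrete, here is a small self-contained illustration (not the PEcAn implementation): each row perturbs exactly one input across its quantiles while every other input stays pinned to a constant index:

```r
# Toy OAT design matrix: 3 inputs x 3 quantiles -> 9 runs, each varying
# a single input while the rest are held at index 1.
oat_design <- function(inputs, n_quantiles) {
  rows <- list()
  for (i in seq_along(inputs)) {
    for (q in seq_len(n_quantiles)) {
      row <- rep(1L, length(inputs))  # hold everything else constant
      row[i] <- q                     # vary only input i
      rows[[length(rows) + 1L]] <- row
    }
  }
  X <- do.call(rbind, rows)
  colnames(X) <- inputs
  X
}
oat_design(c("met", "IC", "soil"), 3)
```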
Review Time Estimate
Types of changes
Checklist: