-
-
Notifications
You must be signed in to change notification settings - Fork 79
Standardize output filenames #835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@WardBrian Do you have any insight into this test failure? I don't have a windows machine on hand to look more closely. |
|
I think it might just be sporadic, lets see Related to the PR, I think this might be wrong when using STAN_THREADs to run multiple chains in one executable. In those cases, the Runset has to deal with this complexity of abstracting over whether we picked the filenames or cmdstan did, at least until the minimum version is 2.37 and we can use a comma separated list and do whatever we want. I suspect we may be lacking sufficient tests that set Note that when it is time to set save_cmdstan_config, only one of those files will be generated regardless of |
|
Hmmmm, okay I see the distinctions in a quick test. Enabling bernoulli-20251106145201_1.csv
bernoulli-20251106145201_2.csv
bernoulli-20251106145201_3.csv
bernoulli-20251106145201_4.csv
bernoulli-20251106145201_config.json
bernoulli-20251106145201_diagnostic_1.csv
bernoulli-20251106145201_diagnostic_2.csv
bernoulli-20251106145201_diagnostic_3.csv
bernoulli-20251106145201_diagnostic_4.csv
bernoulli-20251106145201_profile.csv
bernoulli-20251106145201_stdout.txtSo we get on per-chain files (just diagnostic) an appended This is my understanding: In the case where we run one-process per file, we can get consistency across the output files, because we set the the ID as part of the base file names we give to cmdstan, but if we sample in a single process, cmdstan will generate the per-chain files with an appended Does that sound right? I'm not sure if we can enforce absolute consistency between the single vs multi process cases unless we renamed files after they were written by cmdstan? In the one process for all chains case, we could rename diagnostic files to have the id before We could also just accept that when |
|
I think it’s more consistent if we always end with _id, and then when we only have one chain we just know we won’t have any _# for those other files I’m still hopeful one day we can make the multithreaded behavior the default and nuke all this… |
|
Okay cool -- I've swapped the order. This will consistent throughout all the output files we have right now, regardless of single/multi process. For clarity, this is the output now:
and I've got to add some tests for these cases, update some docstrings, and probably update some documentation somewhere. Will let you know when this is good to go. What are the major barriers to making the multithreaded behavior the default? Would certainly clean some of this up.. |
See the discussion at stan-dev/cmdstan#1179. tl;dr is that our current Windows toolchains suffer from pretty dramatic slowdowns when threading is enabled at compile time. This isn't an inherent Windows issue (e.g., using WSL or clang-cl does not have this issue), so there's some hope that updated or swapped-out toolchains could allow us to do this |
|
@WardBrian Going through updating docstrings and such. Quite a few places that say: I'm assuming this is quite out of date? Blame places these lines at ~5 years old and this isn't something that happens on my machine. Just wanted to double check that this isn't a platform-specific thing or an edge case before I remove it. |
|
Oh yeah, I don't think we've done that in a very long time |
|
I think this should be good to go @WardBrian |
WardBrian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
Submission Checklist
Summary
This change proposes a new standard naming convention for output files to address some inconsistencies identified in #833.
Specifically it fixes the mismatched stdout mismatched index and enforces all output files to stick to a naming convention that starts with
{model_name}-{datetime}_{id}. Extra names, e.g.stdout,profile, will be append with an_{extra}before the file suffix.This structure was chosen as it is compatible with the intended new argument
save_cmdstan_configreferenced in #714.Under these proposed changes, a default 4-chain sample with latent dynamics and profiling saved will create the following output files:
Not included in this PR, but anticipated is the inclusion of the cmdstan config saved, which produces the following output if included:
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): myself
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: