Standardize output filenames #835

amas0 · 2025-11-06T18:33:24Z

Submission Checklist

Run unit tests
Declare copyright holder and open-source license: see below

Summary

This change proposes a new standard naming convention for output files to address some inconsistencies identified in #833.

Specifically it fixes the mismatched stdout mismatched index and enforces all output files to stick to a naming convention that starts with {model_name}-{datetime}_{id}. Extra names, e.g. stdout, profile, will be append with an _{extra} before the file suffix.

This structure was chosen as it is compatible with the intended new argument save_cmdstan_config referenced in #714.

Under these proposed changes, a default 4-chain sample with latent dynamics and profiling saved will create the following output files:

bernoulli-20251106132150_1.csv
bernoulli-20251106132150_1_diagnostic.csv
bernoulli-20251106132150_1_profile.csv
bernoulli-20251106132150_1_stdout.txt
bernoulli-20251106132150_2.csv
bernoulli-20251106132150_2_diagnostic.csv
bernoulli-20251106132150_2_profile.csv
bernoulli-20251106132150_2_stdout.txt
bernoulli-20251106132150_3.csv
bernoulli-20251106132150_3_diagnostic.csv
bernoulli-20251106132150_3_profile.csv
bernoulli-20251106132150_3_stdout.txt
bernoulli-20251106132150_4.csv
bernoulli-20251106132150_4_diagnostic.csv
bernoulli-20251106132150_4_profile.csv
bernoulli-20251106132150_4_stdout.txt

Not included in this PR, but anticipated is the inclusion of the cmdstan config saved, which produces the following output if included:

bernoulli-20251106132150_1_config.json
bernoulli-20251106132150_1.csv
bernoulli-20251106132150_1_diagnostic.csv
bernoulli-20251106132150_1_profile.csv
bernoulli-20251106132150_1_stdout.txt
bernoulli-20251106132150_2_config.json
bernoulli-20251106132150_2.csv
bernoulli-20251106132150_2_diagnostic.csv
bernoulli-20251106132150_2_profile.csv
bernoulli-20251106132150_2_stdout.txt
bernoulli-20251106132150_3_config.json
bernoulli-20251106132150_3.csv
bernoulli-20251106132150_3_diagnostic.csv
bernoulli-20251106132150_3_profile.csv
bernoulli-20251106132150_3_stdout.txt
bernoulli-20251106132150_4_config.json
bernoulli-20251106132150_4.csv
bernoulli-20251106132150_4_diagnostic.csv
bernoulli-20251106132150_4_profile.csv
bernoulli-20251106132150_4_stdout.txt

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): myself

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)

amas0 · 2025-11-06T18:58:11Z

@WardBrian Do you have any insight into this test failure? I don't have a windows machine on hand to look more closely.

WardBrian · 2025-11-06T19:14:38Z

I think it might just be sporadic, lets see

Related to the PR, I think this might be wrong when using STAN_THREADs to run multiple chains in one executable. In those cases, the _ID will always be forced to be at the end. For example, if you run

$ ./bernoulli sample num_chains=4 data file=bernoulli.data.json output diagnostic_file=foo.csv
$ ls
bernoulli            bernoulli.data.R  foo_1.csv  foo_3.csv  output_1.csv  output_3.csv
bernoulli.data.json  bernoulli.stan    foo_2.csv  foo_4.csv  output_2.csv  output_4.csv

Runset has to deal with this complexity of abstracting over whether we picked the filenames or cmdstan did, at least until the minimum version is 2.37 and we can use a comma separated list and do whatever we want.

I suspect we may be lacking sufficient tests that set force_one_process_per_chain=False to actually expose these issues

Note that when it is time to set save_cmdstan_config, only one of those files will be generated regardless of num_chains, so that will add yet another special case.

amas0 · 2025-11-06T20:41:55Z

Hmmmm, okay I see the distinctions in a quick test. Enabling STAN_THREADS and sampling under this current code will generate (with save_cmdstan_config enabled):

bernoulli-20251106145201_1.csv
bernoulli-20251106145201_2.csv
bernoulli-20251106145201_3.csv
bernoulli-20251106145201_4.csv
bernoulli-20251106145201_config.json
bernoulli-20251106145201_diagnostic_1.csv
bernoulli-20251106145201_diagnostic_2.csv
bernoulli-20251106145201_diagnostic_3.csv
bernoulli-20251106145201_diagnostic_4.csv
bernoulli-20251106145201_profile.csv
bernoulli-20251106145201_stdout.txt

So we get on per-chain files (just diagnostic) an appended _{id}.

This is my understanding:

In the case where we run one-process per file, we can get consistency across the output files, because we set the the ID as part of the base file names we give to cmdstan, but if we sample in a single process, cmdstan will generate the per-chain files with an appended _{id}, which seems to only affect the diagnostic file (and the stancsv, but extra bits to the filename makes it the same) since everything else is per-process.

Does that sound right?

I'm not sure if we can enforce absolute consistency between the single vs multi process cases unless we renamed files after they were written by cmdstan? In the one process for all chains case, we could rename diagnostic files to have the id before _diagnostic to make the id always be in the beginning. If we attempted to move the id to the end, then we would have to rename the config.json file in the multiple process case since that seems to inherit naming from the stancsv name.

We could also just accept that when force_one_process_per_chain is true, then the ids occur first, and in the alternative, they occur after (which would only happen when save_latent_dynamics is enabled anyways).

WardBrian · 2025-11-06T22:33:21Z

I think it’s more consistent if we always end with _id, and then when we only have one chain we just know we won’t have any _# for those other files

I’m still hopeful one day we can make the multithreaded behavior the default and nuke all this…

amas0 · 2025-11-06T23:11:53Z

Okay cool -- I've swapped the order. This will consistent throughout all the output files we have right now, regardless of single/multi process. For clarity, this is the output now:

one_process_per_chain = False

bernoulli-20251106180710_1.csv
bernoulli-20251106180710_2.csv
bernoulli-20251106180710_3.csv
bernoulli-20251106180710_4.csv
bernoulli-20251106180710_diagnostic_1.csv
bernoulli-20251106180710_diagnostic_2.csv
bernoulli-20251106180710_diagnostic_3.csv
bernoulli-20251106180710_diagnostic_4.csv
bernoulli-20251106180710_profile.csv
bernoulli-20251106180710_stdout.txt

and one_process_per_chain = True:

bernoulli-20251106180852_1.csv
bernoulli-20251106180852_2.csv
bernoulli-20251106180852_3.csv
bernoulli-20251106180852_4.csv
bernoulli-20251106180852_diagnostic_1.csv
bernoulli-20251106180852_diagnostic_2.csv
bernoulli-20251106180852_diagnostic_3.csv
bernoulli-20251106180852_diagnostic_4.csv
bernoulli-20251106180852_profile_1.csv
bernoulli-20251106180852_profile_2.csv
bernoulli-20251106180852_profile_3.csv
bernoulli-20251106180852_profile_4.csv
bernoulli-20251106180852_stdout_1.txt
bernoulli-20251106180852_stdout_2.txt
bernoulli-20251106180852_stdout_3.txt
bernoulli-20251106180852_stdout_4.txt

I've got to add some tests for these cases, update some docstrings, and probably update some documentation somewhere. Will let you know when this is good to go.

What are the major barriers to making the multithreaded behavior the default? Would certainly clean some of this up..

cmdstanpy/stanfit/runset.py

WardBrian · 2025-11-07T15:57:56Z

What are the major barriers to making the multithreaded behavior the default? Would certainly clean some of this up

See the discussion at stan-dev/cmdstan#1179. tl;dr is that our current Windows toolchains suffer from pretty dramatic slowdowns when threading is enabled at compile time. This isn't an inherent Windows issue (e.g., using WSL or clang-cl does not have this issue), so there's some hope that updated or swapped-out toolchains could allow us to do this

amas0 · 2025-11-07T20:21:11Z

@WardBrian Going through updating docstrings and such. Quite a few places that say:

Output files written to the temporary directory contain an additional
8-character random string, e.g. 'bernoulli-201912081451-1-5nm6as7u.csv'.

I'm assuming this is quite out of date? Blame places these lines at ~5 years old and this isn't something that happens on my machine. Just wanted to double check that this isn't a platform-specific thing or an edge case before I remove it.

WardBrian · 2025-11-07T20:25:46Z

Oh yeah, I don't think we've done that in a very long time

amas0 · 2025-11-07T21:58:08Z

I think this should be good to go @WardBrian

WardBrian

Looks good, thank you!

amas0 added 2 commits November 6, 2025 13:15

Standardize output file naming

02e627d

Slight cleanup

a6e8291

amas0 linked an issue Nov 6, 2025 that may be closed by this pull request

Mismatch between -stdout.txt output files and also some other output naming conventions #833

Closed

Reverse order of extra and id in file name gen

5429dc4

WardBrian reviewed Nov 7, 2025

View reviewed changes

cmdstanpy/stanfit/runset.py Outdated Show resolved Hide resolved

amas0 added 4 commits November 7, 2025 11:04

Simplify RunSet.gen_file_name

a62e7d2

Add additional RunSet output file testing

ef7ca63

Remove id in output filenames if single chain

3d397e7

Add single chain output file testing

747b0c1

amas0 added 3 commits November 7, 2025 15:54

Update CmdStanModel docstrings with updated file outputs

b5c94e0

Update user's guide output file info

1442ad8

Update stanfit docstrings

80a5fed

WardBrian approved these changes Nov 8, 2025

View reviewed changes

amas0 merged commit b8bcfab into stan-dev:develop Nov 8, 2025
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Standardize output filenames #835

Standardize output filenames #835

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

WardBrian commented Nov 6, 2025

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

WardBrian commented Nov 6, 2025

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

Uh oh!

WardBrian commented Nov 7, 2025

Uh oh!

amas0 commented Nov 7, 2025

Uh oh!

WardBrian commented Nov 7, 2025

Uh oh!

amas0 commented Nov 7, 2025

Uh oh!

WardBrian left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Standardize output filenames #835

Standardize output filenames #835

Uh oh!

Conversation

amas0 commented Nov 6, 2025

Submission Checklist

Summary

Copyright and Licensing

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

WardBrian commented Nov 6, 2025

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

WardBrian commented Nov 6, 2025

Uh oh!

amas0 commented Nov 6, 2025

Uh oh!

Uh oh!

WardBrian commented Nov 7, 2025

Uh oh!

amas0 commented Nov 7, 2025

Uh oh!

WardBrian commented Nov 7, 2025

Uh oh!

amas0 commented Nov 7, 2025

Uh oh!

WardBrian left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants