Mismatch between -stdout.txt output files and also some other output naming conventions #833

@amas0

Description

Running a Stan model generates a set of output files. By default these are -stdout.txt and .csv files, which contain the process output and the inference output, respectively. Optionally, we can also write diagnostic and profile files to track latent dynamics and profiling information, if present.

The current logic will produce a set of output files (under the default 4 chains, one process per chain) like this:

-rw-r--r-- 1 amas amas   2367 Nov  5 15:28 bernoulli-20251105152812_0-stdout.txt
-rw-r--r-- 1 amas amas  88052 Nov  5 15:28 bernoulli-20251105152812_1.csv
-rw-r--r-- 1 amas amas   2357 Nov  5 15:28 bernoulli-20251105152812_1-stdout.txt
-rw-r--r-- 1 amas amas  88050 Nov  5 15:28 bernoulli-20251105152812_2.csv
-rw-r--r-- 1 amas amas   2358 Nov  5 15:28 bernoulli-20251105152812_2-stdout.txt
-rw-r--r-- 1 amas amas  88090 Nov  5 15:28 bernoulli-20251105152812_3.csv
-rw-r--r-- 1 amas amas   2358 Nov  5 15:28 bernoulli-20251105152812_3-stdout.txt
-rw-r--r-- 1 amas amas  86919 Nov  5 15:28 bernoulli-20251105152812_4.csv
-rw-r--r-- 1 amas amas 150457 Nov  5 15:28 bernoulli-20251105152812-diagnostic_1.csv
-rw-r--r-- 1 amas amas 150444 Nov  5 15:28 bernoulli-20251105152812-diagnostic_2.csv
-rw-r--r-- 1 amas amas 150808 Nov  5 15:28 bernoulli-20251105152812-diagnostic_3.csv
-rw-r--r-- 1 amas amas 149214 Nov  5 15:28 bernoulli-20251105152812-diagnostic_4.csv
-rw-r--r-- 1 amas amas    190 Nov  5 15:28 bernoulli-20251105152812-profile_1.csv
-rw-r--r-- 1 amas amas    189 Nov  5 15:28 bernoulli-20251105152812-profile_2.csv
-rw-r--r-- 1 amas amas    190 Nov  5 15:28 bernoulli-20251105152812-profile_3.csv
-rw-r--r-- 1 amas amas    190 Nov  5 15:28 bernoulli-20251105152812-profile_4.csv

Every file except the stdout files is indexed by chain_id, so we get a mismatch: the stdout files are indexed 0-3 and the rest are indexed 1-4. This can cause confusion, since someone could reasonably think that the file suffixed _1-stdout.txt corresponds to the output _1.csv, when the matching file is actually _0-stdout.txt.

I think we should change the stdout naming so that the indexes align; it should be an easy fix.
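To make the off-by-one concrete, here is a rough sketch of the mismatch and of the aligned alternative. The function names are made up for illustration and are not the actual helpers in the code base; the sketch only assumes that the stdout name is currently built from a 0-based process index while everything else uses chain_id.

```python
def csv_name(basename: str, chain_id: int) -> str:
    # Inference output: indexed by chain_id (1-based by default).
    return f"{basename}_{chain_id}.csv"


def stdout_name_current(basename: str, idx: int) -> str:
    # Assumed current behaviour: stdout uses the 0-based process index.
    return f"{basename}_{idx}-stdout.txt"


def stdout_name_aligned(basename: str, chain_id: int) -> str:
    # Proposed behaviour: stdout uses chain_id, like the other files.
    return f"{basename}_{chain_id}-stdout.txt"


basename = "bernoulli-20251105152812"
chain_ids = [1, 2, 3, 4]

for idx, chain_id in enumerate(chain_ids):
    print(
        csv_name(basename, chain_id),
        stdout_name_current(basename, idx),
        stdout_name_aligned(basename, chain_id),
    )
```

With the aligned version, chain 1 gets both _1.csv and _1-stdout.txt, which is what a reader of the directory listing would expect.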

The second point I want to raise is an inconsistency in where the index sits relative to the extra label in these file names. For stdout we use {idx}-stdout.txt, so the index comes before the label, but for profile and diagnostic we use (diagnostic|profile)_{idx}.csv, so the index comes after it. I think we should align these so that our output files are named consistently.
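As a starting point for discussion, a single helper could build every output name so the index position is decided in one place. This is only a sketch: `output_file_name` and its `index_first` flag are hypothetical, and which of the two orderings we pick is exactly the open question.

```python
from typing import Optional


def output_file_name(
    basename: str,
    chain_id: int,
    kind: Optional[str] = None,
    ext: str = "csv",
    index_first: bool = True,
) -> str:
    """Build an output file name (hypothetical helper, not existing API).

    kind is an optional label such as "stdout", "diagnostic" or "profile".
    index_first selects which of the two existing orderings to apply to all files.
    """
    if kind is None:
        # Plain inference output has no label, so both orderings agree.
        return f"{basename}_{chain_id}.{ext}"
    if index_first:
        # Ordering currently used by the stdout files.
        return f"{basename}_{chain_id}-{kind}.{ext}"
    # Ordering currently used by the diagnostic and profile files.
    return f"{basename}-{kind}_{chain_id}.{ext}"


base = "bernoulli-20251105152812"
for kind, ext in [(None, "csv"), ("stdout", "txt"), ("diagnostic", "csv"), ("profile", "csv")]:
    print(output_file_name(base, 1, kind, ext))
```

Flipping `index_first` to False gives the other convention for every file at once, so either choice ends up consistent.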

Welcome any thoughts from others -- if we have a consensus, I'll go ahead and implement.
