neg_binomial_2_rng can obstruct loo_moment_match() #274

@kthayashi

I'm working on a Stan program that includes a generated quantities block that generates log_lik for use with loo and yrep for posterior predictive checks. This looks something like (with some parts abbreviated as ...):

generated quantities {
  vector[n] log_lik;
  array[n] int yrep;
  ...
  for (i in 1:n) {
    ...
    log_lik[i] = neg_binomial_2_lpmf(y[i] | mu[i], phi[i]);
  }
  yrep = neg_binomial_2_rng(mu, phi);
}

I'm using cmdstanr to fit the model and run loo with moment-matching:

mod <- cmdstanr::cmdstan_model(...)
fit <- mod$sample(...)
fit_loo <- fit$loo(moment_match = TRUE, cores = 4)

During the operation of loo_moment_match(), I sometimes get a few error/exception messages that appear to stem from overflow in the *_rng function in the generated quantities block. These all look like:

Error : Exception: neg_binomial_2_rng: Random number that came from gamma distribution is 1.47285e+09, but must be less than 1073741824.000000 (in '/var/folders/2k/c0vy7xwj4kb9x7hbgtpq5m640000gn/T//RtmpE8xV8L/model-6c3130f99d6e.stan', line 83, column 4 to column 39)

Further, these messages are sometimes (but not usually) followed by an error that causes loo_moment_match() to fail:

Error in mm_list[[ii]]$i : $ operator is invalid for atomic vectors
In addition: Warning message:
In parallel::mclapply(X = I, mc.cores = cores, FUN = function(i) loo_moment_match_i_fun(i)) :
  scheduled cores 4, 1, 3 encountered errors in user code, all values of the jobs will be affected

To the best of my understanding, this appears to happen because loo_moment_match_i_fun() is failing for one or more cases. Perhaps mm_list[[ii]] is NA?

loo/R/loo_moment_matching.R

Lines 130 to 131 in 6e7001e

mm_list <- parallel::mclapply(X = I, mc.cores = cores,
                              FUN = function(i) loo_moment_match_i_fun(i))

loo/R/loo_moment_matching.R

Lines 142 to 143 in 6e7001e

for (ii in seq_along(I)) {
  i <- mm_list[[ii]]$i
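
A minimal sketch of how this failure could arise, offered as an assumption rather than something confirmed against the loo internals: parallel::mclapply() runs each job inside try(), so a job that errors comes back as a "try-error" object. That object is a character vector, not a list, so `$` fails on it. In the sketch, `fake_i_fun` is a hypothetical stand-in for loo_moment_match_i_fun(), and lapply() with try() mimics what mclapply() returns for a failed job so that the example runs on any platform:

```r
# Hypothetical stand-in for loo_moment_match_i_fun(): fails for i == 2,
# as if the *_rng call in generated quantities had thrown an exception.
fake_i_fun <- function(i) {
  if (i == 2) stop("neg_binomial_2_rng: simulated overflow")
  list(i = i)
}

# Mimic mclapply(): a failed job is returned as a "try-error" object
# instead of stopping the whole call.
mm_list <- lapply(1:3, function(i) try(fake_i_fun(i), silent = TRUE))

# Flag the jobs that failed.
failed <- vapply(mm_list, inherits, logical(1), what = "try-error")

# Using `$` on a failed element gives the same kind of error as in the report.
msg <- tryCatch(mm_list[[2]]$i, error = function(e) conditionMessage(e))

print(failed)
print(msg)
```

Under this assumption, checking `inherits(mm_list[[ii]], "try-error")` before accessing `$i` would surface the underlying Stan exception instead of the `$` error.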

I get a small number (~1-3) of the error/exception messages fairly consistently, but the error that causes loo_moment_match() to fail is less common. One place where I've been able to reproduce this error consistently is within a targets pipeline, which suggests to me that it can be influenced by the RNG state. When I did get this error, it was preceded by ~10 of those error/exception messages. I can confirm that the error can also be produced without targets or callr, just less consistently. I'm using cores = 4 here, but the error can still occur with cores = 1.

Commenting out the code for yrep and *_rng in the Stan file eliminates the issue entirely, but it is (ever so slightly) inconvenient to have to make this change depending on whether I want to use loo_moment_match() with the fitted model. I haven't encountered this problem when the *_rng function is less likely to overflow than the negative binomial.

I wanted to report this issue here since it seems to have something to do with loo_moment_match(). It feels like it could be something related to or not entirely covered by #262. If this is expected behavior, I would appreciate any tips on how to better deal with having both log_lik and yrep in the generated quantities block when it comes to using loo_moment_match(). I'm sorry if any of this is off base, as I do not have a good understanding of the inner workings of the moment-matching code.

Some system info:

> packageVersion("loo")
[1] ‘2.8.0.9000’
> packageVersion("cmdstanr")
[1] ‘0.8.1’
> cmdstanr::cmdstan_version()
[1] "2.35.0"
> R.version
               _                           
platform       aarch64-apple-darwin20      
arch           aarch64                     
os             darwin20                    
system         aarch64, darwin20           
status                                     
major          4                           
minor          4.1                         
year           2024                        
month          06                          
day            14                          
svn rev        86737                       
language       R                           
version.string R version 4.4.1 (2024-06-14)
nickname       Race for Your Life          
