
roc bug, possibly due to dimension names? #442

@paololucchino

Description

Bug: Incorrect ROC AUC calculation, possibly related to dimension order in multi-dimensional arrays

When computing ROC AUC on multi-dimensional arrays whose observations and forecasts share the same dimension names but carry them in a different order, xss.roc() produces incorrect results that differ substantially from sklearn's reference implementation.

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr
import xskillscore as xss
from sklearn.metrics import roc_auc_score

# Create test data with specific seed
np.random.seed(1512)
obs_raw = xr.DataArray(
    np.random.normal(0.5, 0.2, size=(20, 10)),
    coords=[("time", np.arange(20)), ("points", np.arange(10))],
)
da_obs = (obs_raw > 0.5).astype(int)

# Create forecast with different dimension order via broadcasting
alpha = xr.DataArray(np.linspace(0, 1, num=10), coords=[("points", np.arange(10))])
error = xr.DataArray(np.random.normal(0.0, 0.03, size=20), coords=[("time", np.arange(20))])
da_forecast = alpha + obs_raw + error

print(f"da_obs dims: {da_obs.dims}, shape: {da_obs.shape}")
print(f"da_forecast dims: {da_forecast.dims}, shape: {da_forecast.shape}")
# Output: da_obs dims: ('time', 'points'), da_forecast dims: ('points', 'time')

# Compute using xskillscore
xss_result = xss.roc(da_obs, da_forecast, dim="time", return_results="area")

# Compare against sklearn (ground truth) for each point
print("\nComparison with sklearn:")
print(f"{'Point':<6} {'sklearn':<10} {'xskillscore':<12} {'Error':<10}")
print("-" * 40)

for point in range(10):
    obs_p = da_obs.isel(points=point).values
    fc_p = da_forecast.isel(points=point).values

    sklearn_auc = roc_auc_score(obs_p, fc_p)
    xss_auc = xss_result.isel(points=point).values
    abs_err = abs(sklearn_auc - xss_auc)  # renamed to avoid shadowing the `error` DataArray above

    print(f"{point:<6} {sklearn_auc:<10.6f} {xss_auc:<12.6f} {abs_err:<10.6f}")

Output:

da_obs dims: ('time', 'points'), shape: (20, 10)
da_forecast dims: ('points', 'time'), shape: (10, 20)

Comparison with sklearn:
Point  sklearn    xskillscore  Error
----------------------------------------
0      0.939394   0.939394     0.000000
1      0.979798   0.979798     0.000000
2      1.000000   0.990909     0.009091
3      1.000000   1.000000     0.000000
4      0.958333   0.845833     0.112500
5      0.958333   0.733333     0.225000
6      0.927083   0.200000     0.727083
7      1.000000   0.140909     0.859091
8      0.968750   0.175000     0.793750
9      1.000000   0.000000     1.000000

For point 9 specifically (see the check below):

  • The data shows a strong positive correlation (0.869) between observations and forecasts
  • High forecast values consistently correspond to positive observations
  • sklearn correctly returns AUC = 1.0 (a perfect classifier)
  • xskillscore incorrectly returns AUC = 0.0 (the exact inverse)
Expected Output

xskillscore should produce results matching sklearn regardless of dimension order, as long as dimension names match.
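
Until the underlying cause is fixed, a possible workaround (a sketch only, not verified against the library internals) is to transpose the forecast to match the observation's dimension order before calling xss.roc():

# Possible workaround (unverified): make the dimension order explicit
# so both arrays are ('time', 'points') before computing the ROC.
da_forecast_aligned = da_forecast.transpose(*da_obs.dims)
xss_aligned = xss.roc(da_obs, da_forecast_aligned, dim="time", return_results="area")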

Environment:

  • xskillscore version: 0.0.26
  • xarray version: 2025.3.1
  • numpy version: 1.26.4
