Bug: Incorrect ROC AUC calculation, possibly related to dimension order in multi-dimensional arrays
When computing ROC AUC on multi-dimensional arrays where observations and forecasts have different dimension orders (but the same dimension names), `xss.roc()` produces incorrect results that differ significantly from sklearn's ground-truth implementation (`roc_auc_score`).
Code Sample, a copy-pastable example if possible
```python
import numpy as np
import xarray as xr
import xskillscore as xss
from sklearn.metrics import roc_auc_score

# Create test data with a fixed seed
np.random.seed(1512)
obs_raw = xr.DataArray(
    np.random.normal(0.5, 0.2, size=(20, 10)),
    coords=[("time", np.arange(20)), ("points", np.arange(10))],
)
da_obs = (obs_raw > 0.5).astype(int)

# Create forecast with a different dimension order via broadcasting:
# alpha has dims ('points',) and error has dims ('time',), so
# alpha + obs_raw + error broadcasts to ('points', 'time')
alpha = xr.DataArray(np.linspace(0, 1, num=10), coords=[("points", np.arange(10))])
error = xr.DataArray(np.random.normal(0.0, 0.03, size=20), coords=[("time", np.arange(20))])
da_forecast = alpha + obs_raw + error

print(f"da_obs dims: {da_obs.dims}, shape: {da_obs.shape}")
print(f"da_forecast dims: {da_forecast.dims}, shape: {da_forecast.shape}")
# Output: da_obs dims: ('time', 'points'), da_forecast dims: ('points', 'time')

# Compute using xskillscore
xss_result = xss.roc(da_obs, da_forecast, dim="time", return_results="area")

# Compare against sklearn (ground truth) for each point
print("\nComparison with sklearn:")
print(f"{'Point':<6} {'sklearn':<10} {'xskillscore':<12} {'Error':<10}")
print("-" * 40)
for point in range(10):
    obs_p = da_obs.isel(points=point).values
    fc_p = da_forecast.isel(points=point).values
    sklearn_auc = roc_auc_score(obs_p, fc_p)
    xss_auc = xss_result.isel(points=point).values
    error = abs(sklearn_auc - xss_auc)
    print(f"{point:<6} {sklearn_auc:<10.6f} {xss_auc:<12.6f} {error:<10.6f}")
```

Output:
```
da_obs dims: ('time', 'points'), shape: (20, 10)
da_forecast dims: ('points', 'time'), shape: (10, 20)

Comparison with sklearn:
Point  sklearn    xskillscore  Error
----------------------------------------
0      0.939394   0.939394     0.000000
1      0.979798   0.979798     0.000000
2      1.000000   0.990909     0.009091
3      1.000000   1.000000     0.000000
4      0.958333   0.845833     0.112500
5      0.958333   0.733333     0.225000
6      0.927083   0.200000     0.727083
7      1.000000   0.140909     0.859091
8      0.968750   0.175000     0.793750
9      1.000000   0.000000     1.000000
```
For point 9 specifically:
- The data shows a strong positive relationship (correlation 0.869 between observations and forecasts)
- High forecast values consistently correspond to positive observations
- sklearn correctly returns AUC = 1.0 (a perfect classifier)
- xskillscore returns AUC = 0.0, i.e. a perfectly inverted ranking (a diagnostic snippet isolating the dimension-order issue follows below)
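As a further check, here is a minimal diagnostic that reuses `da_obs` and `da_forecast` from the repro above: it forces the forecast into the observations' dimension order before calling `xss.roc`. If dimension order is indeed the culprit, this variant should agree with sklearn at every point (I have not verified this; it is the experiment I would run next):

```python
# Diagnostic sketch (reuses da_obs / da_forecast from the repro above):
# transpose the forecast so its dims match da_obs by name before calling roc.
fc_reordered = da_forecast.transpose(*da_obs.dims)  # -> ('time', 'points')
reordered_result = xss.roc(da_obs, fc_reordered, dim="time", return_results="area")
for point in range(10):
    sk = roc_auc_score(
        da_obs.isel(points=point).values,
        fc_reordered.isel(points=point).values,
    )
    print(point, sk, float(reordered_result.isel(points=point)))
```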
Expected Output:
xskillscore should produce results matching sklearn regardless of dimension order, as long as dimension names match.
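I have not traced the implementation, but the symptom suggests the internals operate on positional axes after broadcasting rather than on named dimensions. If so, a normalization step along these lines might be the shape of the fix; this is only a sketch, not xskillscore code, and `_match_dim_order` is a hypothetical helper:

```python
import xarray as xr

def _match_dim_order(observations: xr.DataArray, forecasts: xr.DataArray) -> xr.DataArray:
    # Hypothetical helper: reorder the forecast's dimensions to match the
    # observations' by name, so later axis-positional math lines up.
    # The trailing ... keeps any extra forecast dims (e.g. ensemble members)
    # in their original relative order.
    return forecasts.transpose(*observations.dims, ...)
```

Alternatively, `xr.broadcast(observations, forecasts)` returns both arrays with a common, consistent dimension order, which would sidestep the mismatch entirely.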
Environment:
- xskillscore version: 0.0.26
- xarray version: 2025.3.1
- numpy version: 1.26.4