
Reproducibility: Code does not match paper's experimental methodology (Table 2, data splits, normalization) #1

@andermannfelix-max

Description


Hello,

Thank you for publishing the KALFormer paper and releasing code. After a careful review of the repository, I found several significant discrepancies between the paper's described methodology and the actual implementation, making reproduction of the reported results impossible with the current codebase:

  1. No Train/Validation/Test Split (Section 4.1)
    The paper states: "data were chronologically divided into training, validation, and test sets in an 8:1:1 ratio." However, no code in the repository implements any data splitting. All training scripts (kalformer.py, train_ett.py, train_electricity.py, etc.) create a single DataLoader over the entire dataset and train on all data. There is no validation loop and no test evaluation — only training metrics are reported.
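For concreteness, a chronological 8:1:1 split of the kind Section 4.1 describes could look like the following minimal sketch (the function name and interface are mine, not from the repository):

```python
import numpy as np

def chronological_split(data, ratios=(0.8, 0.1, 0.1)):
    """Split a time-ordered array into train/val/test without shuffling."""
    n = len(data)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

series = np.arange(100)
train, val, test = chronological_split(series)
# 80 training points, 10 validation, 10 test; temporal order preserved
```

Nothing resembling this exists in any of the training scripts.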

  2. Hyperparameters Do Not Match Table 2
    Table 2 specifies: encoder layers=4, hidden_dim=512, KAN_dim=128, dropout=0.1, batch_size=16, lr=1e-4, input_window=96. The code instead hardcodes: layers=2, hidden_dim=64-256, KAN_dim=3-50, dropout=0.2-0.3, batch_size=32, lr=1e-3, and window=10 or 192, varying by training script.

  3. Normalization Mismatch
    The paper describes z-score normalization computed from the training set. The ETT, Traffic, and Weather dataloaders use min-max normalization (not z-score), and statistics are computed over the entire dataset (potential data leakage since there's no split).
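To avoid leakage, z-score statistics should be fit on the training split only and then applied to all splits. A minimal sketch (function name and interface are mine):

```python
import numpy as np

def zscore_fit_transform(train, *others, eps=1e-8):
    """Compute mean/std on the training split only, then apply to every split."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    scaled = [(x - mean) / (std + eps) for x in (train, *others)]
    return scaled, (mean, std)

rng = np.random.default_rng(0)
full = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
train, val, test = full[:800], full[800:900], full[900:]
(train_s, val_s, test_s), (mu, sigma) = zscore_fit_transform(train, val, test)
# train_s has (approximately) zero mean and unit variance per feature;
# val_s and test_s are scaled with the *training* statistics only
```

The repository's dataloaders instead compute min-max statistics over the full dataset.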

  4. Missing Value Handling
    The paper describes linear interpolation for missing values. The code uses dropna() (row deletion) instead. No interpolation code exists anywhere.
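The difference matters for chronological data: dropna() shifts every subsequent timestamp forward, breaking the fixed sampling interval. Linear interpolation as described in the paper is a one-liner in pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"load": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})
filled = df.interpolate(method="linear")
# All 6 rows are kept and gaps are filled linearly,
# unlike dropna(), which would delete 3 rows and distort the time axis
```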

  5. Single-Step vs Multi-Horizon Prediction
    The paper reports forecasting horizons of 96 and 192 steps. The code only predicts a single next time step (y = targets[idx + seq_length] — one scalar value).
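For multi-horizon forecasting, each sample needs a target *window*, not a scalar. A sliding-window sketch using Table 2's input_window=96 and a 96-step horizon (function name is mine):

```python
import numpy as np

def make_windows(series, input_window=96, horizon=96):
    """Slide over the series producing (input, multi-step target) pairs."""
    X, Y = [], []
    for i in range(len(series) - input_window - horizon + 1):
        X.append(series[i:i + input_window])
        Y.append(series[i + input_window:i + input_window + horizon])
    return np.array(X), np.array(Y)

series = np.arange(300, dtype=float)
X, Y = make_windows(series)
# Each target Y[i] is a 96-step vector, not the single scalar
# y = targets[idx + seq_length] that the repository's dataset class returns
```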

  6. Unused Configuration File
    configs/config.yaml exists with validation: enabled: false but is never loaded by any Python script (no import yaml exists). All hyperparameters are hardcoded.
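Wiring the config in would be straightforward, assuming PyYAML is available (the `model` keys below are illustrative; only `validation.enabled` appears in the actual configs/config.yaml):

```python
import yaml

# Stand-in for open("configs/config.yaml").read()
config_text = """
validation:
  enabled: false
model:
  encoder_layers: 4
  hidden_dim: 512
"""
cfg = yaml.safe_load(config_text)
# cfg is a nested dict the training scripts could read hyperparameters from,
# instead of hardcoding them
```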

  7. Random Knowledge Vectors
    In the ablation experiment (kalformer.py line 33), the knowledge vector is torch.randn(...) — random noise rather than actual domain knowledge.

Could you please provide the actual code used to produce the results reported in Tables 3-5? The current repository appears to be an early development version rather than the final experimental code.

Thank you.
