Hello,
Thank you for publishing the KALFormer paper and releasing code. After a careful review of the repository, I found several significant discrepancies between the paper's described methodology and the actual implementation, making reproduction of the reported results impossible with the current codebase:
**1. No Train/Validation/Test Split (Section 4.1)**

The paper states: "data were chronologically divided into training, validation, and test sets in an 8:1:1 ratio." However, no code in the repository implements any data splitting. All training scripts (`kalformer.py`, `train_ett.py`, `train_electricity.py`, etc.) create a single DataLoader over the entire dataset and train on all of it. There is no validation loop and no test evaluation; only training metrics are reported.
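For reference, the chronological 8:1:1 split the paper describes takes only a few lines. This is a minimal illustration, not code from the repository, and the function name is mine:

```python
def chronological_split(n_samples, train_frac=0.8, val_frac=0.1):
    """Split sample indices chronologically into train/val/test ranges
    (8:1:1 by default), preserving temporal order."""
    train_end = int(n_samples * train_frac)
    val_end = int(n_samples * (train_frac + val_frac))
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_samples))
```

Each split's DataLoader would then be built over its own index range only.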
**2. Hyperparameters Do Not Match Table 2**

| Hyperparameter | Paper (Table 2) | Code |
|---|---|---|
| Encoder layers | 4 | 2 |
| hidden_dim | 512 | 64-256 |
| KAN_dim | 128 | 3-50 |
| dropout | 0.1 | 0.2-0.3 |
| batch_size | 16 | 32 |
| lr | 1e-4 | 1e-3 |
| input_window | 96 | 10 or 192 |
**3. Normalization Mismatch**

The paper describes z-score normalization computed from the training set. The ETT, Traffic, and Weather dataloaders instead apply min-max normalization, and the statistics are computed over the entire dataset (potential data leakage, since there is no split).
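A minimal sketch of train-set-only z-score normalization as the paper describes it; `train_end` (the index where the training portion ends) and the function name are my own, not repository code:

```python
import numpy as np

def zscore_from_train(data, train_end):
    """Normalize with mean/std computed on the training rows only,
    so no statistics leak from validation or test data."""
    mean = data[:train_end].mean(axis=0)
    std = data[:train_end].std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (data - mean) / std
```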
**4. Missing Value Handling**

The paper describes linear interpolation for missing values. The code uses `dropna()` (row deletion) instead; no interpolation code exists anywhere in the repository.
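Had interpolation been intended, a one-liner with pandas would do it. This is a sketch of the preprocessing the paper describes (the column name is invented), not repository code:

```python
import numpy as np
import pandas as pd

# Linear interpolation fills gaps from neighboring values
# instead of deleting whole rows, as dropna() does.
df = pd.DataFrame({"load": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})
filled = df.interpolate(method="linear")
```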
**5. Single-Step vs. Multi-Horizon Prediction**

The paper reports forecasting horizons of 96 and 192 steps. The code only ever predicts the single next time step (`y = targets[idx + seq_length]`, one scalar value).
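For contrast, multi-horizon forecasting requires a vector of future values per window. A minimal sketch (the function name and shapes are mine, not from the repository):

```python
import numpy as np

def make_windows(series, input_window, horizon):
    """Build (input, target) pairs where each target holds `horizon`
    future steps rather than a single scalar."""
    X, Y = [], []
    for i in range(len(series) - input_window - horizon + 1):
        X.append(series[i : i + input_window])
        Y.append(series[i + input_window : i + input_window + horizon])
    return np.array(X), np.array(Y)
```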
**6. Unused Configuration File**

`configs/config.yaml` exists (with `validation: enabled: false`) but is never loaded by any Python script; no `import yaml` statement exists anywhere. All hyperparameters are hardcoded.
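Wiring the YAML file in would take only a few lines. A sketch assuming PyYAML; the inline text mirrors the one key the file is known to contain, and anything beyond that is an assumption:

```python
import yaml

# Parse a config in the shape described for configs/config.yaml;
# in the repository this would read the file itself instead of a string.
config_text = """
validation:
  enabled: false
"""
config = yaml.safe_load(config_text)
use_validation = config["validation"]["enabled"]
```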
**7. Random Knowledge Vectors**

In the ablation experiment (`kalformer.py`, line 33), the knowledge vector is `torch.randn(...)`: random noise rather than actual domain knowledge.
Could you please provide the actual code used to produce the results reported in Tables 3-5? The current repository appears to be an early development version rather than the final experimental code.
Thank you.