
Reproducibility: Code does not match paper's experimental methodology (Table 2, data splits, normalization) #1

@andermannfelix-max

Description


Hello,

Thank you for publishing the KALFormer paper and releasing code. After a careful review of the repository, I found several significant discrepancies between the paper's described methodology and the actual implementation, making reproduction of the reported results impossible with the current codebase:

  1. No Train/Validation/Test Split (Section 4.1)
    The paper states: "data were chronologically divided into training, validation, and test sets in an 8:1:1 ratio." However, no code in the repository implements any data splitting. All training scripts (kalformer.py, train_ett.py, train_electricity.py, etc.) create a single DataLoader over the entire dataset and train on all data. There is no validation loop and no test evaluation — only training metrics are reported.
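For concreteness, a chronological 8:1:1 split of the kind Section 4.1 describes could look like the following minimal sketch (the function name and interface are mine, not from the repository):

```python
import numpy as np

def chronological_split(data, ratios=(0.8, 0.1, 0.1)):
    """Split a time-ordered array into train/val/test without shuffling."""
    n = len(data)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

series = np.arange(100)
train, val, test = chronological_split(series)
# 80 training points, 10 validation, 10 test; temporal order preserved
```

Nothing resembling this exists in any of the training scripts.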

  2. Hyperparameters Do Not Match Table 2
    Table 2 specifies: encoder layers=4, hidden_dim=512, KAN_dim=128, dropout=0.1, batch_size=16, lr=1e-4, input_window=96. The code instead hardcodes: layers=2, hidden_dim=64-256, KAN_dim=3-50, dropout=0.2-0.3, batch_size=32, lr=1e-3, and window=10 or 192, varying by training script.

  3. Normalization Mismatch
    The paper describes z-score normalization computed from the training set. The ETT, Traffic, and Weather dataloaders use min-max normalization (not z-score), and statistics are computed over the entire dataset (potential data leakage since there's no split).
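To avoid leakage, z-score statistics should be fit on the training split only and then applied to all splits. A minimal sketch (function name and interface are mine):

```python
import numpy as np

def zscore_fit_transform(train, *others, eps=1e-8):
    """Compute mean/std on the training split only, then apply to every split."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    scaled = [(x - mean) / (std + eps) for x in (train, *others)]
    return scaled, (mean, std)

rng = np.random.default_rng(0)
full = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
train, val, test = full[:800], full[800:900], full[900:]
(train_s, val_s, test_s), (mu, sigma) = zscore_fit_transform(train, val, test)
# train_s has (approximately) zero mean and unit variance per feature;
# val_s and test_s are scaled with the *training* statistics only
```

The repository's dataloaders instead compute min-max statistics over the full dataset.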

  4. Missing Value Handling
    The paper describes linear interpolation for missing values. The code uses dropna() (row deletion) instead. No interpolation code exists anywhere.
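The difference matters for chronological data: dropna() shifts every subsequent timestamp forward, breaking the fixed sampling interval. Linear interpolation as described in the paper is a one-liner in pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"load": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0]})
filled = df.interpolate(method="linear")
# All 6 rows are kept and gaps are filled linearly,
# unlike dropna(), which would delete 3 rows and distort the time axis
```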

  5. Single-Step vs Multi-Horizon Prediction
    The paper reports forecasting horizons of 96 and 192 steps. The code only predicts a single next time step (y = targets[idx + seq_length] — one scalar value).
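For multi-horizon forecasting, each sample needs a target *window*, not a scalar. A sliding-window sketch using Table 2's input_window=96 and a 96-step horizon (function name is mine):

```python
import numpy as np

def make_windows(series, input_window=96, horizon=96):
    """Slide over the series producing (input, multi-step target) pairs."""
    X, Y = [], []
    for i in range(len(series) - input_window - horizon + 1):
        X.append(series[i:i + input_window])
        Y.append(series[i + input_window:i + input_window + horizon])
    return np.array(X), np.array(Y)

series = np.arange(300, dtype=float)
X, Y = make_windows(series)
# Each target Y[i] is a 96-step vector, not the single scalar
# y = targets[idx + seq_length] that the repository's dataset class returns
```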

  6. Unused Configuration File
    configs/config.yaml exists with validation: enabled: false but is never loaded by any Python script (no import yaml exists). All hyperparameters are hardcoded.
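Wiring the config in would be straightforward, assuming PyYAML is available (the `model` keys below are illustrative; only `validation.enabled` appears in the actual configs/config.yaml):

```python
import yaml

# Stand-in for open("configs/config.yaml").read()
config_text = """
validation:
  enabled: false
model:
  encoder_layers: 4
  hidden_dim: 512
"""
cfg = yaml.safe_load(config_text)
# cfg is a nested dict the training scripts could read hyperparameters from,
# instead of hardcoding them
```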

  7. Random Knowledge Vectors
    In the ablation experiment (kalformer.py line 33), the knowledge vector is torch.randn(...) — random noise rather than actual domain knowledge.

Could you please provide the actual code used to produce the results reported in Tables 3-5? The current repository appears to be an early development version rather than the final experimental code.

Thank you.
