Add experiment config #18

gonzalobenegas · 2025-12-18T21:31:18Z

No description provided.

claude · 2025-12-18T21:33:09Z

Code Review - PR #18: Add experiment config

I've reviewed this PR against the CLAUDE.md guidelines. Here's my feedback:

🔴 Critical Issues

1. Missing `soft_masked_weight` Parameter

Location: Both experiment configs
Issue: The configs don't specify soft_masked_weight in the data section, but the model config (clm_transformer_base.yaml:3) references ${data.soft_masked_weight}.

Impact: This will cause a Hydra interpolation error at runtime since the key is missing.

Fix: Add to both configs:

data:
  _target_: glm_experiments.data.lm_datamodule.CLMDataModule
  dataset_name: data/gonzalobenegas/genomes-v2-genome_set-animals-intervals-v1_512_256
  per_device_batch_size: 256
  data_augmentation: false
  soft_masked_weight: 0.01  # Required - referenced by model config

2. Missing `evals` Parameter

Location: Both experiment configs
Issue: Similarly, the model config references ${data.evals} (line 4) but it's not defined.

Fix: Add to both configs:

data:
  evals: null  # Or specify eval datasets if needed

⚠️ Medium Priority Issues

3. Duplicate WandB Name

Location: clm_transformer_base_new_dataset_vertebrates.yaml:12
Issue: Both configs use the same wandb name experiment-clm-transformer-base-new-dataset, making it hard to distinguish runs.

Fix:

logger:
  wandb:
    name: experiment-clm-transformer-base-new-dataset-vertebrates  # Add '-vertebrates'

4. Incorrect Documentation Comment

Location: clm_transformer_base_new_dataset_vertebrates.yaml:4
Issue: Comment says experiment=clm_transformer_base_new_dataset but should be experiment=clm_transformer_base_new_dataset_vertebrates.

Fix:

# to execute this experiment run:
# python glm_experiments/train.py experiment=clm_transformer_base_new_dataset_vertebrates

📝 Recommendations

5. Missing Dataset Validation

Concern: The README.md is updated to include download instructions for the new dataset, but there's no verification that:

The dataset exists at the specified HuggingFace path
The dataset has the expected structure (columns, format)
The dataset is compatible with CLMDataModule

Suggestion: Add a simple test or at least manually verify the dataset can be loaded.

6. Inconsistent Pattern with Base Config

Observation: clm_transformer_base.yaml doesn't explicitly override data_augmentation: false, but these new configs do. This is fine, but worth noting for consistency.

✅ Positive Observations

Proper use of Hydra _target_ instantiation pattern
Correct scheduler configuration with interpolation (${trainer.max_steps})
Appropriate tags for experiment tracking
Follows existing experiment config structure

🧪 Testing Recommendation

Since this PR was merged, I recommend:

Running both experiment configs to verify they don't crash on startup
Checking that the Hydra interpolations resolve correctly
Verifying the datasets can be loaded

Command to test:

# Dry-run to check config resolution
uv run python glm_experiments/train.py experiment=clm_transformer_base_new_dataset --cfg job

# Check vertebrates config
uv run python glm_experiments/train.py experiment=clm_transformer_base_new_dataset_vertebrates --cfg job

Summary

Severity: Medium-High - The missing required parameters will cause runtime failures.

Action Required: Either:

Create a follow-up PR to fix the missing parameters, OR
Verify that subsequent commits already addressed these issues

Review conducted following CLAUDE.md standards for configuration management and Hydra patterns.

Add experiment config

c07d6c3

gonzalobenegas merged commit ea43515 into main Dec 18, 2025
6 checks passed

gonzalobenegas deleted the new-dataset branch December 18, 2025 21:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add experiment config #18

Add experiment config #18

Uh oh!

gonzalobenegas commented Dec 18, 2025

Uh oh!

Uh oh!

claude bot commented Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add experiment config #18

Add experiment config #18

Uh oh!

Conversation

gonzalobenegas commented Dec 18, 2025

Uh oh!

Uh oh!

claude bot commented Dec 18, 2025

Code Review - PR #18: Add experiment config

🔴 Critical Issues

1. Missing soft_masked_weight Parameter

2. Missing evals Parameter

⚠️ Medium Priority Issues

3. Duplicate WandB Name

4. Incorrect Documentation Comment

📝 Recommendations

5. Missing Dataset Validation

6. Inconsistent Pattern with Base Config

✅ Positive Observations

🧪 Testing Recommendation

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. Missing `soft_masked_weight` Parameter

2. Missing `evals` Parameter