# Add dclm-core-22 to jupiter #42
```diff
@@ -31,7 +31,7 @@ jupiter:
   ACCOUNT: "jureap59"
   QUEUE_LIMIT: 250
   EVAL_CONTAINER_IMAGE: "eval_env-jupiter.sif"
-  SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1"
+  SINGULARITY_ARGS: "--nv --contain --env PYTHONNOUSERSITE=1 --env SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt"
```
> **Collaborator:** You'll probably also have to integrate this into the container downloading logic in
```diff
 lumi:
   hostname_pattern: "uan*"
```
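The new `--env SSL_CERT_FILE=...` flag only helps if the tools inside the container actually consult that variable. A minimal, self-contained sketch (using a temp file as a stand-in for the real CA bundle path in the diff) showing that Python's `ssl` module picks up the override:

```python
import os
import ssl
import tempfile

# Stand-in for the CA bundle the diff points at
# (/etc/ssl/certs/ca-certificates.crt); a temp file keeps the sketch
# self-contained and runnable outside the container.
with tempfile.NamedTemporaryFile(suffix=".crt", delete=False) as f:
    bundle = f.name

# Simulate what `--env SSL_CERT_FILE=...` does for processes in the container.
os.environ["SSL_CERT_FILE"] = bundle

paths = ssl.get_default_verify_paths()
print(paths.openssl_cafile_env)  # env var OpenSSL consults ("SSL_CERT_FILE" on standard builds)
print(paths.cafile == bundle)    # the override is picked up because the file exists
```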
```diff
@@ -10,6 +10,17 @@ task_metrics:
   commonsense_qa: acc
   hellaswag: acc_norm
   piqa: acc_norm
+  social_iqa: acc
+  agieval_lsat_ar: acc
+  wsc273: acc
+  bigbench_language_identification_multiple_choice: acc
+  squadv2: f1
+  coqa: f1
+  bigbench_qa_wikidata_generate_until: exact_match
+  bigbench_dyck_languages_generate_until: exact_match
+  bigbench_operators_generate_until: exact_match
+  bigbench_repeat_copy_logic_generate_until: exact_match
+  bigbench_cs_algorithms_generate_until: exact_match
 
 task_groups:
   open-sci-0.01:
```
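The `task_metrics` block above is a flat task-name → primary-metric mapping. A small sketch (a plain Python dict rather than the repo's actual YAML loading, which is assumed here) of how such a mapping selects the headline score out of a per-task results dict:

```python
# Mirrors a few of the entries added in the diff above.
TASK_METRICS = {
    "hellaswag": "acc_norm",
    "squadv2": "f1",
    "bigbench_qa_wikidata_generate_until": "exact_match",
}

def primary_score(task: str, results: dict) -> float:
    """Return the metric configured as primary for this task."""
    return results[TASK_METRICS[task]]

# A harness run typically reports several metrics; the mapping picks one.
score = primary_score("squadv2", {"f1": 61.2, "exact_match": 55.0})
print(score)  # 61.2
```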
```diff
@@ -293,6 +304,87 @@ task_groups:
       - task: include_base_44_ukrainian
         subset: Ukrainian
 
+  dclm-core-22:
```
> **Collaborator:** Could you add the `dataset` field to either the task group or the tasks? Otherwise dataset pre-downloading (before job submission) will fail or not be available, and the jobs will then fail unless you have internet access on the compute nodes. You can check
```diff
+    description: "DCLM core 22 evaluation tasks (lm-eval-harness, matching LLM Foundry task types)"
+    suite: lm-eval-harness
+    tasks:
+      - task: agieval_lsat_ar
+        n_shots: [3]
+        dataset: hails/agieval-lsat-ar
+      - task: arc_easy
+        n_shots: [10]
+        dataset: allenai/ai2_arc
+        subset: ARC-Easy
+      - task: arc_challenge
+        n_shots: [10]
+        dataset: allenai/ai2_arc
+        subset: ARC-Challenge
+      - task: boolq
+        n_shots: [10]
+        dataset: aps/super_glue
+        subset: boolq
+      - task: commonsense_qa
+        n_shots: [10]
+        dataset: tau/commonsense_qa
+      - task: copa
+        n_shots: [0]
+        dataset: aps/super_glue
+        subset: copa
+      - task: hellaswag
+        n_shots: [0, 10]
+        dataset: Rowan/hellaswag
+      - task: openbookqa
+        n_shots: [0]
+        dataset: allenai/openbookqa
+        subset: main
+      - task: piqa
+        n_shots: [10]
+        dataset: baber/piqa
+      - task: bigbench_language_identification_multiple_choice
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: language_identification_zero_shot
+      - task: winogrande
+        n_shots: [0]
+        dataset: allenai/winogrande
+        subset: winogrande_xl
+      - task: wsc273
+        n_shots: [0]
+        dataset: winograd_wsc
+      - task: lambada_openai
+        n_shots: [0]
+        dataset: EleutherAI/lambada_openai
+      - task: bigbench_qa_wikidata_generate_until
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: qa_wikidata_zero_shot
+      - task: bigbench_dyck_languages_generate_until
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: dyck_languages_zero_shot
+      - task: bigbench_operators_generate_until
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: operators_zero_shot
+      - task: bigbench_repeat_copy_logic_generate_until
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: repeat_copy_logic_zero_shot
+      - task: bigbench_cs_algorithms_generate_until
+        n_shots: [10]
+        dataset: hails/bigbench
+        subset: cs_algorithms_zero_shot
+      - task: coqa
+        n_shots: [0]
+        dataset: EleutherAI/coqa
+      - task: squadv2
+        n_shots: [10]
+        dataset: rajpurkar/squad_v2
+      # TODO: jeopardy is not available in lm-eval-harness.
+      # - task: jeopardy
+      #   n_shots: [10]
+      #   dataset: openaccess-ai-collective/jeopardy
 
 super_groups:
   oellm-multilingual:
     description: "Combined Belebele EU set plus multilingual benchmarks"
```
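Per the reviewer's note, each task in the group now carries a `dataset` (and sometimes `subset`) field so datasets can be fetched before job submission. A sketch of that pre-download pass, using a hand-written fragment of the group above; the actual `load_dataset` call (from the Hugging Face `datasets` package) is left commented out so the sketch runs offline:

```python
# Fragment mirroring the dclm-core-22 group structure from the YAML above.
group = {
    "tasks": [
        {"task": "arc_easy", "dataset": "allenai/ai2_arc", "subset": "ARC-Easy"},
        {"task": "hellaswag", "dataset": "Rowan/hellaswag"},
        {"task": "coqa", "dataset": "EleutherAI/coqa"},
    ]
}

# Deduplicate (dataset, subset) pairs, since several tasks can share a dataset.
to_fetch = sorted({(t["dataset"], t.get("subset")) for t in group["tasks"]})

for name, subset in to_fetch:
    print(name, subset)
    # from datasets import load_dataset
    # load_dataset(name, subset)  # populates the local HF cache before submission
```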
**requirements-venv-dclm.txt** (new file):

```diff
@@ -0,0 +1,10 @@
+# Dependencies for DCLM-core-22 evaluation (install in venv)
+# Install with: uv pip install -r requirements-venv-dclm.txt
+lm-eval==0.4.9.2
+torch
+transformers>=4.43.2,<5.0.0
+accelerate
+datasets<4.0.0
+wandb
+sentencepiece
+tiktoken
```
> **Comment:** Isn't it missing `datasets` and some other dependencies? If so, let's merge this PR to add DCLM support and update the jupiter container in a follow-up, since the PR has been open for a long time.
> **Reply:** @geoalgo I found that lm-eval installs these already. Also, `nltk` isn't required now, as it was used with lighteval earlier.