
fix: Pre-populate AppWorld rewards_dict with default reward/success value #9

Open
shady-cs15 wants to merge 1 commit into ApGa:main from shady-cs15:sat/aw_b200




shady-cs15 commented Feb 25, 2026

  1. Added a script with B200 configs for AppWorld. Changes include:
    a. Using FlashInfer instead of FlashAttention.
    b. Lowering the static memory fraction to 0.6, which leaves 40% of GPU memory for the KV cache and other components that allocate memory dynamically.
    c. Increasing the rollout timeout to 2 hours (some rollouts were still failing).
    d. Increasing the step timeout to 30 minutes.
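As a rough illustration, the settings listed above can be collected in one place. This is a minimal sketch, not the script added in this PR: the class and field names (`B200RolloutConfig`, `static_mem_fraction`, `rollout_timeout_s`, and so on) are hypothetical. It only restates the values and the arithmetic behind them (1 - 0.6 = 0.4 of GPU memory left for dynamic allocation, 2 hours = 7200 s, 30 minutes = 1800 s).

```python
from dataclasses import dataclass


# Hypothetical config container; the real script's keys and format are not shown in the PR text.
@dataclass(frozen=True)
class B200RolloutConfig:
    attention_backend: str = "flashinfer"  # FlashInfer instead of FlashAttention
    static_mem_fraction: float = 0.6       # fraction of GPU memory reserved statically
    rollout_timeout_s: int = 2 * 60 * 60   # 2 hours per rollout
    step_timeout_s: int = 30 * 60          # 30 minutes per step

    @property
    def dynamic_mem_fraction(self) -> float:
        """Share of GPU memory left for dynamic allocation (KV cache and similar)."""
        return round(1.0 - self.static_mem_fraction, 6)


cfg = B200RolloutConfig()
print(cfg.dynamic_mem_fraction)                   # 0.4
print(cfg.rollout_timeout_s, cfg.step_timeout_s)  # 7200 1800
```

Freezing the dataclass keeps a run's settings from being mutated after launch; the derived memory share is computed rather than stored so it cannot drift out of sync with `static_mem_fraction`.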

  2. Makes a small fix so that rewards_dict is pre-populated with reward and success keys, each with a default value of 0.

  • Otherwise, when some rollouts in a batch finish and others don't, these values cannot be aggregated across the batch, which raises runtime errors.
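A minimal sketch of why the default matters, assuming a batch is aggregated by averaging each metric across rollouts. The helpers `make_rewards_dict` and `aggregate` are hypothetical stand-ins, not the repository's code; only the `reward`/`success` keys and the default of 0 come from the description above.

```python
from statistics import mean
from typing import Optional

# Hypothetical defaults: only the "reward"/"success" keys and the value 0 come from the PR.
DEFAULT_METRICS = {"reward": 0.0, "success": 0.0}


def make_rewards_dict(result: Optional[dict] = None) -> dict:
    """Hypothetical helper: start from the defaults, then overwrite with reported values."""
    rewards_dict = dict(DEFAULT_METRICS)  # copy, so every rollout starts with both keys
    if result is not None:
        rewards_dict.update(result)
    return rewards_dict


def aggregate(batch: list) -> dict:
    """Hypothetical aggregator: average every metric seen in the batch.

    Raises KeyError if any rollout is missing a metric that another rollout reported.
    """
    keys = list(dict.fromkeys(k for r in batch for k in r))  # ordered union of keys
    return {k: mean(r[k] for r in batch) for k in keys}


# One rollout finished; the other timed out and reported nothing.
finished = {"reward": 1.0, "success": 1.0}
unfinished = {}

# Without defaults, aggregation fails on the unfinished rollout.
try:
    aggregate([finished, unfinished])
except KeyError as exc:
    print("aggregation failed, missing key:", exc)

# With defaults, the unfinished rollout counts as reward 0 and success 0.
batch = [make_rewards_dict(finished), make_rewards_dict(None)]
print(aggregate(batch))  # {'reward': 0.5, 'success': 0.5}
```

Counting an unfinished rollout as reward 0 and success 0 keeps every entry in the batch shape-compatible, at the cost of treating timeouts as failures in the averages.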

ApGa changed the title from "Add script for B200, Fixes small bug" to "fix: Pre-populate AppWorld rewards_dict with default reward/success value" on Feb 25, 2026