Overly Optimistic / Incorrect Success Criteria in WidowX Tasks #129
I've been running my policy in the SimPLER environment (WidowX tasks), and the success metric feels really off.
- The environment often reports success too early:
  - e.g., the object is near the target but not actually placed on it
  - sometimes the robot hasn't even finished the action (e.g., releasing the object)
- Some episodes are counted as successes even when the task is clearly about to fail (e.g., the object could fall before the task is completed).
- In some cases, the outcome is plainly wrong but still marked as a success:
  - the object is not in the right place
  - the robot's behavior is visibly incorrect
The following episodes were marked as successful:
2026_03_20-11_36_15--episode.37--success.True--task.put_the_spoon_on_the_towel.mp4
2026_03_20-12_22_40--episode.17--success.True--task.put_carrot_on_plate.mp4
As a result, the reported success rate is much higher than the actual one. This makes it hard to evaluate and compare policies properly, and it is also a problem when training policies in this environment, since the reward signal is misleading.
It would really help if the success conditions were stricter, for example:
- ensuring the object is actually placed correctly (not just nearby)
- checking that the object is stable (not about to fall)
- not triggering success before the action (e.g., releasing the object) is complete
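As a rough illustration of these stricter conditions, a success check could require placement, stability, and release to hold together for several consecutive steps, instead of firing on the first frame where the object is close to the target. This is a hypothetical sketch: `StepState`, `strict_success`, every field name, and the thresholds (`z_tol`, `vel_tol`, `hold_steps`) are assumptions for illustration only. They are not SimplerEnv's actual evaluation code, which may expose different state.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class StepState:
    """Hypothetical per-step snapshot; field names are illustrative, not SimplerEnv's API."""
    obj_xy: Tuple[float, float]          # object centre in the table plane (m)
    obj_bottom_z: float                  # lowest point of the object (m)
    obj_lin_vel: Vec3                    # object linear velocity (m/s)
    target_xy: Tuple[float, float]       # centre of the target's top surface (m)
    target_half_xy: Tuple[float, float]  # half-extents of the target's top surface (m)
    target_top_z: float                  # height of the target's top surface (m)
    gripper_open: bool
    obj_grasped: bool


def placed_on_target(s: StepState, z_tol: float = 0.02) -> bool:
    """Object centre lies over the target surface AND its bottom rests on that surface."""
    inside = all(
        abs(o - t) <= h for o, t, h in zip(s.obj_xy, s.target_xy, s.target_half_xy)
    )
    resting = -z_tol <= s.obj_bottom_z - s.target_top_z <= z_tol
    return inside and resting


def is_stable(s: StepState, vel_tol: float = 0.01) -> bool:
    """Object is (nearly) at rest: speed below vel_tol m/s."""
    return math.sqrt(sum(v * v for v in s.obj_lin_vel)) < vel_tol


def is_released(s: StepState) -> bool:
    """The robot has let go: gripper open and object no longer grasped."""
    return s.gripper_open and not s.obj_grasped


def strict_success(history: Sequence[StepState], hold_steps: int = 10) -> bool:
    """Success only if all three conditions held for the last `hold_steps` steps."""
    if len(history) < hold_steps:
        return False
    return all(
        placed_on_target(s) and is_stable(s) and is_released(s)
        for s in history[-hold_steps:]
    )
```

Requiring the conditions over a window of `hold_steps` steps is one simple way to avoid counting an object that is still sliding or has not yet been released; the thresholds would need tuning per task and per object size.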