Overly Optimistic / Incorrect Success Criteria in WidowX Tasks #129

@abhijith183

I've been running my policy in the SimPLER environment (WidowX tasks), and the success metric looks unreliable: episodes are marked as successful when they clearly are not.

The environment often reports success way too early:

  • e.g., the object is near the target but not actually placed correctly
  • sometimes the robot hasn’t even finished the action (like releasing the object)

There are cases where the task is clearly about to fail (e.g., the object could fall before completion), but it still gets counted as a success.

In some cases, the outcome is just plain wrong, but still marked as success:

  • object not in the right place
  • visibly incorrect behavior

The following videos were marked as success.

2026_03_20-11_36_15--episode.37--success.True--task.put_the_spoon_on_the_towel.mp4
2026_03_20-12_22_40--episode.17--success.True--task.put_carrot_on_plate.mp4

The reported success rate ends up much higher than the real one. That makes it hard to properly evaluate and compare policies. It is also a problem if you're training policies in this environment, since the reward signal is misleading.

It would really help if the success conditions were stricter, for example:

  • ensuring the object is actually placed correctly (not just nearby)
  • checking stability (not about to fall)
  • avoiding triggering success before the action is complete
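To make the three suggestions above concrete, here is a rough sketch of what a stricter check could look like. It is not taken from the SimPLER code base: `StepState`, `step_ok`, `strict_success`, and all thresholds (`xy_tol`, `speed_tol`, `hold_steps`) are hypothetical names and values chosen for illustration, and I'm assuming the simulator can provide the object pose, object speed, contact with the target, and gripper state at each step.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class StepState:
    """Hypothetical per-step snapshot; field names are assumptions."""
    obj_pos: Tuple[float, float, float]      # object position (m)
    target_pos: Tuple[float, float, float]   # target position (m)
    obj_speed: float                         # object linear speed (m/s)
    gripper_open: bool                       # True once the object is released
    obj_touching_target: bool                # contact between object and target


def step_ok(s: StepState, xy_tol: float = 0.03, speed_tol: float = 0.01) -> bool:
    """True if a single step meets all three stricter conditions."""
    dx = s.obj_pos[0] - s.target_pos[0]
    dy = s.obj_pos[1] - s.target_pos[1]
    # Placed correctly: within xy_tol of the target AND resting on it.
    placed = math.hypot(dx, dy) <= xy_tol and s.obj_touching_target
    # Stable: the object is (almost) not moving, so it is not sliding or falling.
    stable = s.obj_speed <= speed_tol
    # Action complete: the gripper has let go of the object.
    released = s.gripper_open
    return placed and stable and released


def strict_success(states: Sequence[StepState], hold_steps: int = 5, **tol: float) -> bool:
    """Report success only after the conditions hold for hold_steps consecutive steps."""
    streak = 0
    for s in states:
        streak = streak + 1 if step_ok(s, **tol) else 0
        if streak >= hold_steps:
            return True
    return False
```

With a check like this, an episode where the object is merely near the target, is still held by the gripper, or is still moving would not be counted as a success until the conditions have held for several consecutive steps.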
