Overly Optimistic / Incorrect Success Criteria in WidowX Tasks #129

I've been running my policy in the SimPLER environment (WidowX tasks), and the success metric seems unreliable.

The environment often reports success too early:

- e.g., the object is near the target but not actually placed on it correctly
- sometimes the robot hasn't even finished the action (e.g., it hasn't released the object yet)

There are also cases where the task is clearly about to fail (e.g., the object could still fall before the task is complete), but the episode is counted as a success anyway.

In other cases, the outcome is simply wrong but is still marked as a success:

- the object is not in the right place
- the robot's behavior is visibly incorrect

The episodes in the following videos were both marked as successful:

https://github.com/user-attachments/assets/25e35821-ac0f-4464-bdbc-253327841c9e

https://github.com/user-attachments/assets/ceb55197-4551-4070-a096-e1f5b8815c58

As a result, the reported success rate is much higher than the true one, which makes it hard to evaluate and compare policies properly. It is also a problem when training policies in this environment, since the reward signal is misleading.

It would really help if the success conditions were stricter, for example:

- ensuring the object is actually placed correctly (not just nearby)
- checking that the object is stable (not about to fall)
- not triggering success before the action is complete (e.g., before the object is released)
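To make the request concrete, here is a minimal sketch of what a stricter check combining these three conditions could look like. It is not SimplerEnv's actual evaluation code: the `StepState` and `TargetRegion` types, their fields, and the thresholds (`settle_steps`, `speed_tol`, `z_tol`) are all hypothetical and would have to be mapped onto whatever object poses, velocities, and gripper state the environment really exposes. The idea is to require that, for several consecutive steps, the object sits inside the target region at the target's surface height, the gripper has released it, and it is essentially not moving.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class StepState:
    """Hypothetical per-step snapshot (field names are illustrative, not SimplerEnv's API)."""
    obj_xy: Tuple[float, float]  # object's horizontal position (m)
    obj_bottom_z: float          # height of the object's lowest point (m)
    obj_speed: float             # object's linear speed (m/s)
    gripper_open: bool           # True once the gripper has opened
    obj_grasped: bool            # True while the gripper is still holding the object


@dataclass
class TargetRegion:
    """Hypothetical axis-aligned footprint of the target (e.g. a plate or towel)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    surface_z: float  # height of the target's top surface (m)


def placed_on_target(s: StepState, region: TargetRegion, z_tol: float) -> bool:
    """Object is inside the target footprint and resting at its surface height."""
    x, y = s.obj_xy
    inside_xy = region.x_min <= x <= region.x_max and region.y_min <= y <= region.y_max
    resting = abs(s.obj_bottom_z - region.surface_z) <= z_tol
    return inside_xy and resting


def strict_success(
    history: Sequence[StepState],
    region: TargetRegion,
    settle_steps: int = 10,   # hypothetical: how long the placement must hold
    speed_tol: float = 0.01,  # hypothetical: max speed counted as "at rest"
    z_tol: float = 0.02,      # hypothetical: vertical tolerance for "on the surface"
) -> bool:
    """Success only if the last `settle_steps` steps all show a correct,
    released, and stable placement (not just a single lucky frame)."""
    if len(history) < settle_steps:
        return False
    window = history[-settle_steps:]
    return all(
        placed_on_target(s, region, z_tol)  # placed correctly, not just nearby
        and s.gripper_open                  # action complete: gripper opened...
        and not s.obj_grasped               # ...and the object was actually released
        and s.obj_speed <= speed_tol        # stable: not sliding or about to fall
        for s in window
    )
```

Requiring the condition to hold over a window of steps (rather than on any single step) is what filters out episodes where the object passes through the target region or is still held, which matches the failure cases in the videos above.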
