Thanks for the implementations!
…e comparison trainer
…ly through sacred
AdamGleave
left a comment
Initial review, I haven't looked at the tests yet.
@@ -0,0 +1 @@
"""PEBBLE specific algorithms."""
It feels a bit odd that we have preference_comparisons.py in a single file but PEBBLE (much smaller) split across several files. That's probably a sign we should split up preference_comparisons.py rather than aggregate PEBBLE, though.
We can do that: e.g., the classes for working with fragments and for gathering preferences seem like independent pieces of logic. Probably for another PR, though.
Codecov Report
@@ Coverage Diff @@
## master #625 +/- ##
==========================================
+ Coverage 97.51% 97.62% +0.10%
==========================================
Files 85 88 +3
Lines 8316 8698 +382
==========================================
+ Hits 8109 8491 +382
Misses 207 207
Co-authored-by: Adam Gleave <adam@gleave.me>
…is no pretraining
Co-authored-by: Adam Gleave <adam@gleave.me>
…ardNets can be injected from the outside
@AdamGleave: responding to your comments together here:
OK, this required a larger refactor, but you can see how it looks in the last couple of commits. A nice side effect is that this change also addresses your other comment. It simplifies the entropy reward classes (there is now a separate entropy reward and a separate reward that switches away from the pre-training reward) and allows for more configurability, at the expense of making the wiring in train_preference_comparison.py a little more complicated. It also results in two changes internally:
Description
Creates an entropy reward replay wrapper to support unsupervised, state-entropy-based pre-training of an agent, as described in the PEBBLE paper.
https://sites.google.com/view/icml21pebble
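For reviewers less familiar with PEBBLE, the state-entropy reward used during unsupervised pre-training can be sketched as a particle-based k-nearest-neighbour estimate: each state is rewarded with the log of the distance to its k-th nearest neighbour among previously visited states, so states in sparsely visited regions earn more reward. This is a minimal NumPy sketch under that assumption; the function name `knn_state_entropy_reward` and its parameters are hypothetical and are not the wrapper added in this PR, and any reward normalization is omitted.

```python
import numpy as np


def knn_state_entropy_reward(
    states: np.ndarray,
    buffer: np.ndarray,
    k: int = 5,
) -> np.ndarray:
    """Hypothetical k-NN state-entropy reward (illustrative sketch only).

    Args:
        states: batch of observations to reward, shape (n, *obs_shape).
        buffer: previously visited observations, shape (m, *obs_shape), m >= k.
        k: which nearest neighbour to use for the distance estimate.

    Returns:
        Array of shape (n,) with log(distance to k-th nearest neighbour).
    """
    # Flatten observations so the distance works for any observation shape.
    states = states.reshape(len(states), -1)
    buffer = buffer.reshape(len(buffer), -1)
    # Pairwise Euclidean distances between query states and buffer, shape (n, m).
    dists = np.linalg.norm(states[:, None, :] - buffer[None, :, :], axis=-1)
    # k-th smallest distance per row: index k - 1 after partial sorting.
    kth_dist = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # Small epsilon avoids log(0) when a state coincides with a buffer entry.
    return np.log(kth_dist + 1e-8)
```

If the query states are themselves stored in the buffer, each state's nearest neighbour is itself at distance zero, so passing `k + 1` skips that self-match.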
Testing
Added unit tests.