4 Comments
Tambet Matiisen

How specific is this problem to reinforcement learning? With imitation learning you are optimizing for human-likeness anyway, so does this metric even make sense?

Daphne Cornelisse

I think the problem is not specific to the method you are using to learn the policy. As for your second question, that is sort of what the post is about :).

Comment deleted
Feb 8
Daphne Cornelisse

Thank you for sharing! I see you also noted limitations such as varying upper bounds and sensor noise. Do you have any thoughts on how to improve benchmarks in this area? Or are you currently pursuing another project?

Comment deleted
Feb 8