How much is this problem specific to reinforcement learning? With imitation learning you are optimizing for human-likeness anyway, so does this metric even make sense?
I think the problem is not specific to the method you are using to learn the policy. As for your second question, that is sort of what the post is about :).
Thank you for sharing! I see you also noted limitations such as varying upper bounds and sensor noise. Do you have any thoughts on how to improve benchmarks in this area? Or are you currently pursuing another project?
Got it! Nice