Sampled MuZero / EfficientZero index out of bounds error in _compute_target_policy_non_reanalyzed

In `MuZeroGameBuffer`, I am getting index-out-of-bounds errors in `_compute_target_policy_non_reanalyzed` when running Sampled EfficientZero on a custom environment. I think the line `policy_tmp[legal_action] = distributions[index]` may be wrong: `policy_tmp` and `distributions` have the same shape, and `legal_action` is drawn from a very large, sparse list of actions, so it seems `index` should be used to index `policy_tmp` instead. Normally `policy_shape` is set to `self._cfg.model.action_space_size`, but for Sampled EfficientZero it is set to `self._cfg.model.num_of_sampled_actions`.
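To make the failure concrete, here is a minimal sketch of the pattern in plain Python. It does not import LightZero, and all values (`num_of_sampled_actions = 4`, the action ids, the visit counts) are made up for illustration; only the assignment line is taken from the description above.

```python
# Hypothetical values for illustration only; nothing here is read from LightZero.
num_of_sampled_actions = 4          # "K", used as policy_shape for the Sampled variants
legal_actions = [3, 17, 250, 9001]  # sparse action ids from a large action space
distributions = [10, 5, 3, 2]       # visit counts, one per sampled action

# Same length as distributions, as described in the issue.
policy_tmp = [0 for _ in range(num_of_sampled_actions)]

error = None
try:
    for index, legal_action in enumerate(legal_actions):
        # The assignment in question: indexes a length-K list by action id.
        policy_tmp[legal_action] = distributions[index]
except IndexError as exc:
    error = exc

print(repr(error))
print(policy_tmp)
```

The first action id (3) happens to fit inside the length-4 list, so one entry is written before the second id (17) raises.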

One possible solution, I think, is to override `_compute_target_policy_reanalyzed` in **Sampled**MuZeroGameBuffer, then replace
`policy_tmp[legal_action] = policy[index]`
with
`policy_tmp[index] = policy[index]`
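As a rough sketch of what that replacement would do, isolated from the buffer class: the helper name `fill_policy_by_position` is hypothetical and not part of LightZero, and the `min(...)` guard is an extra assumption added so the sketch cannot go out of range.

```python
def fill_policy_by_position(policy, policy_shape):
    """Hypothetical helper: place the i-th sampled action's probability at slot i."""
    policy_tmp = [0 for _ in range(policy_shape)]
    # Guard (an assumption, not in the original code): never write past policy_shape.
    for index in range(min(len(policy), policy_shape)):
        policy_tmp[index] = policy[index]
    return policy_tmp


# Hypothetical example with K = 4 sampled actions.
full = fill_policy_by_position([0.5, 0.25, 0.15, 0.1], 4)
# Fewer entries than K: the remaining slots stay zero.
padded = fill_policy_by_position([0.75, 0.25], 4)
print(full, padded)
```

With positional indexing the action ids no longer matter for the write, so large or sparse ids cannot overflow `policy_tmp`.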

I'm not sure about the following:
1) whether it makes sense to update only the Sampled variants here (rather than the base `MuZeroGameBuffer`);
2) whether the "base" MuZero implementation is correct here either, or whether the "bug" just doesn't matter there because `index == legal_action` in all non-sampled environments;
3) whether the fix has to be applied to both the "reanalyzed" and "non_reanalyzed" versions of the function;
4) whether there is another solution, like adjusting `policy_shape`.

I think this bug didn't show up when I ran SampledEfficientZero on Connect4 because there are only 6 possible actions, which is fewer than the "K" setting, so I never got an index-out-of-bounds error. If it is indeed a bug, though, it is probably still affecting the performance of the algorithm there.
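A small sketch of that silent case, again with made-up numbers (K = 20 and three legal action ids that are all smaller than K): indexing by action id raises no error, but it writes to different slots than indexing by position would. Which layout the downstream loss actually expects is not something I have confirmed.

```python
# Hypothetical values for illustration only.
K = 20                      # num_of_sampled_actions, larger than any action id below
legal_actions = [0, 2, 5]   # e.g. some moves are currently unavailable
policy = [0.5, 0.3, 0.2]    # one probability per sampled action, in sampling order

by_action = [0 for _ in range(K)]
by_position = [0 for _ in range(K)]
for index, legal_action in enumerate(legal_actions):
    by_action[legal_action] = policy[index]   # current indexing: no IndexError here
    by_position[index] = policy[index]        # proposed indexing

print(by_action[:6])
print(by_position[:6])
```

Both loops finish without raising, yet the two target vectors differ whenever the action ids are not exactly `0, 1, 2, ...`.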






Sampled MuZero / EfficientZero index out of bounds error in _compute_target_policy_non_reanalyzed #486
