Sampled MuZero / EfficientZero index out of bounds error in _compute_target_policy_non_reanalyzed #486

@esparano

In MuZeroGameBuffer, I am getting index-out-of-bounds errors in _compute_target_policy_non_reanalyzed when running Sampled EfficientZero on a custom environment. I think the line policy_tmp[legal_action] = distributions[index] may be wrong: policy_tmp and distributions have the same shape, while legal_action is drawn from a very large, sparse set of actions, so it seems index should be used instead. Normally policy_shape is set to self._cfg.model.action_space_size, but for Sampled EfficientZero it is set to self._cfg.model.num_of_sampled_actions instead.
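To make the failure mode concrete, here is a minimal sketch that does not use the library itself. It assumes the loop pairs each legal action with its position via enumerate; the helper name build_target_policy_by_action_id and the sample values are hypothetical, and plain Python lists stand in for the arrays used in the buffer.

```python
def build_target_policy_by_action_id(distributions, legal_actions, policy_shape):
    """Hypothetical stand-in for the indexing pattern described in this issue."""
    policy_tmp = [0.0] * policy_shape
    for index, legal_action in enumerate(legal_actions):
        # Writes at the environment's action id, not at the sample position.
        policy_tmp[legal_action] = distributions[index]
    return policy_tmp


num_of_sampled_actions = 4            # policy_shape for the Sampled variants
distributions = [0.4, 0.3, 0.2, 0.1]  # one entry per sampled action
legal_actions = [3, 17, 250, 999]     # assumed sparse ids from a large action space

try:
    build_target_policy_by_action_id(distributions, legal_actions, num_of_sampled_actions)
    error = None
except IndexError as exc:
    # The id 17 does not fit into a buffer of length 4.
    error = exc

print(type(error).__name__)
```

The first id happens to fit into the buffer, so the error only appears once an id at or above policy_shape is reached.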

One possible solution, I think, is to override _compute_target_policy_reanalyzed in SampledMuZeroGameBuffer, then replace
policy_tmp[legal_action] = policy[index]
with
policy_tmp[index] = policy[index]
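
A minimal sketch of what the replacement would do, under the same assumptions as the description above (the loop enumerates the legal actions, and policy has one entry per sampled action). The function name build_target_policy_by_position and the values are hypothetical; this is not the library's code.

```python
def build_target_policy_by_position(policy, legal_actions, policy_shape):
    """Hypothetical sketch: fill the target by sample position, not by action id."""
    policy_tmp = [0.0] * policy_shape
    for index, _legal_action in enumerate(legal_actions):
        # The position in the sampled list is always < policy_shape,
        # however large the underlying action id is.
        policy_tmp[index] = policy[index]
    return policy_tmp


policy = [0.4, 0.3, 0.2, 0.1]
legal_actions = [3, 17, 250, 999]  # assumed sparse action ids
target = build_target_policy_by_position(policy, legal_actions, policy_shape=4)
print(target)
```

With this indexing the large action ids never touch the buffer, so the write cannot go out of bounds as long as there are no more legal actions than policy_shape.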

I'm not sure about the following:

  1. whether it makes sense to update only the Sampled variants here (rather than the base MuZeroGameBuffer);
  2. whether the "base" MuZero implementation is correct here either, or whether the "bug" just doesn't matter there because index == legal_action in all non-sampled environments;
  3. whether the fix has to be applied to both the "reanalyzed" and "non_reanalyzed" versions of the function;
  4. whether there is another solution, such as adjusting policy_shape.

I think this bug didn't show up when I ran Sampled EfficientZero on Connect4 because there are only 7 possible actions, which is fewer than the "K" setting, so no index ever went out of bounds. If it is indeed a bug, though, it is probably still affecting the performance of the algorithm there.
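
A small sketch of that silent case, with hypothetical numbers: when every action id is smaller than the buffer length, indexing by action id raises no error but produces a different layout than indexing by position. Which layout the training loss expects is exactly the open question, so this only shows that the two disagree.

```python
K = 8                            # assumed buffer length (number of sampled actions)
legal_actions = [4, 1, 3]        # hypothetical action ids, all smaller than K
distributions = [0.6, 0.3, 0.1]  # one entry per position in legal_actions

by_action_id = [0.0] * K
by_position = [0.0] * K
for index, legal_action in enumerate(legal_actions):
    by_action_id[legal_action] = distributions[index]  # pattern from the issue
    by_position[index] = distributions[index]          # proposed replacement

print(by_action_id)  # [0.0, 0.3, 0.0, 0.1, 0.6, 0.0, 0.0, 0.0]
print(by_position)   # [0.6, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
```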
