More precise logics for n_actions in Dataset and Simulator #45

Open
yoongi0428 opened this issue Jan 14, 2021 · 0 comments

yoongi0428 commented Jan 14, 2021

Possible Issue

In the bandit feedback, n_actions is set to int(self.action.max() + 1). This doesn't raise any error in the above code,
as long as the logs generated by the policy cover all possible actions.

However, to be more precise, I think n_actions should be given explicitly rather than inferred from the log data.
If that is changed, the above code might raise an error.
For example, if there are 1000 possible actions but only actions 0~998 appear in bandit_feedback, and the policy somehow selects action 999,
this might raise an index-out-of-range error.
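To make the scenario concrete, here is a minimal NumPy-only sketch of it. It does not use the library's classes; the variable names (`n_actions_true`, `logged_actions`, `scores`) and the log size are assumptions made up for illustration.

```python
import numpy as np

# Assumed setup: the real action space has 1000 actions (0..999),
# but the logging policy never happened to choose action 999.
n_actions_true = 1000
rng = np.random.default_rng(0)
logged_actions = rng.integers(0, 999, size=10_000)  # values in 0..998
logged_actions[0] = 998  # make sure the largest logged action is 998

# Inferring the number of actions from the log undercounts by one.
n_actions_inferred = int(logged_actions.max() + 1)

# Any array sized by the inferred value cannot be indexed with action 999.
scores = np.zeros(n_actions_inferred)
try:
    scores[999] += 1.0
    raised = False
except IndexError:
    raised = True

print(n_actions_inferred, n_actions_true, raised)
```

With an explicitly given n_actions of 1000, the same indexing would succeed.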

Idea

  1. Give n_actions to the BanditFeedback data explicitly,
    rather than inferring it from the logged actions:

    @property
    def n_actions(self) -> int:
        """Number of actions."""
        return int(self.action.max() + 1)

  2. Use that n_actions directly in convert_to_action_dist,
    rather than recomputing it from the log:

    action_dist = convert_to_action_dist(
        n_actions=bandit_feedback["action"].max() + 1,
        selected_actions=np.array(selected_actions_list),
    )
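A rough sketch of what the two ideas could look like together, under stated assumptions: `LoggedFeedback` and `one_hot_action_dist` are hypothetical stand-ins written for this example, not the library's BanditFeedback or convert_to_action_dist, and the one-hot output shape is only one possible choice.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LoggedFeedback:
    """Hypothetical container that stores n_actions explicitly."""

    action: np.ndarray
    n_actions: int

    def __post_init__(self) -> None:
        # The log may cover only part of the action space, but it must fit in it.
        if self.action.size and not (
            0 <= int(self.action.min()) and int(self.action.max()) < self.n_actions
        ):
            raise ValueError("logged actions must lie in [0, n_actions)")


def one_hot_action_dist(n_actions: int, selected_actions: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one row per round, with a 1 at the selected action."""
    n_rounds = selected_actions.shape[0]
    dist = np.zeros((n_rounds, n_actions))
    dist[np.arange(n_rounds), selected_actions] = 1.0
    return dist


# Action 999 never appears in the log, yet n_actions is still 1000.
feedback = LoggedFeedback(action=np.array([0, 3, 998]), n_actions=1000)

# A policy that selects the unlogged action 999 no longer goes out of range.
dist = one_hot_action_dist(feedback.n_actions, np.array([999, 0]))
print(dist.shape)
```

The validation in `__post_init__` is an extra design choice, not part of the original proposal: it catches a log that contradicts the declared n_actions early instead of at indexing time.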

yoongi0428 changed the title from "Possible out of index error in simulator" to "More precise logics for n_actions in Dataset and Simulator" on Jan 14, 2021