Fds add num_partitions property to partitioners #3095
Merged
Issue
The number of partitions is information tied to a partitioner, yet the current partitioner abstractions provide no way to access it. Without it, there is no generic way to create all partitions, which is needed to use `concatenate_divisions` and to create plots.
Description
`NaturalIdPartitioner` does not have the number of partitions (it has no need for it, since that number equals the number of unique ids in the column specified by the user). Other partitioners have either `num_partitions` or `partition_sizes`. As a result, the way the number of partitions is specified varies considerably across partitioners.
Related issues/PRs
To be created (`concatenate_divisions` and plotting). This is a prerequisite for these PRs.
Proposal
Add an abstract property `num_partitions` to the `Partitioner`. Expect users to trigger partitioning (plus the prior correctness checks) to ensure that this `num_partitions` is correct.
Explanation
This is the most flexible solution I see right now. It doesn't require additional attributes in each partitioner and, thanks to our lazy partitioning, allows partitioning to be triggered manually.
The correctness check has to happen right before partitioning (and can't be done, e.g., in `__init__`) because it is only possible once the dataset is assigned.
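To make the proposal concrete, here is a minimal, self-contained sketch of how an abstract `num_partitions` property could interact with lazy partitioning. It is not the code from this PR: the classes below are simplified stand-ins that only borrow the names `Partitioner` and `NaturalIdPartitioner`, the dataset is a plain list of dicts, and the helper `_partition_if_needed` is hypothetical. The sketch also assumes that reading the property is what triggers partitioning.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

Row = Dict[str, str]


class Partitioner(ABC):
    """Simplified stand-in for the partitioner base class (illustration only)."""

    def __init__(self) -> None:
        self._dataset: Optional[List[Row]] = None

    @property
    def dataset(self) -> List[Row]:
        # The dataset is assigned after construction, so checks that depend
        # on it cannot run in __init__.
        if self._dataset is None:
            raise AttributeError("The dataset must be assigned before it is used.")
        return self._dataset

    @dataset.setter
    def dataset(self, value: List[Row]) -> None:
        self._dataset = value

    @abstractmethod
    def load_partition(self, partition_id: int) -> List[Row]:
        """Return a single partition."""

    @property
    @abstractmethod
    def num_partitions(self) -> int:
        """Total number of partitions."""


class NaturalIdPartitioner(Partitioner):
    """Toy partitioner: one partition per unique value in a column."""

    def __init__(self, partition_by: str) -> None:
        super().__init__()
        self._partition_by = partition_by
        self._partition_id_to_natural_id: Dict[int, str] = {}

    def _partition_if_needed(self) -> None:
        # Hypothetical helper: performs the lazy partitioning on first use.
        # Accessing self.dataset raises if no dataset has been assigned yet.
        if not self._partition_id_to_natural_id:
            unique_ids = sorted({row[self._partition_by] for row in self.dataset})
            self._partition_id_to_natural_id = dict(enumerate(unique_ids))

    def load_partition(self, partition_id: int) -> List[Row]:
        self._partition_if_needed()
        natural_id = self._partition_id_to_natural_id[partition_id]
        return [row for row in self.dataset if row[self._partition_by] == natural_id]

    @property
    def num_partitions(self) -> int:
        # No stored attribute: the value is derived once partitioning has run.
        self._partition_if_needed()
        return len(self._partition_id_to_natural_id)


rows = [
    {"user": "a", "text": "x"},
    {"user": "b", "text": "y"},
    {"user": "a", "text": "z"},
]
partitioner = NaturalIdPartitioner(partition_by="user")
partitioner.dataset = rows

# With num_partitions available, creating every partition is a generic loop.
all_partitions = [
    partitioner.load_partition(i) for i in range(partitioner.num_partitions)
]
```

With a uniform `num_partitions` on the base class, code that needs every partition (for example, concatenation or plotting utilities) no longer has to know which concrete partitioner it was given.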
Changelog entry