Fix typo for alpha*, explain partition_id sorting
chongshenng committed Jul 18, 2024
1 parent e177c7a commit 1dd6e29
Showing 1 changed file with 5 additions and 3 deletions.
@@ -41,11 +41,13 @@ class DistributionPartitioner(Partitioner): # pylint: disable=R0902
  ( `num_unique_labels`, ---------------------------------------------------- ),
  `num_unique_labels`
  the label_id at the i'th row is assigned to the partition_id based on the formula:
- partition_id = alpha + beta
+ partition_id = <alpha + beta>
  where,
+ <.> denotes the reindexed sequence of partition_ids in monotone increasing
+ order for all j's
  alpha* = (i - num_unique_labels_per_partition + 1) \
- + (j % num_unique_labels_per_partition)
- alpha = alpha* + (alpha* > 0 ? 0 : num_unique_labels)
+ + (j % num_unique_labels_per_partition),
+ alpha = alpha* + (alpha* >= 0 ? 0 : num_unique_labels),
  beta = num_unique_labels * (j // num_unique_labels_per_partition)
  and j in {0, 1, 2, ..., `num_columns`}. Each list representing the partition_ids for
  the i'th row is sorted in ascending order. So, for a dataset with 10 unique labels
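
To see what the `>=` fix changes in practice, the docstring formula can be evaluated directly. The following is a minimal sketch, not the library's implementation: the function name `partition_ids_for_label` and the `use_old_comparison` flag are hypothetical, `j` is assumed to run over `range(num_columns)`, and `<.>` is assumed to mean sorting the ids for a row in ascending order. The values 10 labels, 2 labels per partition and 4 columns are chosen purely for illustration.

```python
def partition_ids_for_label(
    i,
    num_unique_labels,
    num_unique_labels_per_partition,
    num_columns,
    use_old_comparison=False,  # hypothetical flag: reproduce the pre-fix `> 0` test
):
    """Evaluate the docstring formula for row i and return the sorted ids."""
    ids = []
    for j in range(num_columns):  # assumption: j = 0, ..., num_columns - 1
        alpha_star = (i - num_unique_labels_per_partition + 1) + (
            j % num_unique_labels_per_partition
        )
        if use_old_comparison:
            wraps = not (alpha_star > 0)   # old docstring: alpha* > 0 ? 0 : ...
        else:
            wraps = not (alpha_star >= 0)  # fixed docstring: alpha* >= 0 ? 0 : ...
        alpha = alpha_star + (num_unique_labels if wraps else 0)
        beta = num_unique_labels * (j // num_unique_labels_per_partition)
        ids.append(alpha + beta)
    # Assumed reading of <.>: ids for one row in monotone increasing order.
    return sorted(ids)


fixed = partition_ids_for_label(0, 10, 2, 4)
old = partition_ids_for_label(0, 10, 2, 4, use_old_comparison=True)
print(fixed)  # [0, 9, 10, 19]
print(old)    # [9, 10, 19, 20]
```

With the old `> 0` comparison, `alpha* = 0` is wrapped around by `num_unique_labels`, so row 0 yields id 20, which falls outside the range 0..19 that these illustrative values would imply for 20 partitions. With `>= 0`, every id stays in range.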