Wip 2.7 custom comparator join #36

Open
wants to merge 2 commits into
base: 2.7

Conversation

Schmed

@Schmed Schmed commented Jun 4, 2018

I'm able to run this outside the Cascading context, but I can't figure out how to get Gradle to run just this single unit test within cascading-platform (and running all cascading-platform unit tests would apparently take until nearly the end of time on my laptop).

@cwensel
Member

cwensel commented Jun 5, 2018

I do recognize that the nature of CoGroup implies the actual values for lhs and rhs will remain stable and pass through when the keys are not actually interchangeable, but this is the first time it has been brought up as an issue in 10 years.

the issue here is that the MR, Tez, and local modes presume that if the comparators state the values are equivalent, then the instance values are interchangeable (which isn't the expectation here).
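
to make that presumption concrete, here is a minimal plain-Java sketch (not Cascading code, and not how any of the platforms is actually implemented). the `BY_BUCKET` comparator and the `retainedKey` helper are hypothetical names made up for illustration; a `TreeMap` stands in for any grouping structure that trusts the comparator. once two distinct values compare as equal, only one key instance is kept:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class BucketComparatorDemo {
    // Hypothetical comparator: treats integers in the same block of ten as equal.
    static final Comparator<Integer> BY_BUCKET = Comparator.comparingInt(v -> v / 10);

    // Groups two keys under the comparator and returns the key instance the map retained.
    static Integer retainedKey(int first, int second) {
        TreeMap<Integer, String> groups = new TreeMap<>(BY_BUCKET);
        groups.put(first, "lhs");
        // If `second` compares equal to `first`, only the mapped value is replaced;
        // the original key instance stays in the map.
        groups.put(second, "rhs");
        return groups.firstKey();
    }

    public static void main(String[] args) {
        System.out.println(BY_BUCKET.compare(12, 17)); // 0: "equivalent" per the comparator
        System.out.println(retainedKey(12, 17));       // 12: the 17 is no longer recoverable
    }
}
```

the grouping structure is free to keep whichever instance it saw first, which is fine when equal keys really are interchangeable and lossy when they are not.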

custom comparators exist to allow for custom/third-party POJO values to be used as keys and/or allow for custom secondary sorting of values.

bucketing keys into ranges at this level hasn’t really been on the radar since values can be generated from functions allowing for them to be used as join keys.
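
as a rough illustration of that approach, again in plain Java rather than the Cascading API: the sketch below derives an explicit bucket key with a function and joins on the derived key, so the original lhs and rhs values travel alongside it untouched. `BUCKET` and `join` are hypothetical names, and the block-of-ten bucketing is just an assumed example of a range:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DerivedKeyJoinDemo {
    // Hypothetical key function: maps a value to the range (block of ten) it falls in.
    static final Function<Integer, Integer> BUCKET = v -> v / 10;

    // Inner join of lhs and rhs on the derived key.
    // Each result row is {key, lhsValue, rhsValue}; the original values are preserved.
    static List<int[]> join(List<Integer> lhs, List<Integer> rhs) {
        Map<Integer, List<Integer>> rhsByKey = new HashMap<>();
        for (Integer r : rhs) {
            rhsByKey.computeIfAbsent(BUCKET.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<int[]> rows = new ArrayList<>();
        for (Integer l : lhs) {
            Integer key = BUCKET.apply(l);
            for (Integer r : rhsByKey.getOrDefault(key, Collections.<Integer>emptyList())) {
                rows.add(new int[] {key, l, r});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (int[] row : join(Arrays.asList(12, 25), Arrays.asList(17, 31))) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]); // prints "1 12 17"
        }
    }
}
```

because the join key is an ordinary value with ordinary equality, the default comparators apply and nothing has to assume that 12 and 17 are the same thing.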

assuming the keys are interchangeable is optimal in terms of performance, memory use, and code simplicity.

the alternative, keeping key instance values stable, would require a pluggable rewrite of the underlying CoGrouping mechanism for each platform.

that said, you might try using a BufferJoin + Buffer to implement the join. this should provide more insights on how the join is fabricated.


2 participants