Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1433] Add vertex retention to KeyedGrouping. #1470

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

p-f
Copy link
Collaborator

@p-f p-f commented Jan 20, 2020

Adds the feature to retain ungrouped vertices to KeyedGrouping.

Description

Adds the feature as well as tests.

Related Issue

#1433

Motivation and Context

This was supported in the older grouping implementation, this is now also supported here.

How Has This Been Tested?

Reuses some old tests and adds new tests.

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly (if necessary).
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • I ran a spell checker.

@p-f p-f requested a review from galpha January 20, 2020 15:30
@ChrizZz110 ChrizZz110 self-requested a review February 28, 2020 09:40
@ChrizZz110
Copy link
Contributor

Change year in headers from 2020 to 2021.

@timo95
Copy link
Contributor

timo95 commented Apr 1, 2021

I did a few small benchmarks on the cluster. I was comparing:

  • a: no retention
  • b: this implementation
  • c: alternative implementation that does not need any additional join, but delays the translation of edges to tuples until after the id update

Results:

  • a is faster than b and c
    • low parallelism (ldbc_10, 16 workers): b and c are ~200% slower
    • high parallelism (ldbc_10, 96 workers): b and c are ~10-40% slower
  • b is faster than c
    • low parallelism: about the same
    • high parallelism: c is ~30% slower

Based on that result, the additional join is no problem. In the next few days i will commit a few documentation and style improvements.

@ChrizZz110
Copy link
Contributor

@timo95 for further commits, please add the issue number as prefix of your commit message ;)

@timo95
Copy link
Contributor

timo95 commented Apr 8, 2021

I am finished anyway 😃. You can review now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants