Advanced Duplicate Detection #30

Open
luccalb opened this issue Nov 5, 2024 · 0 comments
Labels
component Related to a RTDIP component


luccalb commented Nov 5, 2024

User Story

  1. As an RTDIP user
  2. I want to detect duplicate data points for different kinds of data
  3. So that I can achieve high data quality

Additional context

Acceptance Criteria

  • The existing duplicate detection component should be extended
  • It should accept a list of dataframe columns that together serve as the primary key
  • The duplicate detection should then ensure that no primary key value appears more than once in the data
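
The primary-key check described in the criteria above could be sketched as follows. This is a minimal, library-free illustration, not the RTDIP component or its API: the function name `split_duplicates`, the `primary_key_columns` parameter, and the sample column names are all hypothetical, and rows are modelled as plain dictionaries rather than a Spark dataframe.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]


def split_duplicates(
    rows: Iterable[Row], primary_key_columns: Sequence[str]
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (unique, duplicates) by a composite primary key.

    Hypothetical helper for illustration only. The first row seen for each
    key is kept; every later row with the same key is reported as a duplicate.
    """
    if not primary_key_columns:
        raise ValueError("primary_key_columns must contain at least one column")

    seen = set()
    unique: List[Row] = []
    duplicates: List[Row] = []
    for row in rows:
        # Build the composite key from the configured columns, in order.
        key = tuple(row[column] for column in primary_key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates


# Sample data with assumed column names.
readings = [
    {"TagName": "A", "EventTime": "2024-01-01T00:00:00", "Value": 1.0},
    {"TagName": "A", "EventTime": "2024-01-01T00:00:00", "Value": 2.0},
    {"TagName": "B", "EventTime": "2024-01-01T00:00:00", "Value": 3.0},
]
unique_rows, duplicate_rows = split_duplicates(readings, ["TagName", "EventTime"])
```

In this sample the second row shares its `TagName` and `EventTime` with the first, so it ends up in `duplicate_rows` even though its `Value` differs. On a PySpark dataframe, a comparable result might be obtained with `DataFrame.dropDuplicates(subset=[...])`, which keeps one row per key; whether the component should use it is a design choice this issue leaves open.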

Definition of Done

  • Test cases have been created and are running successfully
  • Documentation for the new component has been added
  • GitHub Actions are running without errors
@luccalb luccalb changed the title Advanced Duplicate Detection [Component] Advanced Duplicate Detection Nov 5, 2024
@luccalb luccalb added the component Related to a RTDIP component label Nov 5, 2024
@luccalb luccalb changed the title [Component] Advanced Duplicate Detection Advanced Duplicate Detection Nov 5, 2024
Projects
Status: Product Backlog
Development

No branches or pull requests

1 participant