
[Bug]: When duplicate records exist in the table, the Spark upsert can't merge records #1305

Open

baiyangtx opened this issue Mar 30, 2023 · 3 comments
Labels
module:mixed-spark Spark module for Mixed Format type:bug Something isn't working

Comments

@baiyangtx
Contributor

What happened?

If the table already contains records with duplicate primary keys, using a Spark `insert into` SQL statement to perform an upsert does not work as expected: the rows remain duplicated instead of being merged into one.

Affects Versions

0.4.1

What engines are you seeing the problem on?

Spark

How to reproduce

[screenshot of the reproduction steps; not available as text]
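
Since the reproduction steps are only available as a screenshot, the following sketch reconstructs the scenario from the description above. The database/table name, columns, and values are assumptions; the USING arctic and PRIMARY KEY syntax follows the script in the comment below.

-- Hypothetical reproduction sketch; names and values are assumptions.
CREATE TABLE IF NOT EXISTS db.user (
    id INT,
    name STRING,
    ts TIMESTAMP,
    PRIMARY KEY(id)
) USING arctic;

-- Duplicate rows for the same primary key already exist in the table
-- (written while upsert is still disabled).
INSERT INTO db.user VALUES (1, "a", timestamp("2022-07-01 10:00:00"));
INSERT INTO db.user VALUES (1, "b", timestamp("2022-07-01 10:00:00"));

-- Enable upsert, then write the same primary key again.
ALTER TABLE db.user SET TBLPROPERTIES ('write.upsert.enabled' = 'true');
INSERT INTO db.user VALUES (1, "c", timestamp("2022-07-01 10:00:00"));

-- Expected: a single row for id = 1; observed: the rows remain duplicated.
SELECT * FROM db.user WHERE id = 1;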

Relevant log output

No response

Anything else

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@baiyangtx baiyangtx added type:bug Something isn't working module:mixed-spark Spark module for Mixed Format labels Mar 30, 2023
@wangtaohz
Contributor

I can't reproduce it in 0.4.1.

CREATE TABLE IF NOT EXISTS db.user (
    id INT,
    name STRING,
    ts TIMESTAMP,
    PRIMARY KEY(id)
) USING arctic
PARTITIONED BY (days(ts));

-- seed the table, then insert duplicate rows for the same primary key
insert overwrite db.user values (2, "frank", timestamp("2022-07-02 09:11:00"));
...
insert into db.user values (2, "frankkkk", timestamp("2022-07-02 09:11:00"));
insert into db.user values (2, "frankkkk", timestamp("2022-07-02 09:11:00"));

-- enable upsert, then write the same primary key again
alter table db.user set tblproperties (
    'write.upsert.enabled' = 'true');

insert into db.user values (2, "llllll", timestamp("2022-07-02 09:11:00"));

result

[screenshots of the query results; not available as text]
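
Since the results are only shown as screenshots, a check along these lines (assuming the db.user table from the script above) would confirm whether the duplicates were merged:

-- Expected to return a single row (2, "llllll", 2022-07-02 09:11:00)
-- once the upsert merges the earlier duplicate rows.
SELECT * FROM db.user WHERE id = 2;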

@Aireed Aireed added this to the Release 0.5.0 milestone Apr 13, 2023
@baiyangtx baiyangtx removed this from the Release 0.5.0 milestone Apr 13, 2023

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Aug 20, 2024
@zhoujinsong
Contributor

Any progress on this issue? @baiyangtx

@github-actions github-actions bot removed the stale label Feb 11, 2025