Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

insert_all/upsert_implementation using MERGE #869

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mgrunberg
Copy link
Contributor

This PR resolves the issue #859. Since it adds support to insert_on_duplicate_skip it also resolves #847.

The PR is in early stages. I'm exploring adding supports_insert_on_duplicate_skip first (insert_all). Upsert_all is still pending.

At this stage this test is falling with ActiveRecord::RecordNotUnique: TinyTds::Error: Cannot insert duplicate key row in object 'dbo.books' with unique index 'index_books_on_author_id_and_name'. The duplicate key value is (8, Refactoring).

def test_insert_all_with_skip_duplicates_and_autonumber_id_given
  skip unless supports_insert_on_duplicate_skip?

  assert_difference "Book.count", 1 do
    Book.insert_all [
      { id: 200, author_id: 8, name: "Refactoring" },
      { id: 201, author_id: 8, name: "Refactoring" }
    ]
  end
end

The test produces the following query

SET IDENTITY_INSERT [books] ON;
MERGE INTO [books] WITH (UPDLOCK, HOLDLOCK) AS target 
USING (SELECT DISTINCT * FROM (VALUES (200, 8, N'Refactoring'), (201, 8, N'Refactoring')) AS t1 ([id],[author_id],[name])) AS source 
ON (target.[author_id] = source.[author_id] AND target.[name] = source.[name]) OR (target.[id] = source.[id]) 
WHEN NOT MATCHED BY TARGET THEN 
  INSERT ([id],[author_id],[name]) VALUES (source.[id], source.[author_id], source.[name]) 
OUTPUT INSERTED.[id];
SET IDENTITY_INSERT [books] OFF;

SQL Server computes the source and target join and then applies the conditions to decide if a record from the joined table matches or not. In this case, both records are inserted.

I'm having doubts if implementing insert_all using merge is possible.

upsert_all seems more challenging. Besides this problem, WHEN MATCHED can only update a row once.

For reference, table schema is

create_table :books, id: :integer, force: true do |t|
    default_zero = { default: 0 }
    t.references :author
    t.string :format
    t.column :name, :string
    t.column :status, :integer, **default_zero
    t.column :read_status, :integer, **default_zero
    t.column :nullable_status, :integer
    t.column :language, :integer, **default_zero
    t.column :author_visibility, :integer, **default_zero
    t.column :illustrator_visibility, :integer, **default_zero
    t.column :font_size, :integer, **default_zero
    t.column :difficulty, :integer, **default_zero
    t.column :cover, :string, default: "hard"
    t.string :isbn, **case_sensitive_options
    t.datetime :published_on
    t.index [:author_id, :name], unique: true
    t.index :isbn, where: "published_on IS NOT NULL", unique: true
  end


sql = +""
sql << "SET IDENTITY_INSERT #{insert.model.quoted_table_name} ON;" if includes_primary_key
sql << "MERGE INTO #{insert.model.quoted_table_name} WITH (UPDLOCK, HOLDLOCK) AS target"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe HOLDLOCK is equivalent to SERIALIZABLE. I'd prefer the SQL standard term. Unless HOLDLOCK is usual in the SQL Server community?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check that. Thanks!

@@ -140,6 +140,37 @@ def default_insert_value(column)
private :default_insert_value

def build_insert_sql(insert) # :nodoc:
if insert.skip_duplicates?
# Do we have a unique_by index? Use index columns
conflict_columns = if (unique_by = insert.send(:insert_all).unique_by)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not do insert.insert_all?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we cache insert_all in a local variable rather than have to look it up each time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insert_all is private, it's not possible to do insert.insert_all.
Thanks for the suggestion but I'm still not paying attention to things like this. As I said, the PR is in an early stage. I have plans to improve the code (refactor) but right now I'm focused on produce the SQL that cover all tests.

sql = +""
sql << "SET IDENTITY_INSERT #{insert.model.quoted_table_name} ON;" if includes_primary_key
sql << "MERGE INTO #{insert.model.quoted_table_name} WITH (UPDLOCK, HOLDLOCK) AS target"
sql << " USING (SELECT DISTINCT * FROM (#{insert.values_list}) AS t1 (#{insert.send(:columns_list)})) AS source"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the standard adapters uniquify insert rows. I don't think we need to do it here either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uniquify was the solution to this test/scenario

Book.insert_all [
  { author_id: 8, name: "Refactoring" },
  { author_id: 8, name: "Refactoring" }
]

I don't like it but it works so I move on until all test pass.

Then I found the failing test (still stuck with it)

Book.insert_all [
  { id: 200, author_id: 8, name: "Refactoring" },
  { id: 201, author_id: 8, name: "Refactoring" }
]

My impression is that the solution to this will let me remove the uniquify.

end
sql << ";"
sql << "SET IDENTITY_INSERT #{insert.model.quoted_table_name} OFF;" if includes_primary_key
return sql

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeatedly appending to a string is somewhat slow. Faster in that case is to append strings to an array and then join the array at the end. Not sure how much difference that would make, but it would be just as easy to read, I think, and would probably be noticeable if we are dealing with many rows.

@justinko
Copy link

I think MERGE is still appropriate for this ... it just doesn't support duplicates in the "source". More info on that here: https://www.ibm.com/docs/en/informix-servers/14.10?topic=statement-handling-duplicate-rows

Would a gem (e.g. active record-sqlserver-adapter-insert-all) make sense as a replacement for this pull? There would of course be a caveat stated that your source/args for insertion cannot contain duplicates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ActiveRecord::ConnectionAdapters::SQLServerAdapter does not support skipping duplicates
4 participants