-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve SQL Server Incremental Statement #447
Comments
Interesting. You should be able to use the custom SQL for this, using the # create query.sql
echo '
with ids as (
select top 400000 "BillingTransactionKey"
from "dbo"."BillingTransactionFact"
where "BillingTransactionKey" > coalesce({incremental_value}, 0)
order by "BillingTransactionKey" asc
) select *
from "dbo"."BillingTransactionFact" btf
inner join ids i
on btf."BillingTransactionKey" = i."BillingTransactionKey";
' > query.sql
# use query.sql as stream
sling run --src-conn CDW_MSSQL --src-stream file://query.sql --tgt-conn ORABOODLE --tgt-object BillingTransactionFact --tgt-conn ORABOODLE --tgt-object BillingTransactionFact --mode incremental --update-key BillingTransactionKey --limit 250000 -d |
One little tweak to make it work:
This made a huge difference. I went from processing 250K rows per iteration with throughput of about 1K rows/sec to doing 2M at 3600 rows/sec. I really think this should be the default query. It is worth the wordiness. Cheers, |
In the interests of full disclosure, we did also more than double the TEMPDB storage on SQL Server. This may have something to do with my observed performance improvement. But I think the change here essentially removes TEMPDB as a bottleneck. Wherever possible for SQL Server at least, I think we want to limit projections when ordering results, and subqueries can really help performance. |
Thanks for the details. It's interesting that it improved that much, I would not have guessed. Each database has its own engine optimization and particularities... I will look into setting that logic as the default for SQL Server. |
Also, we are running SQL Server version 15.0.4410. YMMV. |
Okay, my subquery hack is not including all records. I really thought using "top" in a subquery would honor the "order by," but after all iterations, I find a bunch of records are missed. So this is what my new
where 10000000 is what is passed to sling via This is effectively processing at most the specified limit of rows and doing a simple insert to save time. And it avoids the "order by" entirely, which relieves the pressure on TEMPDB. |
Issue Description
I am incrementally slinging data to a large Oracle table via sqlldr from SQL Server. Here is an example command:
The table rows have a lot of data, which requires a lot of space to manage when ordering is required. For this table, I must move in very small increments to avoid the following error:
The command spawns a query that is logged like this:
This query, after considerable processing time, produces the TEMPDB error above.
The query produces the same error outside sling.
The following equivalent query, on the other hand, is both more performant and uses much less TEMPDB space:
Could sling use logic like this to read from a SQL Server source database?
I think this optimization could help a lot.
Sling version (
sling --version
):Version: 1.2.22
Operating System (
linux
,mac
,windows
):Windows
Replication Configuration:
N/A
Log Output (please run command with
-d
):See above.
The text was updated successfully, but these errors were encountered: