Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is load_method applicable to this target? #266

Open
sebastianswms opened this issue Dec 28, 2023 · 9 comments
Open

Is load_method applicable to this target? #266

sebastianswms opened this issue Dec 28, 2023 · 9 comments

Comments

@sebastianswms
Copy link
Contributor

sebastianswms commented Dec 28, 2023

When running --about, a load_method config option is automatically included from SDK v32. This came up in #240 but we decided not to add it the README there. Many functions in target-postgres override the SDK versions, so not sure if setting load_method will do anything.

As a separate question, should it? Or is other target-postgres functionality already adequate for handling load_method functionality? Worth diving into further.

@pnadolny13
Copy link

pnadolny13 commented Jan 5, 2024

@sebastianswms @visch I implemented these load_methods in meltano/sdk#1893. I dont think any of them do anything by default yet other than overwrite. I'd have to dig into the SDK more to know if theres a way to control append-only vs upsert in the base implementation and what implications that would have.

I would like to see this target implement the ability to run in all 3 modes! It seems to support only upsert today. I wonder what is different in prepare_table and if its still necessary to override that method, if not then we'd get overwrite out of the box.

cc @edgarrmondragon

@visch
Copy link
Member

visch commented Jan 5, 2024

@sebastianswms @visch I implemented these load_methods in meltano/sdk#1893. I dont think any of them do anything by default yet other than overwrite. I'd have to dig into the SDK more to know if theres a way to control append-only vs upsert in the base implementation and what implications that would have.

I would like to see this target implement the ability to run in all 3 modes! It seems to support only upsert today. I wonder what is different in prepare_table and if its still necessary to override that method, if not then we'd get overwrite out of the box.

cc @edgarrmondragon

  1. Append only is supported if there's no key properties set (or if you override them). I thought this was the "default" singer pattern but load_method may be easier for folks to understand
  2. "overwrite" doesn't exist, is an interesting load method though as a lot of my implementation I just do this manually

@pnadolny13
Copy link

I thought this was the "default" singer pattern but load_method may be easier for folks to understand

@visch yeah thats how its always been and probably will be for a long time for non-SDK targets but I agree I think its a lot clearer for users to explicitly set how they want data loaded. Sometimes even if key properties are set I'd want to append only and its kind of an advanced move to know how to override them. You make a good point though, maybe we shouldnt have a default load_method but rather None/null so the target can keep the status quo and decide based on key properties if its not explicitly set. So load_method is used more for overriding whatever load method the target would have used.

@raulbonet
Copy link

Hello,

We would like to use the overwrite functionality in our team and we are willing to take over the development of a PR, but I do have the same question as @pnadolny13 . I do not fully understand why the override of prepare_table is necessary, so if somebody could help here would be great.

Apart from this, the method get_table_columns is also overridden with a different signature. This means that whenever the function get_table() is used, it produces an exception.

So far it works because the built-in function get_table() is only used when the overwrite is enabled, but I think it is not ideal.

Why not make the signature the same as the original (full_table_name) and parse table name and schema name inside the function?

@raulbonet
Copy link

raulbonet commented Mar 18, 2024

Hello,

I had a closer look. If I am understanding this correctly, it seems that a lot of the code customization came because of too many requests being done to the Postgres instance. It seems that these queries were reduced from O(n) to O(1) for each table, which is a great improvement.

However, I wonder if this big amount of queries just comes from somewhere else also.

I see that the function prepare_table() is called inside the process_batch() function. But if I am understanding this correctly, this means that the whole process of making the table conform to the schema is triggered at every process_batch() call, when I think that the prepare_table() function is meant to be called only at the initialization of the sink, which in turns happens when there is a schema change (among others) .

I solved it in the PR (draft yet) here just by creating a simple call to the SQLAlchemy meta to retrieve the current table, wrapped in a function called get_table_from_metadata()

What do you think? I am willing to take over the refactoring of the code with some guidance.

@pnadolny13
Copy link

@raulbonet thanks for digging into this more. I'm curious if anything is changed based on what came out of the caching bug conversation meltano/sdk#2325. I still think having load_method supported in this target is a great feature to add.

@visch @edgarrmondragon @sebastianswms if you have any updated thoughts on this! 😄

@visch
Copy link
Member

visch commented Apr 20, 2024

Today, load methods are supported but not through the load_method configuration. We could certainly support them. The only "tricky" one would be Overwrite, which would require a slightly different insert method, but it'd pretty much match what I just described here - #340 (comment).

How load methods are supported today:

  1. Append - No primary keys, we use this load method
  2. Upsert - If there are primary keys, we use this load method
  3. Overwrite - If we have an activate version message, we use this load method

@kinghuang
Copy link
Contributor

I'm very interested in having a well-optimized overwrite option. After optimizing the bulk insert implementation (#370), most of the time in my PostgreSQL loads is now spent on unnecessary upserts.

@gregkoutsimp
Copy link

Hello,
so at the moment the load_method is not available for the target-postgres event though it is part of the SDK, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

6 participants