This repository has been archived by the owner on Oct 11, 2021. It is now read-only.

migration from v3 to v4 #150

Open
asafcombo opened this issue Feb 20, 2020 · 7 comments

@asafcombo

Hi, I need to migrate from v3 to v4, as I am opening a new template with v4 (the support for an existing VPC is great, btw!).

What is the best course of action to keep all the logs, metadata, and previously submitted jobs?

Below is a list of resources that should be migrated:

Database
I thought of 2 options here:

  • do some kind of dump from the old db and load into the new db.
  • snapshot the old db, delete the new db, and restore snapshot with parameters that replicate all the parameters of the new db that was deleted.
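
The first option (dump and load) could be sketched as below. This assumes the metadata database is PostgreSQL and that the `pg_dump` and `pg_restore` client tools are installed; the thread does not state either, and the connection URIs and the dump file name are hypothetical placeholders. By default the function only prints the commands so they can be reviewed first.

```python
import shlex
import subprocess
from typing import List


def dump_command(source_uri: str, dump_path: str) -> List[str]:
    """Build a pg_dump call that writes a custom-format archive."""
    return ["pg_dump", "--format=custom", f"--file={dump_path}", f"--dbname={source_uri}"]


def restore_command(target_uri: str, dump_path: str) -> List[str]:
    """Build a pg_restore call that replaces existing objects in the target."""
    return [
        "pg_restore",
        "--clean",      # drop existing objects before recreating them
        "--if-exists",  # do not fail on objects that are absent in the target
        f"--dbname={target_uri}",
        dump_path,
    ]


def migrate(source_uri: str, target_uri: str,
            dump_path: str = "airflow_v3.dump", dry_run: bool = True) -> List[List[str]]:
    """Print the dump and restore commands (dry run) or execute them in order."""
    commands = [dump_command(source_uri, dump_path), restore_command(target_uri, dump_path)]
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if the tool exits non-zero
    return commands
```

A dump restored this way keeps the old schema, so a schema upgrade may still be needed before the new deployment can use it.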

logs (S3)
Here I assume a simple scan and cp between the old and new buckets will suffice.
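
A minimal sketch of that scan-and-copy, assuming boto3 is available and the credentials can read the old bucket and write to the new one. The function names and bucket names are hypothetical; the client is passed in so the logic can be tried without touching AWS.

```python
from typing import Any, Iterator


def iter_keys(s3: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under the prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty bucket or prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_bucket(s3: Any, src: str, dst: str, prefix: str = "") -> int:
    """Copy each object from src to dst under the same key; return the count."""
    copied = 0
    for key in iter_keys(s3, src, prefix):
        s3.copy({"Bucket": src, "Key": key}, dst, key)  # server-side managed copy
        copied += 1
    return copied


# Usage (hypothetical bucket names):
#   import boto3
#   copy_bucket(boto3.client("s3"), "old-logs-bucket", "new-logs-bucket")
```

Keeping the keys identical matters if the task log paths recorded for old runs are expected to resolve in the new bucket.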

EFS
Unless I'm mistaken, there is nothing stored there unless my own code specifically writes to it.

@villasv
Owner

villasv commented Feb 20, 2020

You're absolutely right about those three :-) Both options for the database should work fine too.

You might also want to back up your connections separately, because v4 uses encryption to store connection credentials. I'm not sure what will happen if it tries to decrypt something that is stored as plaintext. If you don't have too many of them, you can just add them again.

If you do have too many of them and simply restoring them to the database doesn't work, you'll have to import them indirectly. Some guidance here: https://stackoverflow.com/questions/55626195/export-all-airflow-connections-to-new-environment
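
One way to back the connections up separately is to serialize them to JSON on the old deployment. The sketch below is an assumption, not something tested against this stack: it relies on the attribute names of Airflow's `Connection` model and takes the connection objects as an argument. The helper names are hypothetical, and the output contains credentials in plaintext, so it should be handled accordingly.

```python
import json
from typing import Any, Dict, Iterable

# Attribute names of airflow.models.Connection that are worth carrying over.
FIELDS = ("conn_id", "conn_type", "host", "schema", "login", "password", "port", "extra")


def connection_to_dict(conn: Any) -> Dict[str, Any]:
    """Copy the listed attributes of one connection into a plain dict."""
    return {field: getattr(conn, field, None) for field in FIELDS}


def export_connections(connections: Iterable[Any]) -> str:
    """Serialize connections to JSON, sorted by conn_id for stable output."""
    rows = [connection_to_dict(conn) for conn in connections]
    return json.dumps(sorted(rows, key=lambda row: row["conn_id"]), indent=2)


# Usage on the old (v3) deployment, where Airflow is importable:
#   from airflow import settings
#   from airflow.models import Connection
#   session = settings.Session()
#   print(export_connections(session.query(Connection).all()))
```

The resulting file can then be used to add the connections again on v4, so that they are written through v4's own encryption.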

@villasv villasv added the docs label Feb 20, 2020
@asafcombo
Author

Well, after trying it, the DB migration seems more complicated than I initially thought.

It seems that the schema is not the same between the versions.

For example:

  • v3 has 24 tables owned by airflow, compared to 23 in v4.
  • The dag table has at least one new field, root_dag_id.

So I wouldn't add the above to the documentation yet.

@villasv
Owner

villasv commented Feb 24, 2020

That’s not particular to this stack; it’s almost certainly due to the different Airflow versions used by v3 and v4. The airflow db upgrade command (named airflow upgradedb in Airflow 1.10.x) should take care of the schema.

So my suggestion is that you load your snapshot into a database, run the migration against it using the latest Airflow, then snapshot the result and load that into your v4 deployment.
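
The middle step, running the schema migration against the restored copy, might look like the sketch below. It assumes the `airflow` CLI of the target version is installed where this runs and that the database URI (a hypothetical placeholder here) points at the restored snapshot, not at a live deployment. The subcommand differs by version: `airflow upgradedb` in 1.10.x, `airflow db upgrade` in 2.x.

```python
import os
import subprocess
from typing import Dict, List, Optional, Tuple


def upgrade_invocation(db_uri: str, legacy_cli: bool = True,
                       base_env: Optional[Dict[str, str]] = None
                       ) -> Tuple[List[str], Dict[str, str]]:
    """Return the CLI command and environment for a schema upgrade."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment variables of this form override the value in airflow.cfg.
    env["AIRFLOW__CORE__SQL_ALCHEMY_CONN"] = db_uri
    command = ["airflow", "upgradedb"] if legacy_cli else ["airflow", "db", "upgrade"]
    return command, env


def run_upgrade(db_uri: str, legacy_cli: bool = True) -> None:
    """Run the upgrade; raises CalledProcessError if the CLI fails."""
    command, env = upgrade_invocation(db_uri, legacy_cli)
    subprocess.run(command, env=env, check=True)
```

Which Airflow version v4 ships is not stated in this thread, so `legacy_cli` has to be set to match the installed CLI.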

@asafcombo
Author

One final detail that might help the next person:

After the DB migration, you might encounter issues with the scheduler. This can happen because the task_instance table persists the queue name, so after a migration the table still contains the old queue name.

I encountered an issue where the scheduler failed due to that.

To fix, I connected to the db and ran:

UPDATE task_instance SET queue = 'NEW_QUEUE' WHERE queue = 'OLD_QUEUE' AND execution_date >= 'SOME_DATE_BEFORE_MIGRATION';
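
The same idea (moving task instances that still carry the old queue name onto the new one) can be sketched as a parameterized update that reports how many rows it changed. This demo runs against an in-memory SQLite table with only the columns the statement needs; the real task_instance table has many more columns, and a PostgreSQL driver such as psycopg2 would use `%s` placeholders instead of `?`. The queue names and dates are made up.

```python
import sqlite3


def requeue(conn: sqlite3.Connection, old_queue: str, new_queue: str, since: str) -> int:
    """Point task instances tagged with old_queue at new_queue; return rows changed."""
    cur = conn.execute(
        "UPDATE task_instance SET queue = ? WHERE queue = ? AND execution_date >= ?",
        (new_queue, old_queue, since),
    )
    conn.commit()
    return cur.rowcount


def demo():
    """Build a tiny stand-in table, run the update, and return the outcome."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance (task_id TEXT, queue TEXT, execution_date TEXT)")
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?)",
        [
            ("a", "old-queue", "2020-02-01"),  # before the cutoff: left alone
            ("b", "old-queue", "2020-02-20"),  # after the cutoff: rewritten
            ("c", "new-queue", "2020-02-21"),  # already on the new queue
        ],
    )
    changed = requeue(conn, "old-queue", "new-queue", "2020-02-15")
    rows = conn.execute("SELECT task_id, queue FROM task_instance ORDER BY task_id").fetchall()
    return changed, rows
```

Checking the returned row count before trusting the result is a cheap way to notice a wrong queue name or cutoff date.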

@villasv
Owner

villasv commented Feb 26, 2020

Oh yes, a fine detail. Did that happen because there were tasks scheduled to run during the db backup? Or is that a problem for old completed tasks as well?

@villasv
Owner

villasv commented Mar 12, 2020

Hey @asafcombo, I wanted to hear more about the upgrade experience. Did everything work out in the end? Any more details that came up?

@asafcombo
Author

asafcombo commented Mar 17, 2020

Hi @villasv

> Oh yes, a fine detail. Did that happen because there were tasks scheduled to run during the db backup? Or is that a problem for old completed tasks as well?

I am not sure, actually. I do remember turning off all tasks before the migration, though.

> Hey @asafcombo, I wanted to hear more about the upgrade experience. Did everything work out in the end? Any more details that came up?

Everything worked OK. The only difference I ran into is that in v3, print() statements are captured by the Airflow logger and written to the task logs.
In v4 this is not the case for all operators. Once I figure it out, I'll update here.
