This repository has been archived by the owner on Oct 11, 2021. It is now read-only.

Some updates to the cloud formation template #48

Merged 1 commit into villasv:master on Nov 13, 2018

Conversation

@tomfaulhaber (Contributor)

1. Parameterize the AMI so the template can be used in different regions (Regions and AMIs should be parameterized #44); see the template sketch after this list
2. Don't hardcode the region, use the one where the template is running (Regions and AMIs should be parameterized #44)
3. Change the Airflow setup process so that it works on new AMIs (Airflow installation doesn’t work in the latest AMIs #46)
4. Add the /efs filesystem to /etc/fstab so the system can be rebooted (The /efs mount doesn’t survive reboots #45)
5. Change the name "SpotInstanceType" to "WorkerInstanceType" since we're not using spot yet and we don't want to confuse users.

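For readers who want to see what these changes look like in template form, here is a minimal sketch covering items 1, 2, 4, and 5. It is illustrative only: the parameter names, resource names, and UserData commands are assumptions rather than the actual contents of this repository's template, and mount targets, security groups, and the Airflow bootstrap itself are omitted.

AWSTemplateFormatVersion: "2010-09-09"
Description: Illustrative fragment only; all names here are hypothetical.

Parameters:
  # (1) The AMI becomes a parameter instead of a hardcoded, region-specific ID.
  AmiId:
    Type: AWS::EC2::Image::Id
    Description: AMI to use for the Airflow instances in the current region.
  # (5) Renamed from SpotInstanceType; these are plain on-demand instances for now.
  WorkerInstanceType:
    Type: String
    Default: t2.medium

Resources:
  # (4) The shared filesystem that gets mounted at /efs.
  EfsFileSystem:
    Type: AWS::EFS::FileSystem

  WorkerInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref AmiId
      InstanceType: !Ref WorkerInstanceType
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          # (2) Use the region the stack is running in (the AWS::Region
          # pseudo parameter) instead of hardcoding one.
          aws configure set default.region ${AWS::Region}
          # (4) Add the EFS mount to /etc/fstab so it survives reboots.
          mkdir -p /efs
          echo "${EfsFileSystem}.efs.${AWS::Region}.amazonaws.com:/ /efs nfs4 nfsvers=4.1,hard,timeo=600,retrans=2 0 0" >> /etc/fstab
          mount /efs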
@tomfaulhaber (Contributor, Author)

Also, please note that this updates the Airflow version to 1.10.1b1. We need this beta for what we're doing because otherwise the combo of SQS and instance roles breaks Airflow.

The release should be finalized within the next few days and I'll update the PR to point at Airflow 1.10.1 at that point.
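As a sketch of what that pin could look like in the instance bootstrap (assuming a pip-based install inside the template's UserData; the celery extra and the rest of the install line are assumptions, not copied from this PR, and any additional SQS transport dependencies are omitted):

      UserData:
        Fn::Base64: |
          #!/bin/bash -xe
          # Installing Airflow 1.10 requires opting out of the GPL unidecode
          # dependency, otherwise pip install fails.
          export SLUGIFY_USES_TEXT_UNIDECODE=yes
          # Pin the beta explicitly: with the currently released version, the
          # SQS broker combined with instance-profile credentials breaks
          # Airflow. Bump this to 1.10.1 once it is released.
          pip install "apache-airflow[celery]==1.10.1b1"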

@villasv (Owner) commented on Nov 13, 2018

Looks great! Good stuff. I'll do some testing now and get back soon.

@villasv (Owner) commented on Nov 13, 2018

Hey @tomfaulhaber, here are the results of my experimenting:

✔️ I was able to issue airflow backfill -s 2018-11-09 tutorial and the tasks were correctly executed by a Celery worker after send/receive from SQS.
❌ I was not able to make DAGs run by triggering them manually in the UI; task instances enter a null state and are never picked up by the scheduler.

Here's my config:

[core]
executor = CeleryExecutor
parallelism = 100
dag_concurrency = 100
max_active_runs_per_dag = 10

fernet_key = SOMEVERYLARGEVALUEHEREFORSAFETYPUPORSESpKFf87WFwLbfzqDDho=

load_examples = False

[cli]
api_client = airflow.api.client.local_client

[operators]
default_owner = Airflow
default_cpus = 1
default_ram = 512
default_disk = 512
default_gpus = 0

[webserver]
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080

workers = 4
worker_class = sync
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

secret_key = temporary_key
expose_config = True
authenticate = False
filter_by_owner = False

log_fetch_timeout_sec = 5

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 1
worker_log_server_port = 8793

[scheduler]
# Task instances listen for external kill signal (when you clear tasks
# from the CLI or the UI), this defines the frequency at which they should
# listen (in seconds).
job_heartbeat_sec = 5

# The scheduler constantly tries to trigger new tasks (look at the
# scheduler section in the docs for more information). This defines
# how often the scheduler should run (in seconds).
scheduler_heartbeat_sec = 5

# after how much time should the scheduler terminate in seconds
# -1 indicates to run continuously (see also num_runs)
run_duration = -1

# after how much time new DAGs should be picked up from the filesystem
min_file_process_interval = 0

dag_dir_list_interval = 300

# How often should stats be printed to the logs
print_stats_interval = 30

# child_process_log_directory => let the default: $AIRFLOW_HOME/logs

# Local task jobs periodically heartbeat to the DB. If the job has
# not heartbeat in this many seconds, the scheduler will mark the
# associated task instance as failed and will re-schedule the task.
scheduler_zombie_task_threshold = 300

# Turn off scheduler catchup by setting this to False.
# Default behavior is unchanged and
# Command Line Backfills still work, but the scheduler
# will not do scheduler catchup if this is False,
# however it can be set on a per DAG basis in the
# DAG definition (catchup)
catchup_by_default = True

# Statsd (https://github.com/etsy/statsd) integration settings
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow

# The scheduler can run multiple threads in parallel to schedule dags.
# This defines how many threads will run. However airflow will never
# use more threads than the amount of cpu cores available.
max_threads = 2

authenticate = False

A few of those have been deprecated already; I'll keep digging a bit to find out more.
In the meantime, I'll merge the PR because it accomplishes what it's meant for.

@villasv merged commit cf03bce into villasv:master on Nov 13, 2018
@tomfaulhaber (Contributor, Author)

Hmm, interesting. I was having the same problem starting tasks from the UI. I had to run a backfill from the CLI to get the job to start. I figured I was doing something wrong, but I guess something is misconfigured.

Let's both dig in and see if we can sort it.

@villasv mentioned this pull request on Nov 13, 2018
@villasv added this to the Amazonic Art milestone on Nov 22, 2018