Update API to deploy separate endpoints by run ID #2

Merged (13 commits) on May 3, 2024
Changes from 3 commits
168 changes: 104 additions & 64 deletions api.R
@@ -1,5 +1,6 @@
# Setup ------------------------------------------------------------------------
library(arrow)
library(assertthat)
library(aws.s3)
library(ccao)
library(dplyr)
@@ -11,91 +12,130 @@ source("generics.R")
# Read AWS creds from Docker secrets
if (file.exists("/run/secrets/ENV_FILE")) {
readRenviron("/run/secrets/ENV_FILE")
} else {
} else if (file.exists("secrets/ENV_FILE")) {
readRenviron("secrets/ENV_FILE")
} else {
readRenviron(".env")
}
readRenviron(".env")
Contributor Author:
The logic here was a bit confusing for local development, where neither /run/secrets/ENV_FILE nor secrets/ENV_FILE exists. I think the new conditional structure should make local development easier, but let me know if I'm misinterpreting something.

Member:
I agree this is pretty confusing and underdocumented. If I recall, the file secrets/ENV_FILE is mounted to /run/secrets/ENV_FILE when using compose. This file contains AWS creds specific to the API. During development, these creds would be unnecessary as the user would likely have active AWS creds via aws-mfa.

The .env file is separate and not related. It contains the rest of the setup vars used by compose (API_PORT, CCAO_REGISTRY_URL, etc.) and the final model ID and year. In other words, it's just config stuff, not actually secret. This file is necessary to load during development, but NOT when deployed via compose (compose adds all vars in a .env file to the deployed container). So, the logic here makes sense if you remove the change on line 15.

Contributor Author:
That makes sense, I refactored in abe5403 to always load this file and only load secrets/ENV_FILE if it exists (similar to /run/secrets/ENV_FILE).
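Based on the description above, the refactored loading order would look roughly like the following. This is a sketch of the logic as described in the comment, not the exact code committed in abe5403: load the API-specific AWS creds only if one of the secrets files exists, then always load the compose config vars from .env.

```r
# Load AWS creds from Docker secrets if present (compose mounts
# secrets/ENV_FILE to /run/secrets/ENV_FILE), then always load the
# non-secret config vars (API_PORT, etc.) from .env
if (file.exists("/run/secrets/ENV_FILE")) {
  readRenviron("/run/secrets/ENV_FILE")
} else if (file.exists("secrets/ENV_FILE")) {
  readRenviron("secrets/ENV_FILE")
}
readRenviron(".env")
```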


# Get the model run attributes at runtime from env vars
dvc_bucket <- Sys.getenv("AWS_S3_DVC_BUCKET")
run_bucket <- Sys.getenv("AWS_S3_MODEL_BUCKET")
run_id <- Sys.getenv("AWS_S3_MODEL_RUN_ID")
run_year <- Sys.getenv("AWS_S3_MODEL_YEAR")
api_port <- as.numeric(Sys.getenv("API_PORT", unset = "3636"))
default_run_id_var_name <- "AWS_S3_MODEL_RUN_ID"
Contributor Author:
It might be nice to change the name of this env var to something that more clearly marks it as the default (like AWS_S3_DEFAULT_MODEL_RUN_ID), but keeping the same name for now means one fewer thing we have to change during deployment. I'm open to changing it now if you feel strongly about it, however.

Member:
Let's change it now and make a list of things we actually need to change when redeploying.

Contributor Author:
Done in abe5403, with an updated list of deploy steps in the PR body.

default_run_id <- Sys.getenv(default_run_id_var_name)


# Download Files ---------------------------------------------------------------

# Grab model fit and recipe objects
temp_file_fit <- tempfile(fileext = ".zip")
aws.s3::save_object(
object = file.path(
run_bucket, "workflow/fit",
paste0("year=", run_year),
paste0(run_id, ".zip")
),
file = temp_file_fit
# The list of run IDs that will be deployed as possible model endpoints
valid_run_ids <- c(
"2024-02-06-relaxed-tristan",
"2024-03-17-stupefied-maya"
Contributor Author:
Any other run IDs that we should include in this vector for now?

Member:
Let's actually include the final 2022 and 2023 models. We can just reproduce and replace the old workbooks for those years.

Contributor Author:
Cool, done in abe5403. Are the old workbooks even still operational, given that the model version in the (currently static) API has changed since 2022 and 2023?

)

temp_file_recipe <- tempfile(fileext = ".rds")
aws.s3::save_object(
object = file.path(
run_bucket, "workflow/recipe",
paste0("year=", run_year),
paste0(run_id, ".rds")
),
file = temp_file_recipe
assert_that(
default_run_id %in% valid_run_ids,
msg = sprintf(
"%s must be a valid run_id - got '%s', expected one of: %s",
default_run_id_var_name,
default_run_id,
paste(valid_run_ids, collapse = ", ")
)
)

# Grab metadata file for the specified run
metadata <- read_parquet(
file.path(
run_bucket, "metadata",
paste0("year=", run_year),
paste0(run_id, ".parquet")
# Given a run ID, return a model object that can be used to power a
# vetiver API endpoint
get_model_from_run_id <- function(run_id) {
  run_year <- substr(run_id, 1, 4)

# Download Files -------------------------------------------------------------

# Grab model fit and recipe objects
temp_file_fit <- tempfile(fileext = ".zip")
aws.s3::save_object(
object = file.path(
run_bucket, "workflow/fit",
paste0("year=", run_year),
paste0(run_id, ".zip")
),
file = temp_file_fit
)
)

# Load the training data used for this model
training_data_md5 <- metadata$dvc_md5_training_data
training_data <- read_parquet(
file.path(
dvc_bucket,
substr(training_data_md5, 1, 2),
substr(training_data_md5, 3, nchar(training_data_md5))
temp_file_recipe <- tempfile(fileext = ".rds")
aws.s3::save_object(
object = file.path(
run_bucket, "workflow/recipe",
paste0("year=", run_year),
paste0(run_id, ".rds")
),
file = temp_file_recipe
)
)

# Grab metadata file for the specified run
metadata <- read_parquet(
file.path(
run_bucket, "metadata",
paste0("year=", run_year),
paste0(run_id, ".parquet")
)
)

# Load Model -------------------------------------------------------------------
# Load the training data used for this model
training_data_md5 <- metadata$dvc_md5_training_data
training_data <- read_parquet(
file.path(
dvc_bucket,
substr(training_data_md5, 1, 2),
substr(training_data_md5, 3, nchar(training_data_md5))
)
)

# Load fit and recipe from file
fit <- lightsnip::lgbm_load(temp_file_fit)
recipe <- readRDS(temp_file_recipe)

# Extract a sample row of predictors to use for the API docs
predictors <- recipe$var_info %>%
filter(role == "predictor") %>%
pull(variable)
ptype_tbl <- training_data %>%
filter(meta_pin == "15251030220000") %>%
select(all_of(predictors))
ptype <- vetiver_create_ptype(model = fit, save_prototype = ptype_tbl)
# Load Model -----------------------------------------------------------------

# Load fit and recipe from file
fit <- lightsnip::lgbm_load(temp_file_fit)
recipe <- readRDS(temp_file_recipe)

# Create API -------------------------------------------------------------------
# Extract a sample row of predictors to use for the API docs
predictors <- recipe$var_info %>%
filter(role == "predictor") %>%
pull(variable)
ptype_tbl <- training_data %>%
filter(meta_pin == "15251030220000") %>%
select(all_of(predictors))
ptype <- vetiver_create_ptype(model = fit, save_prototype = ptype_tbl)

# Create model object and populate metadata
model <- vetiver_model(fit, "LightGBM", save_prototype = ptype)
model$recipe <- recipe
model$pv$round_type <- metadata$pv_round_type
model$pv$round_break <- metadata$pv_round_break[[1]]
model$pv$round_to_nearest <- metadata$pv_round_to_nearest[[1]]

# Start API
pr() %>%
vetiver_api(model) %>%
pr_run(
host = "0.0.0.0",
port = api_port
# Create API -----------------------------------------------------------------

# Create model object and populate metadata
model <- vetiver_model(fit, "LightGBM", save_prototype = ptype)
model$recipe <- recipe
model$pv$round_type <- metadata$pv_round_type
model$pv$round_break <- metadata$pv_round_break[[1]]
model$pv$round_to_nearest <- metadata$pv_round_to_nearest[[1]]

return(model)
}

default_model <- get_model_from_run_id(default_run_id)

router <- pr() %>%
# Point the /predict endpoint to the default model
vetiver_api(default_model)

# Create endpoints for each model based on run ID and add them to the router
for (run_id in valid_run_ids) {
model <- get_model_from_run_id(run_id)
vetiver_api(
router,
model,
path = sprintf("/predict/%s", run_id)
)
}

# Start API
pr_run(
router,
host = "0.0.0.0",
port = api_port
)
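Under the routing scheme above, each run ID gets its own prediction path alongside the default /predict. A hypothetical client call against a local deployment might look like the following; the port comes from the API_PORT default in the diff, and the predictor row is a labeled stand-in (the real schema comes from the model's saved ptype, so the column name here is an assumption):

```r
# Hypothetical client usage; predictor_row is a placeholder, not the real schema
library(httr)

predictor_row <- data.frame(char_bldg_sf = 1200)  # stand-in predictor value
resp <- POST(
  "http://localhost:3636/predict/2024-02-06-relaxed-tristan",
  body = predictor_row,
  encode = "json"
)
content(resp)
```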
1 change: 0 additions & 1 deletion docker-compose.yaml
Member:
I don't remember why we're running privileged: true here, but we should try to remove it.

Contributor Author:
Done in d8bf46f, let's see what happens!

@@ -12,7 +12,6 @@ services:
- AWS_S3_DVC_BUCKET
- AWS_S3_MODEL_BUCKET
- AWS_S3_MODEL_RUN_ID
- AWS_S3_MODEL_YEAR
- API_PORT
secrets:
- ENV_FILE
24 changes: 12 additions & 12 deletions renv.lock
@@ -1,6 +1,6 @@
{
"R": {
"Version": "4.2.2",
"Version": "4.3.2",
"Repositories": [
{
"Name": "CRAN",
@@ -146,19 +146,19 @@
},
"assessr": {
"Package": "assessr",
"Version": "0.5.2",
"Version": "0.6.0",
"Source": "GitHub",
"RemoteType": "github",
"RemoteHost": "api.github.com",
"RemoteUsername": "ccao-data",
"RemoteRepo": "assessr",
"RemoteRef": "master",
"RemoteSha": "dcfc0f0585462cc87cab42b965d16ec4c5546256",
"RemoteSha": "3c0172c47da0adf48be9084be141564f06872220",
"RemoteHost": "api.github.com",
"Requirements": [
"R",
"stats"
],
"Hash": "2bb19b867910fb7334778ec519fac8d2"
"Hash": "7229107fa32f9570d4be09258478845a"
},
"aws.s3": {
"Package": "aws.s3",
@@ -256,14 +256,14 @@
},
"ccao": {
"Package": "ccao",
"Version": "1.2.2",
"Version": "1.3.0",
"Source": "GitHub",
"RemoteType": "github",
"RemoteHost": "api.github.com",
"RemoteUsername": "ccao-data",
"RemoteRepo": "ccao",
"RemoteRef": "master",
"RemoteSha": "74737102c48ce07b769a10f42693d9c2c958e9ef",
"RemoteRef": "969dae702ed420ba9f9d252e5a0459c63b991e80",
"RemoteSha": "969dae702ed420ba9f9d252e5a0459c63b991e80",
"Remotes": "ccao-data/assessr",
"Requirements": [
"R",
@@ -273,7 +273,7 @@
"rlang",
"tidyr"
],
"Hash": "2aaaeceb70766ef2dc175999eb4aa0ad"
"Hash": "82d3e915a18fc2b38c8fb10988322fe6"
},
"class": {
"Package": "class",
@@ -1335,9 +1335,9 @@
},
"tibble": {
"Package": "tibble",
"Version": "3.1.8",
"Version": "3.2.1",
"Source": "Repository",
"Repository": "CRAN",
"Repository": "RSPM",
"Requirements": [
"R",
"fansi",
Expand All @@ -1350,7 +1350,7 @@
"utils",
"vctrs"
],
"Hash": "56b6934ef0f8c68225949a8672fe1a8f"
"Hash": "a84e2cc86d07289b3b6f5069df7a004c"
},
"tidyr": {
"Package": "tidyr",