Skip to content

Commit

Permalink
adding empty content
Browse files Browse the repository at this point in the history
  • Loading branch information
Zac Davies committed Jun 20, 2024
1 parent 511f5d1 commit 7e8d543
Show file tree
Hide file tree
Showing 38 changed files with 9,668 additions and 2 deletions.
Empty file added _book/.nojekyll
Empty file.
519 changes: 519 additions & 0 deletions _book/chapters/data-eng/odbc-dbplyr.html

Large diffs are not rendered by default.

492 changes: 492 additions & 0 deletions _book/chapters/misc/htmlwidgets.html

Large diffs are not rendered by default.

515 changes: 515 additions & 0 deletions _book/chapters/misc/odbc-oauth.html

Large diffs are not rendered by default.

845 changes: 845 additions & 0 deletions _book/chapters/mlflow/log-to-uc.html

Large diffs are not rendered by default.

496 changes: 496 additions & 0 deletions _book/chapters/mlflow/model-serving.html

Large diffs are not rendered by default.

496 changes: 496 additions & 0 deletions _book/chapters/pkg-management/fast-installs.html

Large diffs are not rendered by default.

496 changes: 496 additions & 0 deletions _book/chapters/pkg-management/persisting-libs.html

Large diffs are not rendered by default.

496 changes: 496 additions & 0 deletions _book/chapters/pkg-management/renv.html

Large diffs are not rendered by default.

539 changes: 539 additions & 0 deletions _book/index.html

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _book/robots.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Sitemap: https://zacdav-db.github.io/dbrx-r-compendium/sitemap.xml
79 changes: 79 additions & 0 deletions _book/search.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
[
{
"objectID": "index.html",
"href": "index.html",
"title": "R on Databricks Compendium",
"section": "",
"text": "What is this?\n\n\n\n\n\n\nUnder Development\n\n\n\n\n\n\n\n\n\n\n\n\nThis is not intended to be an exhaustive guide, it’s currently a place for me to document and collate useful information regarding R on Databricks.\n\n\n\n\n\n\nThere aren’t many definitive examples of how to use R and Databricks together - hopefully the content here will serve as a useful resource."
},
{
"objectID": "chapters/mlflow/log-to-uc.html#unity-catalog-model-requirements",
"href": "chapters/mlflow/log-to-uc.html#unity-catalog-model-requirements",
"title": "1  Log R Models to Unity Catalog",
"section": "1.1 Unity Catalog Model Requirements",
"text": "1.1 Unity Catalog Model Requirements\nFor models to be logged into Unity Catalog they must have a model signature. The Model signature defines the schema for model inputs/outputs.\nTypically when using python this would be inferred via model input examples. Input examples are optional but strongly recommended.\nThe documentation discusses signature enforcement, currently this isn’t implemented for R. Therefore you can decide if the signature is a dummy value for the sake of moving forward, or correct to clearly communicate the behaviour of the model.\n\n\n\n\n\n\nImportant\n\n\n\nIt’s important to clarify that for python the signature is enforced at time of inference not when registering the model to Unity Catalog.\nThe signature correctness is not validated when registering the model, it just has to be syntactically valid.\n\n\nSo, let’s look at the existing code to log models in the crate flavour:\n\nmlflow_save_model.crate <- function(model, path, model_spec=list(), ...) {\n1 if (dir.exists(path)) unlink(path, recursive = TRUE)\n dir.create(path)\n\n2 serialized <- serialize(model, NULL)\n\n3 saveRDS(\n serialized,\n file.path(path, \"crate.bin\")\n )\n\n4 model_spec$flavors <- append(model_spec$flavors, list(\n crate = list(\n version = \"0.1.0\",\n model = \"crate.bin\"\n )\n ))\n mlflow_write_model_spec(path, model_spec)\n model_spec\n}\n\n\n1\n\nCreate the directory to save the model if it doesn’t exist, if it does, empty it\n\n2\n\nSerialise the model, which is an object of class crate (from {carrier} package)\n\n3\n\nSave the serialised model via saveRDS to the directory as crate.bin\n\n4\n\nDefine the model specification, this contains metadata required ensure reproducibility. In this case it’s only specifying a version and what file the model can be found within.\n\n\n\n\nThe missing puzzle piece is the definition of a signature. Instead of explicitly adding code to the crate flavour itself, we’ll take advantage of the model_spec parameter.\nThat means we can focus on mlflow::mlflow_log_model directly, we’d need to adjust the code as follows:\n\n1mlflow_log_model <- function(model, artifact_path, ...) {\n \n temp_path <- fs::path_temp(artifact_path)\n \n model_spec <- mlflow_save_model(\n model, path = temp_path,\n2 model_spec = list(\n utc_time_created = mlflow_timestamp(),\n run_id = mlflow_get_active_run_id_or_start_run(),\n artifact_path = artifact_path,\n flavors = list()\n ),\n ...)\n \n res <- mlflow_log_artifact(path = temp_path, artifact_path = artifact_path)\n \n tryCatch({\n mlflow:::mlflow_record_logged_model(model_spec)\n },\n error = function(e) {\n warning(\n paste(\"Logging model metadata to the tracking server has failed, possibly due to older\",\n \"server version. The model artifacts have been logged successfully.\",\n \"In addition to exporting model artifacts, MLflow clients 1.7.0 and above\",\n \"attempt to record model metadata to the tracking store. If logging to a\",\n \"mlflow server via REST, consider upgrading the server version to MLflow\",\n \"1.7.0 or above.\", sep=\" \")\n )\n })\n res\n}\n\n\n1\n\nAdd a new parameter signature\n\n2\n\nPropagate signature to the model_spec parameter when invoking mlflow::mlflow_save_model\n\n\n\n\nBenefit of this method is that all model flavors will inherit the capability to log a signature."
},
{
"objectID": "chapters/mlflow/log-to-uc.html#working-through-the-solution",
"href": "chapters/mlflow/log-to-uc.html#working-through-the-solution",
"title": "1  Log R Models to Unity Catalog",
"section": "1.2 Working Through the Solution",
"text": "1.2 Working Through the Solution\nTo keep things simple we’ll be logging a “model” (a function which divides by two).\n\nhalf <- function(x) x / 2\n\nhalf(1:10)\n\n [1] 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0\n\n\nWithout any changes, a simplified example of logging to {mlflow} would look like:\n\nlibrary(carrier)\nlibrary(mlflow)\n\nwith(mlflow_start_run(), {\n # typically you'd do more modelling related activities here\n model <- carrier::crate(~half(.x))\n1 mlflow_log_model(model, \"model\")\n})\n\n\n1\n\nAs discussed earlier, this is where things start to go awry with respect to Unity Catalog\n\n\n\n\n\n1.2.1 Patching mlflow_log_model\n\n\n\n\n\n\nNote\n\n\n\nTechnically, patching mlflow_log_model isn’t the only way to achieve this fix - you could modify the yaml after it’s written.\nI won’t be showing that method as It’s just as tedious and can change depending on the model flavour (with respect to where artifacts may reside), patching is more robust.\n\n\n\n1mlflow_log_model <- function(model, artifact_path, signature = NULL, ...) {\n \n2 format_signature <- function(signature) {\n lapply(signature, function(x) {\n jsonlite::toJSON(x, auto_unbox = TRUE)\n })\n }\n \n temp_path <- fs::path_temp(artifact_path)\n \n model_spec <- mlflow_save_model(model, path = temp_path, model_spec = list(\n utc_time_created = mlflow:::mlflow_timestamp(),\n run_id = mlflow:::mlflow_get_active_run_id_or_start_run(),\n artifact_path = artifact_path, \n flavors = list(),\n3 signature = format_signature(signature)\n ), ...)\n \n res <- mlflow_log_artifact(path = temp_path, artifact_path = artifact_path)\n \n tryCatch({\n mlflow:::mlflow_record_logged_model(model_spec)\n },\n error = function(e) {\n warning(\n paste(\"Logging model metadata to the tracking server has failed, possibly due to older\",\n \"server version. The model artifacts have been logged successfully.\",\n \"In addition to exporting model artifacts, MLflow clients 1.7.0 and above\",\n \"attempt to record model metadata to the tracking store. If logging to a\",\n \"mlflow server via REST, consider upgrading the server version to MLflow\",\n \"1.7.0 or above.\", sep=\" \")\n )\n })\n res\n}\n\n# overriding the function in the existing mlflow namespace \nassignInNamespace(\"mlflow_log_model\", mlflow_log_model, ns = \"mlflow\")\n\n\n1\n\nsignature has been added to function parameters, it’s defaulting to NULL so that existing code won’t break\n\n2\n\nAdding format_signature function so don’t need to write JSON by hand, adding this within function for simplicity\n\n3\n\nsignature is propagated to mlflow_save_model’s model_spec parameter which will write a valid signature\n\n\n\n\n\n\n1.2.2 Logging Model with a Signature\n\nwith(mlflow_start_run(), {\n # typically you'd do more modelling related activities here\n model <- carrier::crate(~half(.x))\n1 signature <- list(\n inputs = list(list(type = \"double\", name = \"x\")),\n outputs = list(list(type = \"double\"))\n )\n2 mlflow_log_model(model, \"model\", signature = signature)\n})\n\n\n1\n\nExplicitly defining a signature, a list that contains input and outputs, each are lists of lists respectively\n\n2\n\nPassing defined signature to the now patched mlflow_log_model function\n\n\n\n\n\n\n1.2.3 Registration to Unity Catalog\nNow that the prerequisite of adding a model signature has been satisfied there is one last hurdle to overcome, registering to Unity Catalog.\nThe hurdle is due to {mlflow} not having been updated yet to support registration to Unity Catalog directly. The easiest way to overcome this is to simply register the run via python.\nFor example:\n\nimport mlflow\nmlflow.set_registry_uri(\"databricks-uc\")\n\ncatalog = \"main\"\nschema = \"default\"\nmodel_name = \"my_model\"\n1run_uri = \"runs:/<run_id>/model\"\n\nmlflow.register_model(run_uri, f\"{catalog}.{schema}.{model_name}\")\n\n\n1\n\nYou’ll need to either get the run_uri programmatically or copy it manually\n\n\n\n\nTo do this with R you’ll need to make a series of requests to Unity Catalog endpoints for registering model, the specific steps are:\n\n(Optional) Create a new model in Unity Catalog\n\nPOST request on /api/2.0/mlflow/unity-catalog/registered-models/create\n\nname: 3 tiered namespace (e.g. main.default.my_model)\n\n\nCreate a version for the model\n\nPOST request on /api/2.0/mlflow/unity-catalog/model-versions/create\n\nname: 3 tiered namespace (e.g. main.default.my_model)\nsource: URI indicating the location of the model artifacts\nrun_id: run_id from tracking server that generated the model\nrun_tracking_server_id: Workspace ID of run that generated the model\n\nThis will return storage_location and version\n\nCopy model artifacts\n\nNeed to copy the artifacts to storage_location from step (2)\n\nFinalise the model version\n\nPOST request on /api/2.0/mlflow/unity-catalog/model-versions/finalize\n\nname: 3 tiered namespace (e.g. main.default.my_model)\nversion: version returned from step (2)\n\n\n\nIt’s considerably easier to just use Python to register the model at this time."
},
{
"objectID": "chapters/mlflow/log-to-uc.html#fixing-mlflow",
"href": "chapters/mlflow/log-to-uc.html#fixing-mlflow",
"title": "1  Log R Models to Unity Catalog",
"section": "1.3 Fixing mlflow",
"text": "1.3 Fixing mlflow\nIdeally this page wouldn’t exist and {mlflow} would support Unity Catalog. Hopefully sometime soon I find the time to make a pull request myself - until then this serves as a guide."
},
{
"objectID": "chapters/pkg-management/fast-installs.html",
"href": "chapters/pkg-management/fast-installs.html",
"title": "3  Faster Package Installs",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/pkg-management/persisting-libs.html",
"href": "chapters/pkg-management/persisting-libs.html",
"title": "4  Persisting Packages",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/pkg-management/renv.html",
"href": "chapters/pkg-management/renv.html",
"title": "5  {renv}",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/misc/htmlwidgets.html",
"href": "chapters/misc/htmlwidgets.html",
"title": "6  {htmlwidgets} in notebooks",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/mlflow/model-serving.html",
"href": "chapters/mlflow/model-serving.html",
"title": "2  Model Serving",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/misc/odbc-oauth.html",
"href": "chapters/misc/odbc-oauth.html",
"title": "8  OAuth 🤝 {odbc}",
"section": "",
"text": "Under Development"
},
{
"objectID": "chapters/data-eng/odbc-dbplyr.html",
"href": "chapters/data-eng/odbc-dbplyr.html",
"title": "6  {dbplyr} & {odbc}",
"section": "",
"text": "Under Development"
}
]
Loading

0 comments on commit 7e8d543

Please sign in to comment.