Problem crating a lightgbm model #11
Comments
Noting that there are some possibly related conversations at rstudio/bundle#55 and linked issues, though this error seems more crate/rlang-related. Here's a reprex:
library(tidymodels)
library(bonsai)
library(carrier)
fit <-
  boost_tree("classification", engine = "lightgbm") %>%
  fit(Class ~ A + B, two_class_dat)
fit
#> parsnip model object
#>
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 2 columns
c_model <- crate(
  predict,
  model = fit,
  predict = rlang::set_env(workflows:::predict.workflow),
  is_trained_workflow = rlang::set_env(workflows:::is_trained_workflow),
  validate_is_workflow = rlang::set_env(workflows:::validate_is_workflow),
  check_dots_empty = rlang::set_env(rlang::check_dots_empty),
  ellipsis_dots = rlang::set_env(rlang:::ellipsis_dots),
  ffi_ellipsis_dots = rlang:::ffi_ellipsis_dots,
  caller_env = rlang::set_env(rlang::caller_env)
)
callr::r(
  function(d, cmod) {
    cmod(d)
  },
  args = list(
    d = two_class_dat,
    cmod = c_model
  )
)
#> Error: ! in callr subprocess.
#> Caused by error in `.Call(ffi_ellipsis_dots, env)`:
#> ! NULL value passed as symbol address
Created on 2024-05-17 with reprex v2.1.0
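One possible reading of the error above, offered as a sketch and not a confirmed diagnosis: `rlang:::ffi_ellipsis_dots` is a handle to compiled code, and such handles hold an external pointer whose address is not kept when the object is serialized (which is how the crate reaches the callr subprocess). The snippet below illustrates the idea using only base R. `stats:::C_rnorm` is just a stand-in for the rlang symbol, chosen because it ships with every R installation.

```r
# Illustration only: `stats:::C_rnorm` stands in for `rlang:::ffi_ellipsis_dots`.
sym <- stats:::C_rnorm  # a native-symbol handle registered by the stats package

# Round-trip through serialization, as saveRDS()/readRDS() would do.
restored <- unserialize(serialize(sym, NULL))

# The original handle still resolves to compiled code.
n_draws <- length(.Call(sym, 3L, 0, 1))

# The restored handle keeps its name but no longer has a usable address,
# so calling it should fail; capture the message instead of stopping.
res <- tryCatch(
  .Call(restored, 3L, 0, 1),
  error = function(e) conditionMessage(e)
)
res
```

If this reading is right, passing the symbol into `crate()` cannot work, however many R-level helpers are listed by hand. The compiled code still has to be present on the machine that runs the prediction.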
Something like this should do the trick!
library(tidymodels)
library(bonsai)
library(carrier)
fit <-
  workflow(
    Class ~ A + B,
    boost_tree("classification", engine = "lightgbm")
  ) %>%
  fit(two_class_dat)
fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: boost_tree()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Class ~ A + B
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 2 columns
c_model <- crate(
  function(new_data, ...) workflows:::predict.workflow(model, new_data, ...),
  model = fit
)
callr::r(
  function(d, cmod) {
    cmod(d)
  },
  args = list(
    d = two_class_dat,
    cmod = c_model
  )
)
#> # A tibble: 791 × 1
#> .pred_class
#> <fct>
#> 1 Class1
#> 2 Class1
#> 3 Class2
#> 4 Class2
#> 5 Class1
#> 6 Class2
#> 7 Class2
#> 8 Class2
#> 9 Class1
#> 10 Class2
#> # ℹ 781 more rows
Created on 2024-05-17 with reprex v2.1.0
Thanks for your answer, @simonpcouch! It does work out of the box. The problem is, and maybe I am missing something, that I still cannot achieve what I had in mind at first (that is, having a function that contains all of its dependencies, so that a vanilla R installation on the target system could run it as-is). I have serialized the model and the data used in your example:
library(tidymodels)
library(bonsai)
library(callr)
library(carrier)
library(lightgbm)
#>
#> Attaching package: 'lightgbm'
#> The following object is masked from 'package:dplyr':
#>
#> slice
fit <-
  workflow(Class ~ A + B, boost_tree("classification", engine = "lightgbm")) %>%
  fit(two_class_dat)
fit
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Formula
#> Model: boost_tree()
#>
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> Class ~ A + B
#>
#> ── Model ───────────────────────────────────────────────────────────────────────
#> LightGBM Model (100 trees)
#> Objective: binary
#> Fitted to dataset with 2 columns
c_model <- crate(
  function(new_data, ...) workflows:::predict.workflow(model, new_data, ...),
  model = fit
)
callr::r(
  function(d, cmod) {
    cmod(d)
  },
  args = list(
    d = two_class_dat,
    cmod = c_model
  )
)
#> # A tibble: 791 × 1
#> .pred_class
#> <fct>
#> 1 Class1
#> 2 Class1
#> 3 Class2
#> 4 Class2
#> 5 Class1
#> 6 Class2
#> 7 Class2
#> 8 Class2
#> 9 Class1
#> 10 Class2
#> # ℹ 781 more rows
saveRDS(c_model, 'model.rds')
saveRDS(two_class_dat, 'data.rds')
Created on 2024-05-19 with reprex v2.1.0
Then I deserialized them in a new, clean session without the packages installed, and I get an error asking me to install the dependencies. That seems logical, and it is certainly more informative than the FFI-related error we started with, but it makes me wonder again whether what I want to achieve makes any sense.
d <- readRDS('../prueba_crate/data.rds')
m <- readRDS('../prueba_crate/model.rds')
m(d)
#> Error in `check_installs()`:
#> ! This engine requires some package installs: 'lightgbm, bonsai'
Created on 2024-05-19 with reprex v2.1.0
I guess, as a conclusion, that the idea I had in mind is too complex and maybe not worth the effort. Since I already have the renv.lock and requirements.txt files that I use in the training phase, I think I can reproduce the environment for inference in the same way. Thanks a lot for your help. Anyway, if you have any tips on the feasibility of such an approach, I'm all ears. :) Always happy to learn from you!!! :)
Regards,
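For the "reproduce the environment" route, a small guard around `readRDS()` can turn a missing dependency into a clear message before the crated function is ever called. This is only a sketch: `missing_pkgs()`, `load_crate()`, and the `needed` vector are hypothetical names, and the package list is an assumption based on the error message above plus workflows. Note that `requireNamespace()` loads a package's namespace as a side effect when the package is available.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Assumed dependency list: the two packages named in the error, plus workflows.
needed <- c("workflows", "bonsai", "lightgbm")

# Hypothetical wrapper: refuse to load the crate until dependencies are present.
load_crate <- function(path, pkgs = needed) {
  absent <- missing_pkgs(pkgs)
  if (length(absent) > 0) {
    stop(
      "Install these packages before loading the model: ",
      paste(absent, collapse = ", ")
    )
  }
  readRDS(path)
}
```

On the inference machine, something like `renv::restore()` against the training-time renv.lock, followed by `m <- load_crate("model.rds")`, should then either load the model or say exactly what is missing.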
Glad the answer was helpful! One tool that may be helpful for you as you put together your production environment: workflows (and other objects in the tidymodels) have
Hi,
First of all, thank you for your amazing work. I have been able to log a lightgbm model in mlflow using a "crated" function. The problem is that when I load the crated model in a new, clean session, it fails because some of its dependencies are not there.
I started declaring by hand each of the dependencies that raised errors, to see if I could arrive at a decent compromise, but I run into problems once I reach some FFI code.
Say I have a fitted workflow model...
...and I try to crate a function for prediction in the following way:
Then, if I try to call it in a clean session, it gives me an error I just cannot understand:
I just wanted to know whether I am trying to do too much (that is, whether it is possible at all to crate functions that depend on FFI code), or whether it would be better simply to ensure that all needed dependencies are available on the machine that is going to run the inference part. The latter would be the easy and comfortable path, but I wanted to ask, because I think crate is a great tool, and this is not such a corner case.
Thanks a lot in advance.
Gus.