Some R code cannot be executed normally #6163

Open
DmdCalvinChen opened this issue Jan 30, 2025 · 10 comments
Labels
area: remote host · info needed (Waiting on information)

Comments

DmdCalvinChen commented Jan 30, 2025

This is ordinary ANN model training code, but the cv_model step cannot execute properly: it hangs with no response, and the issue reproduces every time. The same code runs normally on RStudio Server. I'm not sure where the scheduling problem lies.

library(neuralnet)
library(NeuralNetTools)
library(caret)
library(pROC)
library(foreach)
library(doParallel)

load("ann_arg.rda")
exp_ann <- exp_vein

set.seed(20240118)

n_samples <- ncol(exp_ann)

train_index <- sample(1:n_samples, size = round(0.7 * n_samples))

train_group_list <- group_list[train_index]
test_group_list <- group_list[-train_index]

traindata <- exp_ann[, train_index]
testdata <- exp_ann[, -train_index]

k <- traindata

k <- as.data.frame(t(k))
fen <- as.data.frame(train_group_list)
fen$lasso <- ifelse(fen$train_group_list == "healthy", 0, 1)

k <- cbind(fen$lasso, k)
colnames(k)[1] <- "group"
data <- k

head(data)

groups <- data$group
gene_expression <- data[, names(data) != "group"]

model_data <- data.frame(group = as.factor(groups), gene_expression)

str(model_data)

formula <- as.formula(paste("group ~", paste(colnames(model_data)[colnames(model_data) != "group"], collapse = " + ")))

ann_model <- neuralnet(
  formula,
  data = model_data,
  hidden = c(10, 5),
  linear.output = FALSE,
  err.fct = "ce",
  lifesign = "full",
  rep = 1,
  algorithm = "rprop+",
  stepmax = 1e+08
)

model_data$group <- factor(model_data$group, levels = c("0", "1"), labels = c("Healthy", "Disease"))

train_control <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)

cv_model <- train(
  formula,
  data = model_data,
  method = "nnet",
  trControl = train_control,
  metric = "ROC",
  tuneLength = 10
)

It gets stuck permanently at this step.
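
Since the script attaches foreach and doParallel, one thing that may be worth checking before the train() call is which backend, if any, is currently registered. The following is only a minimal diagnostic sketch, not a confirmed cause of the hang; `backend_info` is a hypothetical name introduced here for illustration.

```r
library(foreach)

# Report what %dopar% would currently dispatch to in this session.
# In a fresh session with no backend registered, `registered` is FALSE
# and `workers` is 1.
backend_info <- list(
  registered = getDoParRegistered(),
  name       = getDoParName(),
  workers    = getDoParWorkers()
)
str(backend_info)
```

If a parallel backend does turn out to be registered, trainControl() accepts allowParallel = FALSE, which would let the same train() call be retried sequentially for comparison.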

timtmok (Contributor) commented Jan 30, 2025

Thank you for opening an issue. Your feedback will be helpful!

How large is this dataset? Is it expected to complete quickly and how long does it take when running on RStudio Server?

timtmok added the "info needed" (Waiting on information) label Jan 30, 2025

DmdCalvinChen (Author) commented Feb 2, 2025

@timtmok Thank you for your reply. The dataset is not large: its memory usage is negligible relative to the memory available on the server, and the step takes only about 20 seconds to complete on RStudio Server. The key difference is that on RStudio Server the execution information is printed to the console when this step runs, whereas in Positron nothing is printed and the session appears completely stuck.

DmdCalvinChen (Author) commented Feb 2, 2025

@timtmok
ann_arg.zip (https://github.com/user-attachments/files/18631081/ann_arg.zip)
This archive contains the ann_arg.rda data. I should also mention that I found this issue while working over remote SSH.

seeM (Contributor) commented Feb 3, 2025

@DmdCalvinChen thank you for the extra info! Could you please share the R kernel debug logs? That would help us investigate further.

  1. Set the positron.r.kernel.logLevel setting to "debug".

  2. Restart Positron.

  3. Rerun the script.

  4. Run the "Interpreter: Show interpreter output" command.

  5. Copy and paste the results here (you can wrap it in <details>...</details> to make the text collapsible), or upload it as a file if it's too large.

DmdCalvinChen (Author) commented Feb 3, 2025

@seeM Sorry for my ignorance. Is this the data you need? I really have no idea.

console.txt

seeM (Contributor) commented Feb 3, 2025

Thanks @DmdCalvinChen! That's helpful but not what I was looking for. If you click the dropdown to choose a different output and then select the output for the same R version but ending in "Kernel", that should get the logs we need.

Here's what a snippet of my logs looks like, for example:

[R]   2025-02-03T15:47:21.350790Z  INFO  Received shell notification: JupyterMessage { zmq_identities: [[114, 45, 55, 97, 57, 101, 52, 57, 53, 53], [114, 45, 55, 97, 57, 101, 52, 57, 53, 53]], header: JupyterHeader { msg_id: "1a9b280b-c3e5-47b2-b67e-a625ed7be0db", session: "r-7a9e4955", username: "seem", date: "2025-02-03T15:47:21.350Z", msg_type: "comm_msg", version: "5.3" }, parent_header: None, content: CommWireMsg { comm_id: "positron-variables-r-10-d8cd8a66", data: Object {"jsonrpc": String("2.0"), "method": String("list")} } }
[R]     at crates/amalthea/src/socket/shell.rs:232
[R] 
[R]   2025-02-03T15:47:21.351061Z  INFO  Environment: Received message from frontend: Rpc("1a9b280b-c3e5-47b2-b67e-a625ed7be0db", Object {"jsonrpc": String("2.0"), "method": String("list")})
[R]     at crates/ark/src/variables/r_variables.rs:158
[R] 

juliasilge (Contributor) commented

@DmdCalvinChen can you try out some simpler code to see whether the problem comes from using doParallel with a remote SSH session? For example, can you execute this with your setup?

library(doParallel)
registerDoParallel(cores=2)
foreach(i=1:3) %dopar% sqrt(i)

If that works fine, can you run a simpler caret example, to see if the problem is related to that? For example, can you execute this with your setup?

library(caret)
data(iris)
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv"))

If that runs fine, then you could try this with parallel processing.
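
As a rough sketch of what that could look like, the same knn example can be run with the two-core backend from the first snippet registered. The name `knnFitPar` is hypothetical, and the choice of two cores is only carried over from above.

```r
library(caret)
library(doParallel)

# Register the same implicit two-core backend used in the foreach example.
registerDoParallel(cores = 2)

data(iris)
TrainData <- iris[, 1:4]
TrainClasses <- iris[, 5]

set.seed(1)
knnFitPar <- train(TrainData, TrainClasses,
                   method = "knn",
                   preProcess = c("center", "scale"),
                   tuneLength = 10,
                   trControl = trainControl(method = "cv",
                                            allowParallel = TRUE))

# Release the implicit workers and fall back to sequential execution.
stopImplicitCluster()
registerDoSEQ()
```

If this hangs while the sequential version above completes, that would point toward the parallel backend rather than caret itself.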

If you can find the smallest example that exhibits your problem, that would be really helpful. You may find the ideas from reprex, especially Reprex do's and don'ts, helpful. Thanks! 🙌

DmdCalvinChen (Author) commented

@juliasilge Thank you for your reply and efforts. Following your suggestion, I copied the information from the R4.4.0 kernel during the lagging state and named it "R4.4.0Kernel.txt." The other file contains the kernel information of the code you provided, which is running well.

R4.4.0_YOU_example_code.txt

R4.4.0Kernal.txt

juliasilge (Contributor) commented

Thanks! Can you help us find more specifics about when you are observing this problem?

  • Do you see the problem when you run a simple caret example with parallel processing?
  • Do you see the problem when you run the neural net example but without parallel processing?
  • Do you see the problem when you run a simpler neural net example (for example, with iris or mtcars)?
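
For the last two questions, a minimal sketch could look like the following. It assumes the nnet and pROC packages are installed, uses a two-class subset of iris as a stand-in for the Healthy/Disease data, and forces sequential execution; `iris2`, `ctrl`, and `nnet_fit` are hypothetical names, and tuneLength is reduced to 3 only to keep the run short.

```r
library(caret)
library(foreach)

# Force sequential execution so no parallel backend is involved.
registerDoSEQ()

# Two-class subset of iris as a stand-in for the original data.
data(iris)
iris2 <- droplevels(iris[iris$Species != "setosa", ])

ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  summaryFunction = twoClassSummary,
  allowParallel = FALSE,
  verboseIter = TRUE   # print progress for each fold and tuning value
)

set.seed(20240118)
nnet_fit <- train(
  Species ~ .,
  data = iris2,
  method = "nnet",
  trControl = ctrl,
  metric = "ROC",
  tuneLength = 3,
  trace = FALSE        # passed through to nnet() to silence its own log
)
```

With verboseIter = TRUE, caret prints a line per resample, so it should be visible in the console whether the run is progressing or stalled.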

DmdCalvinChen (Author) commented Feb 6, 2025

@juliasilge Please don't close this issue. I need to find some time to look into it further.


4 participants