assertRDDEquals gives too much logging #437

Open

KenOBonnMS opened this issue Jan 12, 2025 · 3 comments

@KenOBonnMS

I only want minimal output from assertRDDEquals, such as pass/fail. Instead, when I run in the IntelliJ environment, it floods my logs with 5761 lines of useless Spark messages, to the point where some of the messages are clipped at the beginning. An example of the noisy log is below; I've included just the first 50 lines, as the full output is too lengthy even for here.

che.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 20.0456 ms
19:36:10.595 [Executor task launch worker for task 6] INFO org.apache.spark.executor.Executor - Finished task 6.0 in stage 3.0 (TID 6). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 0] INFO org.apache.spark.executor.Executor - Finished task 0.0 in stage 3.0 (TID 0). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 14] INFO org.apache.spark.executor.Executor - Finished task 14.0 in stage 3.0 (TID 14). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 9] INFO org.apache.spark.executor.Executor - Finished task 9.0 in stage 3.0 (TID 9). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 12] INFO org.apache.spark.executor.Executor - Finished task 12.0 in stage 3.0 (TID 12). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 4] INFO org.apache.spark.executor.Executor - Finished task 4.0 in stage 3.0 (TID 4). 1908 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 10] INFO org.apache.spark.executor.Executor - Finished task 10.0 in stage 3.0 (TID 10). 1908 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 1] INFO org.apache.spark.executor.Executor - Finished task 1.0 in stage 3.0 (TID 1). 1865 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 5] INFO org.apache.spark.executor.Executor - Finished task 5.0 in stage 3.0 (TID 5). 1908 bytes result sent to driver
19:36:10.595 [Executor task launch worker for task 2] INFO org.apache.spark.executor.Executor - Finished task 2.0 in stage 3.0 (TID 2). 1865 bytes result sent to driver
19:36:10.596 [Executor task launch worker for task 13] INFO org.apache.spark.executor.Executor - Finished task 13.0 in stage 3.0 (TID 13). 1822 bytes result sent to driver
19:36:10.596 [Executor task launch worker for task 8] INFO org.apache.spark.executor.Executor - Finished task 8.0 in stage 3.0 (TID 8). 1865 bytes result sent to driver
19:36:10.599 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 0.0 (TID 16, 10.0.7.131, executor driver, partition 0, PROCESS_LOCAL, 7346 bytes)
19:36:10.599 [Executor task launch worker for task 16] INFO org.apache.spark.executor.Executor - Running task 0.0 in stage 0.0 (TID 16)
19:36:10.600 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 1.0 in stage 0.0 (TID 17, 10.0.7.131, executor driver, partition 1, PROCESS_LOCAL, 7346 bytes)
19:36:10.600 [Executor task launch worker for task 17] INFO org.apache.spark.executor.Executor - Running task 1.0 in stage 0.0 (TID 17)
19:36:10.601 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 2.0 in stage 0.0 (TID 18, 10.0.7.131, executor driver, partition 2, PROCESS_LOCAL, 7346 bytes)
19:36:10.601 [Executor task launch worker for task 18] INFO org.apache.spark.executor.Executor - Running task 2.0 in stage 0.0 (TID 18)
19:36:10.602 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 3.0 in stage 0.0 (TID 19, 10.0.7.131, executor driver, partition 3, PROCESS_LOCAL, 7346 bytes)
19:36:10.603 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 4.0 in stage 0.0 (TID 20, 10.0.7.131, executor driver, partition 4, PROCESS_LOCAL, 7346 bytes)
19:36:10.602 [Executor task launch worker for task 19] INFO org.apache.spark.executor.Executor - Running task 3.0 in stage 0.0 (TID 19)
19:36:10.603 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 5.0 in stage 0.0 (TID 21, 10.0.7.131, executor driver, partition 5, PROCESS_LOCAL, 7346 bytes)
19:36:10.604 [Executor task launch worker for task 20] INFO org.apache.spark.executor.Executor - Running task 4.0 in stage 0.0 (TID 20)
19:36:10.604 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 6.0 in stage 0.0 (TID 22, 10.0.7.131, executor driver, partition 6, PROCESS_LOCAL, 7346 bytes)
19:36:10.604 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 7.0 in stage 0.0 (TID 23, 10.0.7.131, executor driver, partition 7, PROCESS_LOCAL, 7567 bytes)
19:36:10.605 [Executor task launch worker for task 22] INFO org.apache.spark.executor.Executor - Running task 6.0 in stage 0.0 (TID 22)
19:36:10.605 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 8.0 in stage 0.0 (TID 24, 10.0.7.131, executor driver, partition 8, PROCESS_LOCAL, 7346 bytes)
19:36:10.605 [Executor task launch worker for task 21] INFO org.apache.spark.executor.Executor - Running task 5.0 in stage 0.0 (TID 21)
19:36:10.606 [Executor task launch worker for task 23] INFO org.apache.spark.executor.Executor - Running task 7.0 in stage 0.0 (TID 23)
19:36:10.606 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 9.0 in stage 0.0 (TID 25, 10.0.7.131, executor driver, partition 9, PROCESS_LOCAL, 7346 bytes)
19:36:10.607 [Executor task launch worker for task 25] INFO org.apache.spark.executor.Executor - Running task 9.0 in stage 0.0 (TID 25)
19:36:10.607 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 10.0 in stage 0.0 (TID 26, 10.0.7.131, executor driver, partition 10, PROCESS_LOCAL, 7346 bytes)
19:36:10.608 [Executor task launch worker for task 24] INFO org.apache.spark.executor.Executor - Running task 8.0 in stage 0.0 (TID 24)
19:36:10.608 [Executor task launch worker for task 26] INFO org.apache.spark.executor.Executor - Running task 10.0 in stage 0.0 (TID 26)
19:36:10.608 [dispatcher-event-loop-0] INFO org.apache.spark.scheduler.TaskSetManager - Starting task 11.0 in stage 0.0 (TID 27, 10.0.7.131, executor driver, partition 11, PROCESS_LOCAL, 7346 bytes)
19:36:10.610 [Executor task launch worker for task 27] INFO org.apache.spark.executor.Executor - Running task 11.0 in stage 0.0 (TID 27)
19:36:10.611 [task-result-getter-3] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 9.0 in stage 3.0 (TID 9) in 435 ms on 10.0.7.131 (executor driver) (1/16)
19:36:10.613 [task-result-getter-3] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 12.0 in stage 3.0 (TID 12) in 438 ms on 10.0.7.131 (executor driver) (2/16)
19:36:10.614 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 0.0 in stage 3.0 (TID 0) in 454 ms on 10.0.7.131 (executor driver) (3/16)
19:36:10.614 [task-result-getter-1] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 14.0 in stage 3.0 (TID 14) in 438 ms on 10.0.7.131 (executor driver) (4/16)
19:36:10.615 [task-result-getter-0] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 6.0 in stage 3.0 (TID 6) in 441 ms on 10.0.7.131 (executor driver) (5/16)
19:36:10.616 [Executor task launch worker for task 7] INFO org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator - Code generated in 42.1484 ms
19:36:10.616 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 10.0 in stage 3.0 (TID 10) in 441 ms on 10.0.7.131 (executor driver) (6/16)
19:36:10.617 [task-result-getter-2] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 5.0 in stage 3.0 (TID 5) in 443 ms on 10.0.7.131 (executor driver) (7/16)
19:36:10.617 [task-result-getter-0] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 2.0 in stage 3.0 (TID 2) in 445 ms on 10.0.7.131 (executor driver) (8/16)
19:36:10.617 [task-result-getter-3] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 4.0 in stage 3.0 (TID 4) in 443 ms on 10.0.7.131 (executor driver) (9/16)
19:36:10.618 [task-result-getter-0] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 8.0 in stage 3.0 (TID 8) in 443 ms on 10.0.7.131 (executor driver) (10/16)
19:36:10.618 [task-result-getter-1] INFO org.apache.spark.scheduler.TaskSetManager - Finished task 1.0 in stage 3.0 (TID 1) in 446 ms on 10.0.7.131 (executor driver) (11/16)

@holdenk
Owner

holdenk commented Jan 12, 2025

Those are Spark's internal logging messages, not from spark-testing-base. Perhaps we could change the default log level?
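
For example (untested, and assuming the default Log4j 1.x binding that Spark ships with), raising the relevant logger levels before the SparkSession is created should cut most of the scheduler/executor INFO chatter:

import org.apache.log4j.{Level, Logger}

// Set the levels before the SparkSession/SparkContext is created so that
// the startup messages are suppressed as well as the per-task ones.
Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
Logger.getLogger("org.apache.hadoop").setLevel(Level.ERROR)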

@KenOBonnMS
Author

KenOBonnMS commented Jan 12, 2025

How do I change the default log level? Below is what I've tried, but it doesn't make a difference.

val spark = SparkSession.builder()
  .master("local[*]")
  .getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
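
For what it's worth, the timestamp pattern in my captured log (HH:mm:ss.SSS [thread] INFO ...) looks like Logback's default layout, so I suspect SLF4J is bound to Logback on my test classpath. If that's the case, setLogLevel (which, as I understand it, goes through Log4j) would have no effect, and a logback-test.xml in src/test/resources might be needed instead. An untested sketch:

<!-- Hypothetical src/test/resources/logback-test.xml; only relevant if
     SLF4J is in fact bound to Logback on the test classpath. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- Silence Spark's internal chatter; keep warnings and errors. -->
  <logger name="org.apache.spark" level="ERROR"/>
  <root level="WARN">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>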

@KenOBonnMS
Author

It seems Spark's lazy execution was making it look like assertRDDEquals was causing the issue, when in fact the Spark logging is triggered whenever an action forces the statement to execute. I found this out just now by pulling the DataFrame operations (actually Dataset[Row] is the type) out of assertRDDEquals and forcing lazy Spark to evaluate them with a trivial action such as .show(), as the code snippet below demonstrates.

Although this confirms the issue is not from spark-testing-base, as you stated above, if you happen to know how to configure the default logging so these messages are suppressed, it would be greatly appreciated. Either way, it would be fair to close this issue, though I'm still hoping for your insights.

// Assert
val expectedDFSorted: Dataset[Row] = Common.sortColumns(expectedDF)
val filterOutputDFSorted: Dataset[Row] = Common.sortColumns(filterOutputDF)
// expectedDFSorted.show(false) // This gives 119 lines of spark log execution entries
// filterOutputDFSorted.show(false) // This gives 4053 lines of spark log execution entries
assertRDDEquals(expectedDFSorted.rdd, filterOutputDFSorted.rdd)
// assertRDDEquals(Common.sortColumns(expectedDF).rdd, Common.sortColumns(filterOutputDF).rdd)
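
In the meantime, a workaround I'm considering (untested sketch, assuming the Logback binding I suspected above; quietly is just a name I made up) is to raise the Spark logger level temporarily around the assertion, since the flood comes from the action that assertRDDEquals triggers:

import ch.qos.logback.classic.{Level, Logger}
import org.slf4j.LoggerFactory

def quietly[T](body: => T): T = {
  // Temporarily raise org.apache.spark to ERROR around the block.
  val sparkLogger = LoggerFactory.getLogger("org.apache.spark").asInstanceOf[Logger]
  val saved = sparkLogger.getLevel // may be null if the level is inherited
  sparkLogger.setLevel(Level.ERROR)
  try body
  finally sparkLogger.setLevel(saved) // restoring null re-enables inheritance
}

quietly {
  assertRDDEquals(expectedDFSorted.rdd, filterOutputDFSorted.rdd)
}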
