EPIC: Legacy Workload Ports #79
Hello @ecurtin, I am wondering whether the legacy version is compatible with Spark 2.2. I need more workloads for my thesis experiment. BTW, thank you so much for taking the time to answer all my questions!
@akasaki It depends on what you mean by compatible. Both versions have data generators that output data to disk and workloads that pick up that data and do stuff with it, but they are entirely different code bases. You're totally welcome to try the legacy version if you think it might suit your needs better, just keep in mind that it is unsupported. Are there any workloads in particular that are high priority for you?
@ecurtin I focus on a tuning algorithm based on three types of workloads. The paper (Li M, Tan J, Wang Y, Zhang L, Salapura V. SparkBench: a Spark benchmarking suite characterizing large-scale in-memory data analytics. Cluster Computing. 2017:1-5.) classified all workloads into three types: memory-intensive, shuffle-intensive, and all-intensive. In the current version, the SQL workload is shuffle-intensive, and linear regression is memory-intensive, although linear regression doesn't work in my environment (Issue #134). I suppose K-means is also memory-intensive, isn't it? I need one or more all-intensive workloads such as MF and SVD++. I am trying to set up the legacy version.
SparkPi is included in the current version of Spark-Bench. It's extremely compute-intensive (when used with large parameters) while hardly making use of I/O at all. Basically it computes an approximate value of Pi in a deliberately inefficient manner: https://sparktc.github.io/spark-bench/workloads/sparkpi/
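For anyone curious what "deliberately inefficient" means here: SparkPi uses the standard Monte Carlo estimate of Pi, sampling random points in the unit square and counting how many land inside the quarter circle. The sketch below shows the same idea in plain single-machine Python (it is an illustration of the technique, not the actual Spark-Bench code; the function name and parameters are made up for this example):

```python
import random

def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of Pi.

    Sample points uniformly in the unit square [0, 1) x [0, 1);
    the fraction landing inside the quarter circle x^2 + y^2 <= 1
    approaches Pi/4 as num_samples grows.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi(100_000))
```

In the Spark version, the sampling loop is parallelized across partitions and the counts are combined with a reduce, so cranking up the sample count burns CPU across the cluster while touching almost no data on disk.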
@ecurtin I see. I have tried it as the first example, but it doesn't have any shuffle operation. I am looking for some all-intensive (both shuffle-intensive and memory-intensive) workloads which consume both I/O and memory.
Port all workloads available in the legacy version to the new version.