Memory Problems with Spark-perf #99

Open
yodha12 opened this issue Jan 29, 2016 · 0 comments
yodha12 commented Jan 29, 2016

I am running the pyspark tests on a cluster with 12 nodes, 20 cores per node, and 60 GB of memory per node. I get output for the first few tests (sort, agg, count, etc.), but when it reaches the broadcast test the job terminates. From the .err file in the results folder I assume it is running out of memory: `ensureFreeSpace(4194304) called with curMem=610484012, maxMem=611642769`. How can I increase the maxMem value? This is the content of my config/config.py file:

```python
COMMON_JAVA_OPTS = [
    # Fraction of JVM memory used for caching RDDs.
    JavaOptionSet("spark.storage.memoryFraction", [0.66]),
    JavaOptionSet("spark.serializer", ["org.apache.spark.serializer.JavaSerializer"]),
    JavaOptionSet("spark.executor.memory", ["9g"]),
```

and

```python
# Set driver memory here
SPARK_DRIVER_MEMORY = "20g"
```

It shows the running command as follows.

```
Setting env var SPARK_SUBMIT_OPTS: -Dspark.storage.memoryFraction=0.66 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer -Dspark.executor.memory=9g -Dspark.locality.wait=60000000 -Dsparkperf.commitSHA=unknown
Running command: /nfs/15/soottikkal/local/spark-1.5.2-bin-hadoop2.6//bin/spark-submit --master spark://r0111.ten.osc.edu:7077 pyspark-tests/core_tests.py BroadcastWithBytes --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 --broadcast-size=209715200 1>> results/python_perf_output__2016-01-28_23-35-54_logs/python-broadcast-w-bytes.out 2>> results/python_perf_output__2016-01-28_23-35-54_logs/python-broadcast-w-bytes.err
```

Is the spark-submit command actually picking up the memory settings from config.py here? maxMem is only about 611 MB, which looks like 0.66 × Spark's default 1 GB of memory. Changing spark.executor.memory or SPARK_DRIVER_MEMORY in config/config.py has no effect on maxMem, but changing spark.storage.memoryFraction from 0.66 to 0.88 does increase it. How can I raise maxMem so the tests can use the large amount of memory that is already available in the cluster?
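
As a rough sanity check, the maxMem value from the .err file can be worked backwards to the heap size the JVM is actually running with. This sketch assumes Spark 1.5's legacy (static) storage-memory formula, maxMem ≈ JVM max heap × spark.storage.memoryFraction × spark.storage.safetyFraction, with safetyFraction defaulting to 0.9; that formula is an assumption about Spark's internals, not something stated in the log.

```python
# Work backwards from the maxMem reported in the .err log to the JVM heap size,
# assuming Spark 1.5.x legacy memory management:
#   maxMem ~= JVM max heap * spark.storage.memoryFraction * spark.storage.safetyFraction

max_mem_bytes = 611642769      # "maxMem=611642769" from the .err file
memory_fraction = 0.66         # spark.storage.memoryFraction set in config.py
safety_fraction = 0.9          # Spark 1.5 default for spark.storage.safetyFraction

implied_heap_bytes = max_mem_bytes / (memory_fraction * safety_fraction)
print("implied JVM max heap: %.2f GiB" % (implied_heap_bytes / 1024**3))
# Prints roughly 0.96 GiB -- consistent with Spark's default 1g heap rather than
# the 9g executor / 20g driver memory requested in config.py.
```

If that arithmetic holds, the JVM that wrote the log line is running with the default ~1 GB heap, which would mean the memory settings in config.py are not reaching it.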
