Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run demo of sona latest version bug #62

Open
lcx517 opened this issue Dec 26, 2019 · 1 comment
Open

run demo of sona latest version bug #62

lcx517 opened this issue Dec 26, 2019 · 1 comment

Comments

@lcx517
Copy link

lcx517 commented Dec 26, 2019

Hi, I'm running SONA-example,and got FAILED with stdout log here.
PLEASE HELP~~

2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(deepthought); groups with view permissions: Set(); users  with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO  UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO  ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN  Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO  AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception: 
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO  ShutdownHookManager:54 - Shutdown hook called

my SONA-example script:

source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop

$SPARK_HOME/bin/spark-submit \
        --master yarn-cluster \
        --driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
        --keytab /home/deepthought/deepthought.keytab \
        --principal deepthought \
        --queue longyuan.p0 \
	--conf spark.ps.jars=$SONA_ANGEL_JARS \
	--conf spark.ps.instances=10 \
	--conf spark.ps.cores=2 \
	--conf spark.ps.memory=6g \
	--jars $SONA_SPARK_JARS\
	--name "LR-spark-on-angel" \
	--files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
	--driver-memory 10g \
	--num-executors 10 \
	--executor-cores 2 \
	--executor-memory 4g \
	--class org.apache.spark.angel.examples.JsonRunnerExamples \
	./../lib/angelml-${SONA_VERSION}.jar \
	data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
	modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
	jsonFile:./lr.json \
	lr:0.1

and my spark-on-angel-env.sh:

export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2

...<not changed default content below>...```
@PayneJoe
Copy link

Hi, I'm running SONA-example,and got FAILED with stdout log here.
PLEASE HELP~~

2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for TERM
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for HUP
2019-12-26 14:09:19 INFO  SignalUtils:54 - Registered signal handler for INT
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls to: deepthought
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-12-26 14:09:19 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(deepthought); groups with view permissions: Set(); users  with modify permissions: Set(deepthought); groups with modify permissions: Set()
2019-12-26 14:09:20 INFO  UserGroupInformation:964 - Login successful for user deepthought using keytab file deepthought.keytab-4169bc48-f895-42c2-9dde-091feb49f3c5
2019-12-26 14:09:20 INFO  ApplicationMaster:54 - Preparing Local resources
2019-12-26 14:09:22 WARN  Client:677 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1576380960005_2467808_000001
2019-12-26 14:09:28 INFO  AMCredentialRenewer:54 - Scheduling login from keytab in 64776907 millis.
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2019-12-26 14:09:28 ERROR ApplicationMaster:91 - Uncaught exception: 
java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:715)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:491)
	at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
2019-12-26 14:09:28 INFO  ApplicationMaster:54 - Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.lang.ClassNotFoundException: org.apache.spark.angel.examples.JsonRunnerExamples)
2019-12-26 14:09:28 INFO  ShutdownHookManager:54 - Shutdown hook called

my SONA-example script:

source ./spark-on-angel-env.sh
export HADOOP_CONF_DIR=/usr/lib/hadoop/etc/hadoop

$SPARK_HOME/bin/spark-submit \
        --master yarn-cluster \
        --driver-java-options "-Djava.library.path=/usr/lib/hadoop/lib/native" \
        --keytab /home/deepthought/deepthought.keytab \
        --principal deepthought \
        --queue longyuan.p0 \
	--conf spark.ps.jars=$SONA_ANGEL_JARS \
	--conf spark.ps.instances=10 \
	--conf spark.ps.cores=2 \
	--conf spark.ps.memory=6g \
	--jars $SONA_SPARK_JARS\
	--name "LR-spark-on-angel" \
	--files /data/angel/sona-0.1.0-bin/jsons/logreg.json \
	--driver-memory 10g \
	--num-executors 10 \
	--executor-cores 2 \
	--executor-memory 4g \
	--class org.apache.spark.angel.examples.JsonRunnerExamples \
	./../lib/angelml-${SONA_VERSION}.jar \
	data:viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin/data/angel/a9a/a9a_123d_train.libsvm \
	modelPath:viewfs://hadoop-bd/user/deepthought/test/output \
	jsonFile:./lr.json \
	lr:0.1

and my spark-on-angel-env.sh:

export JAVA_HOME=/usr
export HADOOP_HOME=/usr/lib/hadoop
export SPARK_HOME=/usr/local/spark/spark-2.3.1-bin-hadoop2.6
export SONA_HOME=/data/angel/sona-0.1.0-bin
export SONA_HDFS_HOME=viewfs://hadoop-bd/user/deepthought/test/angel/sona-0.1.0-bin
export SONA_VERSION=0.1.0
export ANGEL_VERSION=3.0.1
export ANGEL_UTILS_VERSION=0.1.1
export ANGEL_MLCORE_VERSION=0.1.2

...<not changed default content below>...```

class changed aleady, while doc is outdated!

You need to change "org.apache.spark.angel.examples.JsonRunnerExamples" to "com.tencent.angel.sona.examples.JsonRunnerExamples".

luck~

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants