CUBLAS_STATUS_EXECUTION_FAILED when running on GPU. #22
Comments
Have you solved this problem? I have the same problem.
Hello, this is Wang Lei. I have received your email, thank you!
Hello, I ran into the same error as you. Could you tell me how you finally solved this problem?
In the end I spent several days running it on my CPU.
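For anyone who wants to reproduce the CPU fallback described above without uninstalling the GPU build, here is a minimal sketch. It assumes the usual CUDA behaviour that devices hidden through the `CUDA_VISIBLE_DEVICES` environment variable are invisible to TensorFlow; the helper name `hide_gpus` is hypothetical and is not part of LaserTagger.

```python
import os


def hide_gpus():
    """Hide all CUDA devices from this process so every op runs on the CPU."""
    # "-1" is not a valid device index, so CUDA reports no usable GPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


# This has to run before TensorFlow is imported (or at least before it
# initializes CUDA); otherwise the GPU has already been claimed.
hide_gpus()
```

Setting the variable in the shell (`CUDA_VISIBLE_DEVICES=-1 python predict_main.py ...`) should have the same effect without touching the code.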
LaserTagger works well on CPU. However, when I run it on GPU, I get the error below. How can I solve this problem?
```
2021-10-21 22:10:44.179433: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
     [[{{node bert/embeddings/MatMul}}]]
     [[loss/Cast/_429]]
  (1) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
     [[{{node bert/embeddings/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hunan/fever/depend/lasertagger/predict_main.py", line 97, in <module>
    app.run(main)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/hunan/fever/depend/lasertagger/predict_main.py", line 90, in main
    prediction = predictor.predict(sources)
  File "/home/hunan/fever/depend/lasertagger/predict_utils.py", line 57, in predict
    out = self._predictor({key: [example.features[key]] for key in keys})
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/predictor.py", line 77, in __call__
    return self._session.run(fetches=self.fetch_tensors, feed_dict=feed_dict)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
     [[node bert/embeddings/MatMul (defined at /miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
     [[loss/Cast/_429]]
  (1) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
     [[node bert/embeddings/MatMul (defined at /miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'bert/embeddings/MatMul':
  File "/fever/depend/lasertagger/predict_main.py", line 97, in <module>
    app.run(main)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/fever/depend/lasertagger/predict_main.py", line 79, in main
    tf.contrib.predictor.from_saved_model(FLAGS.saved_model), builder,
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/predictor_factories.py", line 153, in from_saved_model
    config=config)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/saved_model_predictor.py", line 153, in __init__
    loader.load(self._session, tags.split(','), export_dir)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 269, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 422, in load
    **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 352, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 517, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3451, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
```
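Nobody in this thread confirmed a GPU fix, so treat the following as something to try rather than the answer. `Blas GEMM launch failed` on a tiny matmul (here 128x2 by 2x768) is often reported when cuBLAS cannot get GPU memory, for example because another process or an earlier session already holds it. The trace shows `predict_main.py` calling `tf.contrib.predictor.from_saved_model(FLAGS.saved_model)` without a session config. A minimal sketch, assuming TensorFlow 1.15 and a hypothetical helper name `load_predictor`, passes a config that lets GPU memory grow on demand:

```python
def load_predictor(saved_model_dir):
    """Load a SavedModel predictor with on-demand GPU memory allocation.

    Assumption: TensorFlow 1.x (tf.contrib is not available in TF 2.x).
    """
    # Imported lazily so this module can be loaded on machines without TF.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory as needed instead of reserving it all up front.
    config.gpu_options.allow_growth = True
    return tf.contrib.predictor.from_saved_model(saved_model_dir, config=config)
```

In `predict_main.py` this would stand in for the bare `from_saved_model(FLAGS.saved_model)` call. If the error persists, it is worth checking with `nvidia-smi` whether another process is using the GPU, and whether the installed CUDA and driver versions match what the TensorFlow build expects.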