CUBLAS_STATUS_EXECUTION_FAILED when running on GPU. #22

Open
lanlanabcd opened this issue Oct 21, 2021 · 4 comments
@lanlanabcd

Lasertagger works well on CPU. However, when I run it on GPU, I get the following error. How can I solve this problem?

```
2021-10-21 22:10:44.179433: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
	 [[{{node bert/embeddings/MatMul}}]]
	 [[loss/Cast/_429]]
  (1) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
	 [[{{node bert/embeddings/MatMul}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hunan/fever/depend/lasertagger/predict_main.py", line 97, in <module>
    app.run(main)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/hunan/fever/depend/lasertagger/predict_main.py", line 90, in main
    prediction = predictor.predict(sources)
  File "/home/hunan/fever/depend/lasertagger/predict_utils.py", line 57, in predict
    out = self._predictor({key: [example.features[key]] for key in keys})
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/predictor.py", line 77, in __call__
    return self._session.run(fetches=self.fetch_tensors, feed_dict=feed_dict)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/hunan/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
	 [[node bert/embeddings/MatMul (defined at /miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[loss/Cast/_429]]
  (1) Internal: Blas GEMM launch failed : a.shape=(128, 2), b.shape=(2, 768), m=128, n=768, k=2
	 [[node bert/embeddings/MatMul (defined at /miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'bert/embeddings/MatMul':
  File "/fever/depend/lasertagger/predict_main.py", line 97, in <module>
    app.run(main)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/fever/depend/lasertagger/predict_main.py", line 79, in main
    tf.contrib.predictor.from_saved_model(FLAGS.saved_model), builder,
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/predictor_factories.py", line 153, in from_saved_model
    config=config)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/contrib/predictor/saved_model_predictor.py", line 153, in __init__
    loader.load(self._session, tags.split(','), export_dir)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 269, in load
    return loader.load(sess, tags, import_scope, **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 422, in load
    **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 352, in load_graph
    meta_graph_def, import_scope=import_scope, **saver_kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 517, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3561, in <listcomp>
    for c_op in c_api_util.new_tf_operations(self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3451, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/miniconda3/envs/google/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()
```
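The failing op is a plain matrix product. A GEMM computes C (m×n) = A (m×k) · B (k×n), and the log reports m=128, n=768, k=2 with operands of shape (128, 2) and (2, 768). A minimal pure-Python sketch that parses the logged line and checks that these shapes agree (the helpers `parse_gemm_failure` and `shapes_are_consistent` are hypothetical, written only for illustration):

```python
import re

# Matches the shape/dimension part of TensorFlow's "Blas GEMM launch failed" message.
GEMM_RE = re.compile(
    r"a\.shape=\((\d+), (\d+)\), b\.shape=\((\d+), (\d+)\), "
    r"m=(\d+), n=(\d+), k=(\d+)"
)


def parse_gemm_failure(line):
    """Return (a_shape, b_shape, (m, n, k)) parsed from a GEMM failure line (hypothetical helper)."""
    match = GEMM_RE.search(line)
    if match is None:
        raise ValueError("not a 'Blas GEMM launch failed' line")
    a_rows, a_cols, b_rows, b_cols, m, n, k = map(int, match.groups())
    return (a_rows, a_cols), (b_rows, b_cols), (m, n, k)


def shapes_are_consistent(a_shape, b_shape, mnk):
    """True if C[m x n] = A[m x k] @ B[k x n] is a well-formed product."""
    m, n, k = mnk
    return a_shape == (m, k) and b_shape == (k, n)


# Line copied from the traceback above.
line = ("(0) Internal: Blas GEMM launch failed : a.shape=(128, 2), "
        "b.shape=(2, 768), m=128, n=768, k=2")
a, b, mnk = parse_gemm_failure(line)
print(a, b, mnk, shapes_are_consistent(a, b, mnk))  # (128, 2) (2, 768) (128, 768, 2) True
```

Because the shapes are consistent, the failure is more consistent with a problem launching cuBLAS on this GPU setup (driver, CUDA/cuBLAS versions, or GPU memory) than with a bug in the model inputs. As an unconfirmed assumption: the (2, 768) operand matches a BERT-Base token-type embedding table (2 token types, hidden size 768), which BERT's reference code applies as a one-hot matmul.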

@hhy-hook

Have you solved this problem? I have the same problem.

@ultrawalle

ultrawalle commented Apr 29, 2022 via email

@zzuchen

zzuchen commented May 8, 2022

Hello, I ran into the same error. Could you tell me how you finally solved this problem?
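Since the original post reports that inference works on CPU, one stopgap while the GPU problem is investigated is to hide the GPU from TensorFlow with the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. A minimal sketch, assuming it runs at the top of the entry script (for example `predict_main.py`) before TensorFlow is imported; `force_cpu_only` is a hypothetical helper name:

```python
import os


def force_cpu_only():
    """Hide every CUDA device from this process (hypothetical helper).

    Call this before TensorFlow is imported: once the CUDA runtime has
    enumerated devices, changing the variable has no effect.
    """
    # "-1" is not a valid device index, so CUDA exposes no GPUs at all.
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


if __name__ == "__main__":
    force_cpu_only()
    # import tensorflow as tf  # import TensorFlow only after this point
    print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell by prefixing the usual command with `CUDA_VISIBLE_DEVICES=-1`. This trades speed for a working run; it does not fix the underlying cuBLAS failure.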

@lanlanabcd
Author

lanlanabcd commented May 8, 2022 via email
