tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized. #1469
Comments
As per the posted log, the model appears to be trying to allocate 40 GB of GPU memory.
We need to figure out what is triggering this allocation. @neqkir, can I ask you to rerun with the env var MIOPEN_ENABLE_LOGGING=1 set and post the resulting log file here? Thanks!
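To see how an allocation that large can arise, here is a back-of-the-envelope estimate of a single float32 input tensor's size. All the numbers below (sequence length, hidden width, corpus size) are illustrative assumptions, not taken from the issue, but they show how feeding the whole corpus as one tensor reaches tens of gigabytes while a single example is under a megabyte:

```python
def tensor_bytes(batch_size, seq_len, hidden, dtype_bytes=4):
    """Approximate size in bytes of one float32 activation/input tensor."""
    return batch_size * seq_len * hidden * dtype_bytes

# Illustrative: a 1024-wide layer over 100-token sequences
per_example = tensor_bytes(1, 100, 1024)        # 409,600 bytes, ~0.4 MB
full_corpus = tensor_bytes(100_000, 100, 1024)  # 40,960,000,000 bytes, ~40 GB
print(per_example, full_corpus)
```

This is why the error appears only on the GPU: the CPU path can page or spill, while the 32 GB device cannot hold a corpus-sized tensor in one piece.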
It looks like the arrays you are working with are built from the entire corpus. Your MI100s have 32 GB of memory onboard, so you are blowing past that. You also appear to have two of them, so you will want to make the most of both. Take a look at these guides to help you refactor your code for GPU acceleration: you will likely want to create a tf.data dataset and use a mirrored strategy to make the best use of your hardware. https://www.tensorflow.org/guide/gpu
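For intuition, here is a minimal pure-Python sketch (no TensorFlow; the function name is made up for illustration) of the idea behind a tf.data pipeline: instead of materializing one corpus-sized tensor on the device, slice the data into fixed-size batches that are fed one at a time, so peak memory is bounded by the batch size rather than the corpus size:

```python
def batched(sequence, batch_size):
    """Yield successive fixed-size batches from a sequence.

    This mirrors what tf.data.Dataset.from_tensor_slices(...).batch(n)
    does: the device only ever holds one batch worth of data at a time.
    The final batch may be smaller than batch_size.
    """
    for start in range(0, len(sequence), batch_size):
        yield sequence[start:start + batch_size]

corpus = list(range(10))            # stand-in for a tokenized corpus
print(list(batched(corpus, 4)))     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In real TensorFlow code the equivalent is `tf.data.Dataset.from_tensor_slices(data).batch(n)`, optionally wrapped in a `tf.distribute.MirroredStrategy` scope to spread batches across both GPUs.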
I am facing the same issue. @neqkir, were you able to solve it? I would appreciate it if you could post your solution here.
@neqkir @reyhaneh-92 I am facing the same issue; please post an update here if you were able to solve it.
I am also facing the same issue. Moreover, the same code worked with TensorFlow 2.4 and started throwing this error after I upgraded to TensorFlow 2.10.
@mdtalibahmad I was able to resolve the issue after completely uninstalling CUDA, Python, and all dependencies, then reinstalling everything with the correct versions. I even installed the Visual C++ update required by the latest CUDA.
I have the same problem: when I run my code locally in a Jupyter notebook it works, but when I move the same code to a server with the same environment I get this error. Please help me resolve it; thank you for your valuable time.
Use this after loading your libraries to configure your GPUs (e.g. to enable memory growth so TensorFlow allocates on demand instead of grabbing everything up front, which can let larger batches fit):
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
Hello, what are the correct versions? Please list the versions of all the dependencies that work correctly for you.
Modifying the batch size fixed the error for me. |
Did you reduce or increase the batch size?
Increased it. |
Hi @neqkir and others, An internal ticket has been opened for investigation on this. Thanks for reporting! |
An RNN that runs well on the CPU hits this apparent "out of memory" error on the GPU.
I run the code here https://github.com/neqkir/bible-like-text-generation/blob/main/word-based/word_rnn_bible_lstm.py
System information: Keras/TensorFlow