Switch the strategy to use cross_device_ops, which works with more than 2 GPUs.
Works on 4 L4s or on 3 GPUs (RTX-4500/4500/4000).
tensorflow/tensorflow#41724 (comment)
strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
parallel_model.fit(x_train, y_train, epochs=25, batch_size=2048)
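For context, here is a minimal sketch of how the two lines above might sit in a full script. It assumes a hypothetical `build_model` callable that returns an uncompiled Keras model; the optimizer and loss are placeholders, not taken from this issue. The `max_per_replica_batch` helper is also hypothetical and only illustrates that, with Keras `fit` under a distribution strategy, `batch_size` is the global batch, which is split across replicas.

```python
import math


def max_per_replica_batch(global_batch_size: int, num_replicas: int) -> int:
    """Largest number of samples one replica sees per step (hypothetical helper)."""
    return math.ceil(global_batch_size / num_replicas)


def fit_with_reduction_to_one_device(build_model, x_train, y_train):
    """Train under MirroredStrategy with ReductionToOneDevice as cross_device_ops."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.ReductionToOneDevice()
    )
    print("replicas in sync:", strategy.num_replicas_in_sync)

    # Variables must be created inside the strategy scope so they are mirrored.
    with strategy.scope():
        parallel_model = build_model()  # assumption: returns an uncompiled Keras model
        parallel_model.compile(
            optimizer="adam",  # assumption: placeholder optimizer
            loss="sparse_categorical_crossentropy",  # assumption: placeholder loss
            metrics=["accuracy"],
        )

    # batch_size is the global batch; each replica receives a share of it.
    return parallel_model.fit(x_train, y_train, epochs=25, batch_size=2048)
```

With the batch size of 2048 used here, `max_per_replica_batch(2048, 4)` gives the per-GPU share on the 4 x L4 machine, and `max_per_replica_batch(2048, 3)` gives the upper bound on the 3-GPU machine.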

|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   80C    P0              62W /  72W |  21002MiB / 23034MiB |     58%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              67W /  72W |  20994MiB / 23034MiB |     46%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   76C    P0              67W /  72W |  20998MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   75C    P0              51W /  72W |  21002MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     40306      C   python                                    20990MiB |
|    1   N/A  N/A     40306      C   python                                    20982MiB |
|    2   N/A  N/A     40306      C   python                                    20986MiB |
|    3   N/A  N/A     40306      C   python                                    20990MiB |
+---------------------------------------------------------------------------------------+
Epoch 24/25
25/25 [==============================] - 3s 105ms/step - loss: 0.2089 - accuracy: 0.9445
Epoch 25/25
25/25 [==============================] - 3s 105ms/step - loss: 0.1559 - accuracy: 0.9592
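
As a rough reading of the log above, the step time and batch size can be turned into an epoch time and an approximate throughput. This is back-of-the-envelope arithmetic only: it assumes every step processes a full global batch of 2048 (the last step of an epoch may be smaller) and that the 4 x L4 run produced this log. Under those assumptions an epoch takes roughly 2.6 s, which matches the "3s" Keras prints, and throughput is about 19,500 samples per second overall.

```python
# Figures read from the final epochs of the training log.
steps_per_epoch = 25
step_ms = 105
global_batch_size = 2048
num_gpus = 4  # assumption: the log comes from the 4 x L4 run

# Wall-clock time per epoch implied by the per-step time.
epoch_seconds = steps_per_epoch * step_ms / 1000

# Upper-bound throughput, assuming every step uses a full global batch.
samples_per_second = global_batch_size * 1000 / step_ms
samples_per_second_per_gpu = samples_per_second / num_gpus

print(f"epoch time: {epoch_seconds} s")
print(f"throughput: {samples_per_second:.0f} samples/s "
      f"({samples_per_second_per_gpu:.0f} per GPU)")
```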