This repository has been archived by the owner on Aug 1, 2024. It is now read-only.

Runtime in half precision #692

Open
Hrovatin opened this issue Jun 13, 2024 · 0 comments

Comments

@Hrovatin

I tried running ESM2 inference (model(seq_tokens).logits) in full and half precision (model.half()) on an Apple M3 Max chip with torch==2.3.1.
I noticed that with half precision the inference time is ~10x longer, while memory usage drops as expected. Any idea why the runtime increases so drastically?
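For context, a minimal sketch of how the fp32 vs. fp16 comparison could be reproduced on the MPS backend, assuming the Hugging Face transformers ESM2 checkpoint facebook/esm2_t12_35M_UR50D and an arbitrary test sequence (the issue does not say which checkpoint, sequence, or timing harness was used). MPS kernels are dispatched asynchronously, so the sketch synchronizes before stopping the clock:

```python
import time
import torch
from transformers import AutoTokenizer, EsmForMaskedLM

# Hypothetical checkpoint and sequence, only for illustration.
model_name = "facebook/esm2_t12_35M_UR50D"
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_name)
seq_tokens = tokenizer(
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", return_tensors="pt"
).input_ids.to(device)

def time_inference(model, n_iters=20):
    model = model.to(device).eval()
    with torch.no_grad():
        model(seq_tokens)  # warm-up pass, not counted
        if device.type == "mps":
            torch.mps.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):
            _ = model(seq_tokens).logits
        if device.type == "mps":
            torch.mps.synchronize()  # wait for queued MPS work before stopping the clock
    return (time.perf_counter() - start) / n_iters

fp32_model = EsmForMaskedLM.from_pretrained(model_name)
print(f"fp32: {time_inference(fp32_model):.4f} s/iter")

fp16_model = EsmForMaskedLM.from_pretrained(model_name).half()
print(f"fp16: {time_inference(fp16_model):.4f} s/iter")
```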
