Optimum-NVIDIA

https://huggingface.co/blog/optimum-nvidia

The blog post reports up to 28x faster inference with Optimum-NVIDIA compared to stock transformers. Open question: can it be used in tandem with BetterTransformer?
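For trying this out: the linked post describes the switch as a one-line import change, taking `pipeline` from `optimum.nvidia.pipelines` instead of `transformers.pipelines`. Below is a minimal sketch of picking whichever backend is installed. The helper names (`has_module`, `pick_pipeline_module`, `load_pipeline_factory`) are hypothetical and not part of either library, and the sketch assumes both packages expose `pipeline` at those module paths, as shown in the post.

```python
import importlib
import importlib.util


def has_module(name: str) -> bool:
    """Return True if module `name` can be located in this environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package raises instead of
        # returning None, so treat that as "not available".
        return False


def pick_pipeline_module() -> str:
    """Prefer Optimum-NVIDIA when installed, else fall back to transformers."""
    if has_module("optimum.nvidia.pipelines"):
        return "optimum.nvidia.pipelines"
    return "transformers.pipelines"


def load_pipeline_factory():
    """Import the chosen module and return its `pipeline` function."""
    module = importlib.import_module(pick_pipeline_module())
    return module.pipeline


if __name__ == "__main__":
    # Only reports the choice; loading a model needs the package and a GPU.
    print("pipeline backend:", pick_pipeline_module())
```

With either package installed, `load_pipeline_factory()` returns a `pipeline` callable that can be used the usual way, for example with the `"text-generation"` task. Whether BetterTransformer can be applied on top of the Optimum-NVIDIA path is not something this sketch tests.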