cuda-1brc

Can CUDA handle one billion rows of text? Yes: fast.cu processes one billion rows in 16.8 seconds on a V100, a 60x speedup over the pure C++ baseline in base.cpp. Check out the blog for a detailed explanation.