Refactor of beam search to process factor groups in parallel #772

Closed

Contributor

@rhenry-nv commented Dec 8, 2020

Description

This PR refactors the beam search to process the secondary factors in parallel.

As is, this work significantly reduces the host-to-device (H2D) communication required when processing the secondary factors in a model with a factored vocabulary.

Note - The changes in this PR are not integrated into PR #743. The table below shows the improvements on top of PR #770.

Times with 1 stream

| Batch | Initial time (s) | Current time (s) | Runtime reduction (fraction) | Speedup factor |
|---|---|---|---|---|
| 1 | 166.392 | 162.169 | 0.025379826 | 1.026040735 |
| 2 | 113.005 | 109.386 | 0.032025132 | 1.033084673 |
| 4 | 69.5436 | 66.8378 | 0.038907966 | 1.04048308 |
| 8 | 40.7636 | 39.2434 | 0.037293075 | 1.038737724 |
| 16 | 23.9823 | 23.2354 | 0.031143802 | 1.032144917 |
| 32 | 14.7954 | 14.3573 | 0.029610555 | 1.030514094 |
| 64 | 9.6 | 9.01939 | 0.060480208 | 1.064373533 |
| 128 | 6.48 | 6.04882 | 0.066540123 | 1.071283325 |
| 256 | 4.65 | 4.28963 | 0.077498925 | 1.084009577 |

Times with 2 streams

| Batch | Initial time (s) | Current time (s) | Runtime reduction (fraction) | Speedup factor |
|---|---|---|---|---|
| 1 | 116.7 | 110.365 | 0.05428449 | 1.057400444 |
| 2 | 78.46 | 74.248 | 0.053683406 | 1.056728801 |
| 4 | 47.83 | 45.4441 | 0.049882919 | 1.052501865 |
| 8 | 28.107 | 26.7974 | 0.046593375 | 1.048870413 |
| 16 | 16.69 | 15.6832 | 0.060323547 | 1.064196082 |
| 32 | 10.24 | 9.79104 | 0.04384375 | 1.045854169 |
| 64 | 6.57 | 6.22622 | 0.052325723 | 1.055214882 |
| 128 | 4.65 | 4.22329 | 0.091765591 | 1.101037343 |
| 256 | 3.47 | 3.2422 | 0.065648415 | 1.070260934 |
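
For readers checking the numbers: the last two columns of both tables appear to be derived from the two timing columns as shown below. The symbol names are mine, not the PR's. The values in the runtime-reduction column are fractions, so 0.025 corresponds to roughly 2.5%.

```latex
\text{runtime reduction} = \frac{t_{\text{initial}} - t_{\text{current}}}{t_{\text{initial}}},
\qquad
\text{speedup factor} = \frac{t_{\text{initial}}}{t_{\text{current}}}

% Worked example: batch size 1, 1 stream
\frac{166.392 - 162.169}{166.392} \approx 0.02538,
\qquad
\frac{166.392}{162.169} \approx 1.02604
```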

List of changes:

  • Adds a GPU kernel that performs a max reduction over the last axis of a tensor. This cuts down on the number of kernel launches needed and removes a stream synchronization on every call.
  • Beam search refactor to batch secondary factors
  • Some changes from PR #768 (Small optimizations) to reduce index copying.
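
To make the first item concrete, here is a minimal CPU reference sketch of what a max reduction over the last axis computes. It is not the kernel from this PR (that one runs on the GPU and relies on cub). The names `RowMax` and `maxLastAxis` are hypothetical, and returning the argmax index alongside the value is my assumption about what beam search would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: the maximum of one row and its position.
struct RowMax {
  float value;
  std::size_t index;
};

// CPU reference: max-reduce the last axis of a row-major tensor whose
// leading axes are flattened into `rows`, with `cols` elements per row.
// A GPU version would typically assign one thread block per row and
// reduce within the block, avoiding extra launches per row.
std::vector<RowMax> maxLastAxis(const std::vector<float>& data,
                                std::size_t rows, std::size_t cols) {
  assert(cols > 0 && data.size() == rows * cols);
  std::vector<RowMax> out(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    const float* row = data.data() + r * cols;
    RowMax best{row[0], 0};
    for (std::size_t c = 1; c < cols; ++c) {
      // Strict '>' keeps the first occurrence when values tie.
      if (row[c] > best.value) best = RowMax{row[c], c};
    }
    out[r] = best;
  }
  return out;
}
```

Calling `maxLastAxis(scores, beams, vocabSize)` on a flattened score matrix yields one (value, index) pair per row. A reference like this can be handy for checking a GPU kernel's output in a unit test.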

Added dependencies: cub

How to test

I ran the regression tests and tested manually with a proxy model.

CMake command: cmake .. -DCOMPILE_CPU=on -DCOMPILE_CUDA=on -DUSE_SENTENCEPIECE=on -DUSE_STATIC_LIBS=off -DCOMPILE_SERVER=off -DUSE_FBGEMM=on -DCOMPILE_CUDA_SM35=off -DCOMPILE_CUDA_SM50=off -DCOMPILE_CUDA_SM60=off -DCOMPILE_CUDA_SM70=on -DCOMPILE_CUDA_SM75=off -DCOMPILE_TESTS=on

Ubuntu - 18.04.3 LTS
nvcc - 10.1.243
gcc - 7.5.0

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@rhenry-nv
Contributor Author

This is broken: it does not properly handle forwarding hypotheses that could not be expanded by certain factor groups.

@rhenry-nv closed this Mar 11, 2021