[Bug] HybridCache not subscriptable #1047

hudson-ai · 2024-10-11T06:26:55Z

The build currently fails because gemma2 uses a HybridCache, which doesn't support tuple-style indexing and slicing the way the friendlier DynamicCache does.

"Fixing" this issue (just throwing the cache away...) immediately uncovered another one -- the HybridCache has a maximum size. If we don't set it manually, it is set to the length of the first token sequence the model is called with. A later forward pass with more tokens then raises exceptions deep inside gemma's implementation. The current "fix" is, again, to throw the cache away.

Hoping for something more elegant. But I don't think this is too insane for now.

Note: now taking advantage of Cache.crop for cache implementations that support it. This should prevent conversion back and forth from the "legacy" cache format that we previously assumed. (Should fix #986).
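For readers following along, here is a minimal sketch of the "crop when we can, otherwise throw the cache away" decision described above. The two toy classes and the `trim_cache` helper are hypothetical stand-ins written for illustration. They are not the transformers classes and not the code in this PR; the only assumption is that a cache exposes `get_seq_length()` and, optionally, `crop(max_length)`.

```python
from typing import Any, Optional, Tuple


class ToyCroppableCache:
    """Hypothetical stand-in for a cache type that supports crop()."""

    def __init__(self, seq_length: int) -> None:
        self._seq_length = seq_length

    def get_seq_length(self) -> int:
        return self._seq_length

    def crop(self, max_length: int) -> None:
        # Drop cached entries past max_length, in place.
        self._seq_length = min(self._seq_length, max_length)


class ToyFixedCache:
    """Hypothetical stand-in for a cache type without crop()."""

    def __init__(self, seq_length: int) -> None:
        self._seq_length = seq_length

    def get_seq_length(self) -> int:
        return self._seq_length


def trim_cache(cache: Optional[Any], keep: int) -> Tuple[Optional[Any], int]:
    """Return (cache, past_length) after backtracking to `keep` tokens."""
    if cache is None:
        return None, 0
    past_length = cache.get_seq_length()
    if past_length <= keep:
        # Nothing to trim; the cached prefix is still valid.
        return cache, past_length
    if hasattr(cache, "crop"):
        # Shorten in place, with no round trip through a tuple format.
        cache.crop(keep)
        return cache, keep
    # This cache type cannot be shortened in place: throw it away.
    return None, 0
```

With this shape, a croppable cache keeps its prefix (`trim_cache(ToyCroppableCache(10), 4)` keeps 4 tokens), while a cache without `crop` falls back to `(None, 0)` and is rebuilt on the next forward pass.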

hudson-ai · 2024-10-11T06:28:15Z

guidance/models/transformers/_transformers.py

+            # TODO: this seems to get set to the length of the first sequence we pass for models using
+            # StaticCache or HybridCache. We need to initialize our own cache with a large enough size
+            # if we want to continue generation with the same cache.
+            self._past_key_values = None
+            past_length = 0


Not sure if this is necessary for SlidingWindowCache?

codecov-commenter · 2024-10-11T06:35:05Z

Codecov Report

Attention: Patch coverage is 31.25000% with 22 lines in your changes missing coverage. Please review.

Project coverage is 63.53%. Comparing base (917fe35) to head (8d17370).

Files with missing lines	Patch %	Lines
guidance/models/transformers/_transformers.py	31.25%	22 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1047      +/-   ##
==========================================
- Coverage   71.96%   63.53%   -8.43%     
==========================================
  Files          63       63              
  Lines        4769     4797      +28     
==========================================
- Hits         3432     3048     -384     
- Misses       1337     1749     +412

hudson-ai · 2024-10-11T20:19:40Z

All of the model-specific tests in the CI Tests workflow pass. The general tests fail due to azurecloud auth issues -- not sure if I am able to rerun with the right perms. But I believe all is fine and dandy with the PR.

hudson-ai · 2024-10-11T20:24:57Z

Changes since submitting PR:

- When we overflow the size of the cache, reallocate a cache with double the size rather than deleting the old cache, which would cause the next forward pass to only create a cache that is "just big enough".
  - To be seen whether we need to be a bit more conservative here to avoid OOM issues when the sequence length is long.
  - We only do this for Static and Hybrid caches. Other cache types will be deleted until we write implementations for doubling their sizes. We give a warning in this case.
- When we need to reset the cache due to backtracking, we now call Cache.reset if possible in order to avoid reallocating the cache. This also prevents the cache-doubling from resetting.
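A rough sketch of the two policies listed above, for illustration only. `ToyFixedSizeCache`, `ensure_capacity`, and `backtrack` are hypothetical names, not the transformers API and not the code in this PR. The sketch assumes the reallocated cache starts empty (contents are not copied over) and that capacity is doubled repeatedly until the requested length fits; neither detail is confirmed by the discussion here.

```python
class ToyFixedSizeCache:
    """Hypothetical fixed-capacity cache, used only to illustrate the policy."""

    def __init__(self, max_cache_len: int) -> None:
        self.max_cache_len = max_cache_len
        self.seq_length = 0

    def reset(self) -> None:
        # Clear the contents but keep the allocated capacity.
        self.seq_length = 0


def ensure_capacity(cache: ToyFixedSizeCache, needed_len: int) -> ToyFixedSizeCache:
    """Return a cache that can hold needed_len tokens, doubling on overflow."""
    if needed_len <= cache.max_cache_len:
        return cache
    new_len = max(1, cache.max_cache_len)
    while new_len < needed_len:
        # Double rather than allocating "just big enough".
        new_len *= 2
    # Assumption: the new cache starts empty and is refilled by the next pass.
    return ToyFixedSizeCache(new_len)


def backtrack(cache: ToyFixedSizeCache) -> ToyFixedSizeCache:
    """Reset in place so the (possibly doubled) capacity survives."""
    cache.reset()
    return cache
```

For example, starting from a capacity of 8, asking for 20 tokens yields a capacity of 32, and a later `backtrack` empties that cache without shrinking it back to 8.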

hudson-ai · 2024-10-16T16:21:06Z

@paulbkoch any feedback?

riedgar-ms

OK with me if it makes the tests work better. But @paulbkoch likely understands the details better.

hudson-ai · 2024-12-16T15:22:12Z

@Harsha-Nori @paulbkoch would appreciate both sets of your eyes on this before merging -- may be necessary to pass CI for build

hudson-ai added 3 commits October 10, 2024 16:09

reset cache

55f55a1

use crop when we can, get past_length via get_seq_length

94c61a1

additionally reset cache if we are overflowing it...

8e20e09

hudson-ai requested a review from paulbkoch October 11, 2024 06:26

hudson-ai added 3 commits October 11, 2024 11:39

use built-in reset method when available

78dbb79

Double cache size when we can instead of just resetting

a5d5e75

remap hf_device_map

99fbdb9

Merge branch 'main' into transformers_cache

8d17370

riedgar-ms approved these changes Nov 6, 2024
