[GPU] Enable weightless cache with precision conversion (#27742)
### Details:

This change makes constants that undergo precision conversion during the transformation pipeline or graph optimization eligible for weightless caching. Information about any precision conversion that happened before export to the cache is recorded in the cache file. During import from the cache, functionally equivalent conversions are performed.

Besides the unit tests in model_cache.cpp, I tested the accuracy and performance of llama-2-7b-chat in FP16 inference mode.

Performance impact (weightless caching is OPTIMIZE_SIZE):

| | OPTIMIZE_SPEED | OPTIMIZE_SIZE |
| -- | -- | -- |
| FP16 model import, no cache | 25.4 s | 13.6 s |
| FP16 model import, cache exists | 6.2 s | 6.4 s |
| FP32 model import, no cache | 57.6 s | 45.8 s |
| FP32 model import, cache exists | 8.5 s | 15.2 s |

| | OPTIMIZE_SPEED | OPTIMIZE_SIZE |
| -- | -- | -- |
| FP16 model cache size | 13 GB | 6.1 MB |
| FP32 model cache size | 13 GB | 6.2 MB |

Model import time is the duration of the from_pretrained() call when running the llama model with the openvino.genai/tools/llm_bench tool.

Question to reviewers: I'm unsure whether the condition in ov::WeightlessCacheAttribute::is_copyable() is too lenient. Specifically, I'm thinking of a scenario where a single complex transformation changes a constant's data type AND something else at the same time. The constant would then be eligible for weightless caching even though the reconstruction of transformations during cache load is only aware of the data type change, which would break the feature. Does such a complex transformation exist?

### Tickets:
- CVS-157081
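The record-and-replay idea described under Details can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from this commit: `Precision`, `CacheRecord`, `export_constant`, `import_constant`, and `convert_f64_to_f32` are hypothetical names, and an F64 to F32 narrowing stands in for the real FP32 to FP16 conversion because standard C++ has no portable half type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical precision tags. F64 -> F32 is a stand-in for FP32 -> FP16.
enum class Precision { F64, F32 };

// Hypothetical cache record: where the constant lives in the original
// weights, plus the precision change that happened before export.
struct CacheRecord {
    std::size_t offset;     // byte offset into the original weights
    std::size_t byte_size;  // size in bytes at the original precision
    Precision original;     // precision stored in the weights file
    Precision converted;    // precision the compiled model expects
};

// The only conversion this sketch knows how to replay.
std::vector<float> convert_f64_to_f32(const std::vector<double>& src) {
    std::vector<float> dst;
    dst.reserve(src.size());
    for (double v : src) dst.push_back(static_cast<float>(v));
    return dst;
}

// Export: only the record goes into the cache, not the converted data.
CacheRecord export_constant(std::size_t offset, std::size_t element_count) {
    return {offset, element_count * sizeof(double), Precision::F64, Precision::F32};
}

// Import: re-read the original bytes and replay the recorded conversion.
std::vector<float> import_constant(const std::vector<unsigned char>& weights,
                                   const CacheRecord& rec) {
    assert(rec.original == Precision::F64 && rec.converted == Precision::F32);
    assert(rec.offset + rec.byte_size <= weights.size());
    std::vector<double> original(rec.byte_size / sizeof(double));
    std::memcpy(original.data(), weights.data() + rec.offset, rec.byte_size);
    return convert_f64_to_f32(original);
}
```

The sketch also shows why the reviewer question matters: `import_constant` can only replay what `CacheRecord` describes, so a transformation that changed anything besides the element type would not be reproduced on load.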
1 parent 8e7ff7b, commit 59984e9
Showing 10 changed files with 375 additions and 108 deletions.