Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Refactor async engine & turbomind IO (#2968)
* refactor
* async interface
* update perf metrics & adaptive tokens per tick
* wait-free
* refactor gateway
* optimize throughput
* add cancel cb
* simplify async engine
* simplify async engine
* fix end session
* faster synchronization
* fix async engine
* refactor async engine
* fix semaphore
* refactor inference API
* remove turbomind sync interface
* fix msvc build
* fix msvc build
* fix msvc build
* add extra outputs
* skip stop tokens
* exit gracefully
* cancel all tasks atexit
* refactor profiler
* fix id2step for api server
* save csv
* fix interactive
* fix lint
* fix generate_token_len
* fix async_end
* update pipeline ut
* fix ignore eos
* minor
* refactor profile pipeline api
* fix stop ids
* fix duplication
* control output range of logits & last hidden states
* fix lint & typo
* fix blank response
* export batch & num prompts
Showing 41 changed files with 2,683 additions and 2,019 deletions.