Parallel sharding #21

Merged: 22 commits, Apr 10, 2024

Commits on Apr 9, 2024

  1. d75ba94
  2. feat: import transformer's gemma modeling code (0ee7430)

    It will be used to adapt it for sharding. Only the imports have been
    adapted, and only code relevant to GemmaForCausalLM has been added.
    tengomucho committed Apr 9, 2024
  3. ca88068
  4. a3de4d7
  5. 80170a9
  6. 9a9bcf8
  7. 5bf6c70
  8. fix(TpuGemma): avoid using device_map when loading model (9dfb7b6)

    It seems that the device_map parameter triggers a chain of calls that
    try to use accelerate to load the model with less memory. The problem
    is that this path skips the load state pre-hooks, making it impossible
    to load the weights.
    tengomucho committed Apr 9, 2024
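The pre-hook problem can be illustrated with a toy loader (a minimal plain-Python sketch, not the PR's actual code; `ShardedLinear`, `register_load_pre_hook`, and `shard_rows` are hypothetical names): a load-state pre-hook transforms the incoming state dict before weights are copied, so a low-memory fast path that copies tensors directly, skipping the hooks, would install the wrong (unsliced) weights.

```python
class ShardedLinear:
    """Toy module whose weights must be transformed before loading.

    Pre-hooks rewrite the incoming state dict (here: slice the weight for
    this shard). A loading path that bypasses load_state_dict, as the
    device_map/accelerate path does, would never run them.
    """

    def __init__(self, rank, world_size):
        self.rank = rank
        self.world_size = world_size
        self.weight = None
        self._pre_hooks = []

    def register_load_pre_hook(self, hook):
        self._pre_hooks.append(hook)

    def load_state_dict(self, state_dict):
        # Run every registered pre-hook before installing the weights.
        for hook in self._pre_hooks:
            state_dict = hook(state_dict)
        self.weight = state_dict["weight"]


def shard_rows(rank, world_size):
    """Pre-hook factory: keep only this shard's rows of the weight."""
    def hook(state_dict):
        rows = state_dict["weight"]
        per_shard = len(rows) // world_size
        return {"weight": rows[rank * per_shard:(rank + 1) * per_shard]}
    return hook


full_state = {"weight": [[1, 2], [3, 4], [5, 6], [7, 8]]}
module = ShardedLinear(rank=1, world_size=2)
module.register_load_pre_hook(shard_rows(rank=1, world_size=2))
module.load_state_dict(dict(full_state))
print(module.weight)  # → [[5, 6], [7, 8]]
```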
  9. feat(gemma): sharding o_proj (ec3b752)

    It will now run in parallel. More changes to come.
    tengomucho committed Apr 9, 2024
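Row-parallel sharding of a projection like o_proj can be sketched as follows (a plain-Python illustration of the standard tensor-parallel scheme, not the PR's implementation): each device keeps the rows of the weight matching its slice of the input, and summing the partial products, the all-reduce step, recovers the full matmul.

```python
def matmul(a, b):
    """Naive list-of-lists matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def row_parallel_outputs(x, w, world_size):
    """Simulate row-parallel sharding: shard r holds columns r*k..(r+1)*k
    of x and the matching rows of w; summing the partial products
    (the all-reduce) equals x @ w."""
    k = len(w) // world_size
    partials = []
    for r in range(world_size):
        x_shard = [row[r * k:(r + 1) * k] for row in x]
        w_shard = w[r * k:(r + 1) * k]
        partials.append(matmul(x_shard, w_shard))
    # all-reduce: elementwise sum of the partial outputs
    return [[sum(p[i][j] for p in partials) for j in range(len(partials[0][0]))]
            for i in range(len(partials[0]))]


x = [[1, 2, 3, 4]]                      # one token, hidden size 4
w = [[1, 0], [0, 1], [1, 1], [2, 2]]    # 4x2 projection weight
print(row_parallel_outputs(x, w, world_size=2) == matmul(x, w))  # → True
```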
  10. a7d7c0b
  11. b6fe32e
  12. e13d9ec
  13. 6cdede2
  14. feat: model is loaded using torch_dtype from config (cd99226)

    This will lead to loading the model in bfloat16 when that is specified
    in the config.
    tengomucho committed Apr 9, 2024
  15. 550e1fb
  16. 2215595

Commits on Apr 10, 2024

  1. fe888a9
  2. fix: get_generation_mode is now a method of generation_config (dbf11f7)

    API change when transformers was updated.
    tengomucho committed Apr 10, 2024
  3. a96903b
  4. fix(generator): fix sample generation again (6e6b44e)

    I wrongly chose the model's generation config instead of the one from
    the token selector.
    tengomucho committed Apr 10, 2024
  5. fix: better handle torch_dtype (92e9e31)

    bfloat16 will be set by default for gemma models; other models will
    still load in float32 by default.
    tengomucho committed Apr 10, 2024
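The dtype handling described above amounts to a small resolution rule: honor an explicit torch_dtype from the model config, default gemma models to bfloat16, and fall back to float32 otherwise. A minimal sketch (`resolve_dtype` is a hypothetical helper, not the repository's code):

```python
def resolve_dtype(config: dict) -> str:
    """Pick the dtype to load the model in.

    An explicit torch_dtype in the config wins; gemma models default to
    bfloat16; every other model type defaults to float32.
    """
    explicit = config.get("torch_dtype")
    if explicit:
        return explicit
    return "bfloat16" if config.get("model_type") == "gemma" else "float32"


print(resolve_dtype({"model_type": "gemma"}))                             # → bfloat16
print(resolve_dtype({"model_type": "llama"}))                             # → float32
print(resolve_dtype({"model_type": "llama", "torch_dtype": "bfloat16"}))  # → bfloat16
```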
  6. fix: remove unused import (7901d91)

    tengomucho committed Apr 10, 2024