Add Comet integration #1 (Draft)

wants to merge 42 commits into base: main

Commits on Jun 11, 2024

  1. Commit: 52db752
  2. Commit: 463aad9

Commits on Jun 13, 2024

  1. Commit: 804cbcc
  2. Commit: 108f583
  3. Add debugging logs

    Lothiraldan committed Jun 13, 2024
    Commit: 90ef7ae
  4. Fix typo

    Lothiraldan committed Jun 13, 2024
    Commit: 12e064b
  5. Fix typo

    Lothiraldan committed Jun 13, 2024
    Commit: d0b68d7

Commits on Jun 19, 2024

  1. fix python version and pytest install (EleutherAI#1234)

    * fix python version and pytest install
    
    * Update NeoXArgs docs automatically
    
    * python3
    
    * Update NeoXArgs docs automatically
    
    * pip not pip3
    
    * Update NeoXArgs docs automatically
    
    * python3 pip
    
    * Update NeoXArgs docs automatically
    
    * python3 -m pip
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * add docker setup to workflow
    
    * Update NeoXArgs docs automatically
    
    * python setup
    
    * Update NeoXArgs docs automatically
    
    * python setup v2
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * python setup v3
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Add hash back to deep speed version
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Jun 19, 2024
    Commit: 2608972

Commits on Jun 25, 2024

  1. Add a chat data preprocessing script (EleutherAI#1239)

    * Add a chat data preprocessing script
    
    * add EOT at end of a chat
    
    * update README.md
    
    * apply pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    dmahan93 and Quentin-Anthony authored Jun 25, 2024
    Commit: 0e5f6db
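The chat preprocessing commit above adds a single EOT token at the end of a chat rather than after every turn. A minimal sketch of that idea, assuming a simple `role: content` template and an `<|endoftext|>` marker (both are illustrative guesses, not the script's actual output format):

```python
EOT = "<|endoftext|>"

def render_chat(messages):
    """Flatten a list of {role, content} turns into one training string."""
    parts = [f"{m['role']}: {m['content']}" for m in messages]
    # The EOT marker is appended once, after the final turn of the chat.
    return "\n".join(parts) + EOT

chat = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
text = render_chat(chat)
```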

Commits on Jun 28, 2024

  1. Commit: 1cee5b7

Commits on Aug 6, 2024

  1. Add hf llama to neox conversion (EleutherAI#1247)

    * - Add conversion of HF llama models to NeoX
    
    * - Add conversion of HF llama models to NeoX
    
    * - minor fix
    
    * pre-commit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    dmahan93 and Quentin-Anthony authored Aug 6, 2024
    Commit: c1ea2a1

Commits on Aug 15, 2024

  1. bugfix: chat turns instead of repeating the conversation in preprocess_data_with_chat_template.py (EleutherAI#1258)
    
    * bugfix: chat turns instead of repeating the conversation
    
    * pre-commit
    dmahan93 authored Aug 15, 2024
    Commit: 0ef2c07
  2. Conversion for CI from self-hosted hardware (EleutherAI#1245)

    * changing from self-hosted runners to Github's ubuntu-22.04 runner environment
    
    * adding warning about not using 'self-hosted' runner labels and using Github runners instead
    
    * updated some guidance in comments for coverity scan CI
    
    * moving CPU tests to workflow_dispatch only
    jaimemcc-intel authored Aug 15, 2024
    Commit: f8c9e68

Commits on Aug 23, 2024

  1. Megatron-LM style Sequence Parallel (EleutherAI#1257)

    * first draft (shape errors occurring)
    
    * training works (but poor convergence)
    
    * debugging progress: current commit works if we do regular TP via impl-ing AR in rowparallel as RS then AG
    
    * Update NeoXArgs docs automatically
    
    * push most recent code (updated mark_norms fn, back to 'real' sequence parallel)
    
    * Update NeoXArgs docs automatically
    
    * Fix LayerNorm all reduce gradient hook
    
    * Sum instead of average for LayerNorm gradient all reduce
    
    * Update NeoXArgs docs automatically
    
    * Update NeoXArgs docs automatically
    
    * Fix gather and reduce scatter ops on sequence dimension
    
    * Fix sequence parallel with tied weight embeddings
    
    * Update NeoXArgs docs automatically
    
    * cleanup pass + add MoE arguments.py guard
    
    * pre-commit and clean up comments
    
    * remove vestigial debug code
    
    * remove unused debugging code
    
    * remove dummy test config
    
    * update fp32_allreduce to handle fp16 ; don't cast to fp32 for gathers
    
    * run linter on the rest of the files
    
    * Improve performance of sequence parallel gather, scatter, and reduce
    
    * Add comment
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Brandon Yang <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    4 people authored Aug 23, 2024
    Commit: 8b43196
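The sequence-parallel PR above rests on the collective identity that tensor parallelism's all-reduce can be decomposed into a reduce-scatter (entering the sequence-sharded region) followed by an all-gather (leaving it). A toy, single-process illustration with plain lists standing in for per-rank tensors (not the Megatron/NeoX kernels):

```python
def all_reduce(per_rank):
    # Every rank ends up with the full elementwise sum.
    summed = [sum(vals) for vals in zip(*per_rank)]
    return [summed[:] for _ in per_rank]

def reduce_scatter(per_rank):
    # Each rank ends up with only its contiguous shard of the sum.
    n = len(per_rank)
    summed = [sum(vals) for vals in zip(*per_rank)]
    chunk = len(summed) // n
    return [summed[r * chunk:(r + 1) * chunk] for r in range(n)]

def all_gather(shards):
    # Every rank reassembles the full vector from all shards.
    full = [x for shard in shards for x in shard]
    return [full[:] for _ in shards]

per_rank = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two "ranks", sequence length 4
decomposed = all_gather(reduce_scatter(per_rank))
```

Between the reduce-scatter and the all-gather, each rank only holds (and computes on) its shard of the sequence, which is where the memory and communication savings come from.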

Commits on Aug 24, 2024

  1. Add new cites (EleutherAI#1255)

    * Update README.md
    
    I added new models that have come out trained with the GPT-NeoX library. The library itself is sufficiently well-used that simply listing all citing papers is rapidly becoming non-viable. I'm currently leaning towards providing a curated list of "exciting" papers? I haven't looked at other libraries to see what they do yet.
    
    * Update NeoXArgs docs automatically
    
    ---------
    
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    3 people authored Aug 24, 2024
    Commit: e7c0182

Commits on Aug 27, 2024

  1. mamba fixes and cleaning (EleutherAI#1262)

    * mamba fixes and cleaning
    
    * space
    
    * revert assertion change for now
    
    ---------
    
    Co-authored-by: Jacob Hatef <[email protected]>
    jahatef and Jacob Hatef authored Aug 27, 2024
    Commit: 591563d
  2. SFT improvements (labeling fixes, different packing implementations) (EleutherAI#1240)
    
    * - add different packing impl (Unpacked, packing until overflow)
    - fix labels to also have valid/test implementations
    - fix label masking in _get_batch to also include anything from get_ltor_masks_and_position_ids
    
    * Update arguments.py to use train_label_data_paths instead of label_data_paths
    
    * - fix precommit
    dmahan93 authored Aug 27, 2024
    Commit: c786367
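The "packing until overflow" mode mentioned above can be read as greedy bin filling: keep appending sequences to the current pack until the next one would exceed the context length. An illustrative guess at those semantics, not the repo's implementation (note a lone sequence longer than `max_len` still gets its own pack here):

```python
def pack_until_overflow(lengths, max_len):
    """Group sequence indices into packs whose total length stays <= max_len."""
    packs, current, used = [], [], 0
    for i, n in enumerate(lengths):
        if current and used + n > max_len:
            packs.append(current)  # current pack would overflow: close it
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        packs.append(current)
    return packs

groups = pack_until_overflow([5, 3, 4, 6, 2], max_len=8)  # → [[0, 1], [2], [3, 4]]
```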

Commits on Sep 3, 2024

  1. Commit: 6a2053b

Commits on Sep 5, 2024

  1. Commit: 7548a8b

Commits on Sep 7, 2024

  1. Add intermediate_size to GPT-NeoX models (EleutherAI#1212)

    * Update transformer.py -> Add `intermediate_size`
    
    * add support for rwkv and mamba and add todos about swiglu
    
    * refactor activations and mlps
    
    * change llama config to swiglu
    
    * fixes gelu fusion
    
    * pre-commit run
    
    * add assert message to mamba linear
    
    * Update 1-3B.yml
    
    revert accidental change
    
    * Update 1-3B.yml
    
    * fixes various issues
    
    * add back swiglu check
    
    ---------
    
    Co-authored-by: jahatef <[email protected]>
    Co-authored-by: Quentin Anthony <[email protected]>
    Co-authored-by: Jacob Hatef <[email protected]>
    4 people authored Sep 7, 2024
    Commit: 0d4bdb9
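The `intermediate_size` commit above lets configs override the MLP width directly. A sketch of the conventional sizing it interacts with, assuming the common defaults (4·h for a plain GELU MLP; roughly 8/3·h rounded up to a multiple for SwiGLU, as in Llama) rather than NeoX's exact formula:

```python
def mlp_hidden_dim(hidden_size, intermediate_size=None, swiglu=False, multiple_of=256):
    """An explicit intermediate_size wins; otherwise fall back to conventional defaults."""
    if intermediate_size is not None:
        return intermediate_size
    if swiglu:
        # SwiGLU uses ~2/3 of the 4*h budget (it has two input projections),
        # rounded up to a hardware-friendly multiple.
        raw = int(8 * hidden_size / 3)
        return multiple_of * ((raw + multiple_of - 1) // multiple_of)
    return 4 * hidden_size
```

For `hidden_size=4096` this gives 16384 for the plain MLP and 11008 for SwiGLU, the familiar Llama-7B width.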

Commits on Sep 8, 2024

  1. Commit: ec82c05
  2. Add DPO training (EleutherAI#1242)

    * Add a chat data preprocessing script
    
    * add EOT at end of a chat
    
    * - add different packing impl (Unpacked, packing until overflow)
    - fix labels to also have valid/test implementations
    - fix label masking in _get_batch to also include anything from get_ltor_masks_and_position_ids
    
    * update README.md
    
    * - Add metrics to forward step to add DPO specific metrics that are useful (accuracy, etc)
    - Add reference model setup for DPO
    - Add pairwise dataset for positive/negative pairs
    - Add DPO loss
    
    * Update arguments.py to use train_label_data_paths instead of label_data_paths
    
    * - Bugfixes from upstreaming....
    
    * - add precompute logprobs...
    
    * - Finishing up precompute logprobs...
    
    * - update readme for DPO...
    
    * fix varname
    
    * Fix pipeline parallelism and incorrect neox_args name
    
    * apply precommit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    dmahan93 and Quentin-Anthony authored Sep 8, 2024
    Commit: 77e8158
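The DPO loss the commit message refers to has a standard scalar form: the negative log-sigmoid of the policy-vs-reference log-probability margin between the chosen and rejected responses. A minimal per-example sketch (β=0.1 is an assumed default; the repo's batched implementation will differ):

```python
import math

def dpo_loss(pi_logp_w, pi_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """-log sigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l)))."""
    margin = beta * ((pi_logp_w - ref_logp_w) - (pi_logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At zero margin (policy identical to reference) the loss is log 2, and it falls as the policy raises the chosen response's log-probability relative to the reference, which is why the precomputed reference logprobs mentioned above are needed.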

Commits on Sep 9, 2024

  1. LayerNorm Refactor (EleutherAI#1269)

    * Add TE skeleton
    
    * Update NeoXArgs docs automatically
    
    * added option for te version of norms
    
    * import TERMSNorm
    
    * add te norm options to norm arg
    
    * add TE objects in weight decay function
    
    * reformat
    
    * add TERMSNorm and TELayerNorm
    
    * Update NeoXArgs docs automatically
    
    * - add Fused RMS Norm from apex
    
    * - make it consistent with how layernorm looks
    
    * Merged transformer engine and apex fused layernorm branches
    
    * Added assertion if TE is used
    
    * Removed unnecessary transformer-engine import
    
    * Changed importerror text for TE
    
    * Added requirements/requirements-transformerengine.txt
    
    * update comments
    
    * precommit
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    Co-authored-by: github-actions <[email protected]>
    Co-authored-by: lintangsutawika <lintang@stella-ord-0.stella-ord.tenant-eleutherai.svc.tenant.chi.local>
    Co-authored-by: lintangsutawika <[email protected]>
    Co-authored-by: dmahan93 <[email protected]>
    Co-authored-by: aurelion-source <[email protected]>
    Co-authored-by: aurelion-source <[email protected]>
    8 people authored Sep 9, 2024
    Commit: 836aefa
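The refactor above adds RMSNorm variants (TERMSNorm, fused Apex RMSNorm) alongside LayerNorm. The two differ only in mean-centering; a toy list version without the learnable gain and bias, just to show the relationship (not the fused kernels):

```python
import math

def layer_norm(xs, eps=1e-5):
    # Subtract the mean, then divide by the standard deviation.
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

def rms_norm(xs, eps=1e-5):
    # No mean subtraction: normalize by the root-mean-square alone.
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

zs = [2.0, -2.0]  # zero-mean input: the two norms coincide
```

On zero-mean activations the two coincide, which is part of why RMSNorm works as a cheaper drop-in for pre-norm transformers.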
  2. Commit: 01e74f4
  3. TE Import Hotfix (EleutherAI#1272)

    * fix the te import
    
    * refactor get_params_for_weight_decay_optimization
    
    * remove incorrect type hint and dead imports
    Quentin-Anthony authored Sep 9, 2024
    Commit: 61a3daa
  4. Add Reward Model training (EleutherAI#1246)

    * Add a chat data preprocessing script
    
    * add EOT at end of a chat
    
    * - add different packing impl (Unpacked, packing until overflow)
    - fix labels to also have valid/test implementations
    - fix label masking in _get_batch to also include anything from get_ltor_masks_and_position_ids
    
    * update README.md
    
    * - Add metrics to forward step to add DPO specific metrics that are useful (accuracy, etc)
    - Add reference model setup for DPO
    - Add pairwise dataset for positive/negative pairs
    - Add DPO loss
    
    * Update arguments.py to use train_label_data_paths instead of label_data_paths
    
    * - Bugfixes from upstreaming....
    
    * - add precompute logprobs...
    
    * - Finishing up precompute logprobs...
    
    * - update readme for DPO...
    
    * - Add RM training
    
    * add comment on why row-parallel for RMs
    
    * fix var name
    
    ---------
    
    Co-authored-by: Quentin Anthony <[email protected]>
    dmahan93 and Quentin-Anthony authored Sep 9, 2024
    Commit: 1c72742
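Reward-model training on pairwise data typically minimizes a Bradley-Terry style objective: the negative log-sigmoid of the chosen-minus-rejected reward gap. A scalar sketch of that objective (not the repo's batched, row-parallel reward head):

```python
import math

def pairwise_rm_loss(reward_chosen, reward_rejected):
    """-log sigmoid(r_chosen - r_rejected): pushes the chosen reward above the rejected one."""
    gap = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-gap)))
```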
  5. Commit: bf8e78c
  6. Commit: 227967f
  7. Commit: 2a3513c
  8. Commit: 7609829
  9. Commit: 39f9142
  10. Add debugging logs

    Lothiraldan authored and Quentin-Anthony committed Sep 9, 2024
    Commit: 43ed6e8
  11. Fix typo

    Lothiraldan authored and Quentin-Anthony committed Sep 9, 2024
    Commit: 90c499a
  12. Fix typo

    Lothiraldan authored and Quentin-Anthony committed Sep 9, 2024
    Commit: 913f877
  13. Commit: a6bddd6
  14. Commit: 0468dae
  15. Commit: ef32d69
  16. precommit

    Quentin-Anthony committed Sep 9, 2024
    Commit: 976cd5d
  17. add comet config

    Quentin-Anthony committed Sep 9, 2024
    Commit: 962314e
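The `add comet config` commit presumably wires Comet settings into a NeoX YAML config file. A hypothetical fragment; every key name here is a guess and should be checked against the generated `neox_arguments.md` docs:

```yml
{
  # Hypothetical Comet logging section (key names unverified)
  "use_comet": true,
  "comet_workspace": "my-workspace",
  "comet_project": "neox-experiments",
  "comet_experiment_name": "comet-integration-test",
}
```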
  18. Commit: f0a4b70
  19. Commit: c6681b5
  20. Commit: 4f76e0d