Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FA's CrossEntropyLoss for metrics calculation #3214

Closed
wants to merge 13 commits into from

Conversation

snarayan21
Copy link
Contributor

@snarayan21 snarayan21 commented Apr 25, 2024

What does this PR do?

If available, we should use FA's CrossEntropyLoss for calculating the CE metric in composer. We already use it in foundry, and it's simply faster. Below we can see that the CE loss metric is the same, and there's a nice MFU boost as well.

Pink: using FA's CrossEntropyLoss
Green: using torch's CrossEntropyLoss
Screenshot 2024-04-25 at 4 47 49 PM
Screenshot 2024-04-25 at 4 48 41 PM

What issue(s) does this change relate to?

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@snarayan21 snarayan21 requested a review from a team as a code owner April 25, 2024 23:50
@dakinggg
Copy link
Contributor

#2987 this is the third time we've had this PR 😂 once in foundry, once in composer, and now again. @mvpatel2000 last time I thought we should put it in foundry. Do you want to take FA as a Composer dependency?

Copy link
Contributor

@mvpatel2000 mvpatel2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea Im fine adding it. @snarayan21 please add as an optional dependency

@snarayan21 snarayan21 requested a review from a team as a code owner April 26, 2024 16:06
@snarayan21
Copy link
Contributor Author

added as optional dep

Copy link
Contributor

@b-chu b-chu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

setup.py Outdated
Comment on lines 228 to 230
extra_deps['gpu-flash2'] = [
'flash-attn==2.5.0',
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why gpu- instead of jsut flash2?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should just be flash-attention

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wbout flash-attn since its the package name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to flash-attn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya sounds good

@@ -1,6 +1,6 @@
# build requirements
[build-system]
requires = ["setuptools < 68.0.0"]
requires = ["setuptools < 68.0.0", "packaging >= 21.3.0, < 24.1"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gets around some of the installation errors, but torch is also a dependency, and then we go round and round in this circle trying to get build isolation working correctly. so i'm closing this PR, it should live foundry side.

@snarayan21 snarayan21 closed this Apr 30, 2024
@snarayan21
Copy link
Contributor Author

Closed this PR since adding flash-attn to composer as a dependency makes things very ugly. Will add this to foundry later. 4th time's the charm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants