forked from openxla/xla
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIN to 2023_11_29 #4
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
PiperOrigin-RevId: 585379423
PiperOrigin-RevId: 585386076
PiperOrigin-RevId: 585568014
There's still room for improvement here. Ideally I think we probably want something at the level of individual reductions. Maybe we can move the ReductionCodegenState class to reduction.cc and move all functions that currently take it as an argument there. PiperOrigin-RevId: 585575236
PiperOrigin-RevId: 585594284
…upEmitter. This class doesn't need to be public. Also, it encapsulates part of the state that's needed to generate groups of reductions, so we might as well move the functions there. PiperOrigin-RevId: 585594330
Avoid unnecessary calls to can_fuse_ to save a significant amount of compile time. PiperOrigin-RevId: 585598642
Updates LLVM usage to match [5e5a22caf88a](llvm/llvm-project@5e5a22caf88a) PiperOrigin-RevId: 585609371
It's not longer used. The heuristic that used the user count was removed in cl/573215172. PiperOrigin-RevId: 585613833
…ASSIGN`. Previously, the tests assigned the wrapped map to a variable before calling `ASSERT_IS_OK` on the resulting variable, which deviates from the recommended pattern. PiperOrigin-RevId: 585618767
…digm This intends to make it easier to implement new fusion strategies, such as "Separate multiple uses of nodes within one scope when they are incompatible in Triton GEMM fusion". "Renames": AnalyzeForFusion -> GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible RequireSupportedInstruction -> GetPropagatedDimOrdersAndRequirements HandleInstruction -> GetPropagatedDimOrders RequireSupportedDimOrder -> GetRequirementsIfSupportedOrder RequireSupportedDimOrders -> GetRequirementsIfSupportedOrders DimOrderUpdates -> DimOrdersAndReqs Notable logic changes: I split out the splittable_dimension_major_part_size from DotProperties to DotRequirements, because it's not really a property of the dot, but rather a requirement which can be imposed by the instructions of the fusion. I explicitly return an error if a dimension split would be needed for Softmax in GetRequirementsIfSupportedOrder. I don't check IsSupportedSplittableDimensionMajorPartSize in GetRequirementsIfSupportedOrder anymore, I just check that in CombineDimOrdersAndReqs after the propagation is done. PiperOrigin-RevId: 585650283
…paradigm " I think that now it's perhaps possible that FusionContext::GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible succeeds, but context.CombineDimOrdersAndReqs fails because of a "splittable_dimension_major_part_size" requirement. Also continue is the same as break in this specific context, so I changed it to break. PiperOrigin-RevId: 585669068
…increase memory pressure much. PiperOrigin-RevId: 585669608
PiperOrigin-RevId: 585712342
…la#6973 PiperOrigin-RevId: 585717031
PiperOrigin-RevId: 585733291
PiperOrigin-RevId: 585744507
…-contracting dimensions. PiperOrigin-RevId: 585748647
…StrategyGroup to generate strategies for elementwise ops as well. PiperOrigin-RevId: 585749608
Some hlo computations were getting scheduling loops because of the alias checking. PiperOrigin-RevId: 585756812
Imported from GitHub PR openxla#7269 The size of a uint8_t is 1 so a static_assert to check that it is 0 makes no sense, fix it. Also fix a couple of warnings about lack of typename Copybara import of the project: -- e72164f by Andrew Goodbody <[email protected]>: Fix an incorrect static_assert The size of a uint8_t is 1 so a static_assert to check that it is 0 will always be false. The original intent was to have the assert only trigger if the struct was instantiated but the standard deems it ill formed if it can never be true and allows compilers to reject it. Adopt a different workaround that avoids this by allowing the possibility of an evaluation to true. Also fix a couple of warnings about lack of typename Merging this change closes openxla#7269 COPYBARA_INTEGRATE_REVIEW=openxla#7269 from elfringham:fix_assert e72164f PiperOrigin-RevId: 585767871
Imported from GitHub PR openxla#7277 This is a follow-up PR for rocm-6.0 platform support. I have also adapted two unit tests which were failing on specific architectures / platform version. @xla-rotation: would you take a look, please? Copybara import of the project: -- 12e6090 by Pavel Emeliyanenko <[email protected]>: another fixes for rocm-6.0 platform -- 9c82a7a by Pavel Emeliyanenko <[email protected]>: added checks for the failing tests -- 853e3a5 by Pavel Emeliyanenko <[email protected]>: fixing cuda compile -- 2d29cc8 by Pavel Emeliyanenko <[email protected]>: addressing reviewer comments Merging this change closes openxla#7277 COPYBARA_INTEGRATE_REVIEW=openxla#7277 from ROCmSoftwarePlatform:ci_rocm_6.0_fixes_followup 2d29cc8 PiperOrigin-RevId: 585769317
PiperOrigin-RevId: 585793365
PiperOrigin-RevId: 585800483
PiperOrigin-RevId: 585818812
PiperOrigin-RevId: 585823405
PiperOrigin-RevId: 585825723
…xla#6973 PiperOrigin-RevId: 585826616
PiperOrigin-RevId: 585832869
PiperOrigin-RevId: 586447290
PiperOrigin-RevId: 586452165
… serial-resource conflicts. This CL adds the new rule kLessSerialResourceConflict, which encourages picking the instruction whose serial resource conflict is smaller than the other alternative. The conflict is computed as the sum of the number of conflicting resources in flight. The new rule is placed after kLessStall, which means if a conflicting instruction creates less stall than a non-conflicting instruction, it will still be picked. The new rule is useful when there are enough non-resource-conflicting zero-stall alternatives that we can overlap the async collectives with. PiperOrigin-RevId: 586452690
name old cpu/op new cpu/op delta BM_BufferArgX1 6.76ns ± 3% 6.72ns ± 1% ~ (p=0.518 n=19+16) BM_BufferArgX4 13.0ns ± 9% 12.9ns ± 8% ~ (p=0.092 n=18+20) BM_TupleOfI32Attrs 45.1ns ± 1% 43.3ns ± 1% -3.90% (p=0.000 n=16+17) name old time/op new time/op delta BM_BufferArgX1 6.76ns ± 3% 6.72ns ± 1% ~ (p=0.529 n=19+16) BM_BufferArgX4 13.0ns ± 9% 12.9ns ± 8% ~ (p=0.093 n=18+20) BM_TupleOfI32Attrs 45.1ns ± 1% 43.3ns ± 1% -3.90% (p=0.000 n=16+17) PiperOrigin-RevId: 586467144
PiperOrigin-RevId: 586471723
PiperOrigin-RevId: 586471778
PiperOrigin-RevId: 586474576
…Executor to XLA:GPU level This is a high level XLA implementation detail that should not leak deep into StreamExecutor PiperOrigin-RevId: 586476378
PiperOrigin-RevId: 586478058
PiperOrigin-RevId: 586480698
PiperOrigin-RevId: 586482573
PiperOrigin-RevId: 586483855
…opies. PiperOrigin-RevId: 586488071
PiperOrigin-RevId: 586489430
PiperOrigin-RevId: 586491456
Updates LLVM usage to match [f688e0901213](llvm/llvm-project@f688e0901213) PiperOrigin-RevId: 586494799
PiperOrigin-RevId: 586498567
…7360 PiperOrigin-RevId: 586502257
…ivePipeliner. Fix an apparently missing callback if --xla_gpu_enable_pipelined_p2p is enabled. PiperOrigin-RevId: 586507418
PiperOrigin-RevId: 586510253
PiperOrigin-RevId: 586512876
PiperOrigin-RevId: 586517769
PiperOrigin-RevId: 586519546
ApsarasX
pushed a commit
that referenced
this pull request
Apr 8, 2024
Currently we look for ptxas and nvlink in a few different places on the host machine, then we choose the first found binary without taking its version into account. If the chosen binary doesn't fulfill our version requirements we will later fail even if there was a suitable ptxas or nvlink in the search path in the first place. This change makes it take the version of each binary into account when going through the search path. Unsuitable binaries will be discarded right away and the search continues until we are out of locations to check. This should help with host environments that have multiple CUDA toolkits installed and should make ptxas and nvlink selection more robust. The concreate changes: 1. `FindCudaExecutable` now also takes a minimum version and a list of forbidden (think buggy) versions that are supposed to be skipped. 2. `WarnIfBadPtxAsVersion` has been removed. It was checking for ptxas < 11.1 which is way older than our minimum supported version of 11.8 and was not doing anything given the check described in #3. 3. There was another version check for `ptxas` in `NVPTXCompiler::ChooseLinkingMethod` which was checking for `version(ptxas)` < 11.8. This has also been removed/replace by the version check described in #4. 4. Version checking for `ptxas` and `nvlink` has been consolidated into 2 methods `FindPtxAsExectuable` and `FindNvLinkExecutable`. These methods hard code the current minimum version (and the list of excluded versions) of each tool in one place. It's still not great but at least less spaghetti-like. PiperOrigin-RevId: 618797392
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
OPENXLA_PIN_COMMIT=8744c9a94782cd7804f015e6d29df253437af3cb