Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIN to 2023_11_29 #4

Merged
merged 110 commits into from
Dec 13, 2023
Merged

PIN to 2023_11_29 #4

merged 110 commits into from
Dec 13, 2023

Conversation

wbmc
Copy link

@wbmc wbmc commented Dec 13, 2023

  • Pin version
    OPENXLA_PIN_COMMIT=8744c9a94782cd7804f015e6d29df253437af3cb

chsigg and others added 30 commits November 26, 2023 00:07
There's still room for improvement here. Ideally I think we probably
want something at the level of individual reductions. Maybe we can
move the ReductionCodegenState class to reduction.cc and move all
functions that currently take it as an argument there.

PiperOrigin-RevId: 585575236
PiperOrigin-RevId: 585594284
…upEmitter.

This class doesn't need to be public. Also, it encapsulates part of the state
that's needed to generate groups of reductions, so we might as well move the
functions there.

PiperOrigin-RevId: 585594330
Avoid unnecessary calls to can_fuse_ to save a significant amount of compile time.

PiperOrigin-RevId: 585598642
Updates LLVM usage to match
[5e5a22caf88a](llvm/llvm-project@5e5a22caf88a)

PiperOrigin-RevId: 585609371
It's not longer used. The heuristic that used the user count was removed in cl/573215172.

PiperOrigin-RevId: 585613833
…ASSIGN`.

Previously, the tests assigned the wrapped map to a variable before calling
`ASSERT_IS_OK` on the resulting variable, which deviates from the recommended
pattern.

PiperOrigin-RevId: 585618767
…digm

This intends to make it easier to implement new fusion strategies, such as "Separate multiple uses of nodes within one scope when they are incompatible in Triton GEMM fusion".

"Renames":
AnalyzeForFusion -> GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible
RequireSupportedInstruction -> GetPropagatedDimOrdersAndRequirements
HandleInstruction -> GetPropagatedDimOrders
RequireSupportedDimOrder -> GetRequirementsIfSupportedOrder
RequireSupportedDimOrders -> GetRequirementsIfSupportedOrders
DimOrderUpdates -> DimOrdersAndReqs

Notable logic changes:

I split out the splittable_dimension_major_part_size from DotProperties to DotRequirements, because it's not really a property of the dot, but rather a requirement which can be imposed by the instructions of the fusion.

I explicitly return an error if a dimension split would be needed for Softmax in GetRequirementsIfSupportedOrder.

I don't check IsSupportedSplittableDimensionMajorPartSize in GetRequirementsIfSupportedOrder anymore, I just check that in CombineDimOrdersAndReqs after the propagation is done.

PiperOrigin-RevId: 585650283
…paradigm

"

I think that now it's perhaps possible that FusionContext::GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible succeeds, but context.CombineDimOrdersAndReqs fails because of a "splittable_dimension_major_part_size" requirement.

Also continue is the same as break in this specific context, so I changed it to break.

PiperOrigin-RevId: 585669068
…increase memory pressure much.

PiperOrigin-RevId: 585669608
PiperOrigin-RevId: 585712342
…-contracting dimensions.

PiperOrigin-RevId: 585748647
…StrategyGroup to generate strategies for elementwise ops as well.

PiperOrigin-RevId: 585749608
Some hlo computations were getting scheduling loops because of the alias checking.

PiperOrigin-RevId: 585756812
Imported from GitHub PR openxla#7269

The size of a uint8_t is 1 so a static_assert
to check that it is 0 makes no sense, fix it.
Also fix a couple of warnings about lack of
typename
Copybara import of the project:

--
e72164f by Andrew Goodbody <[email protected]>:

Fix an incorrect static_assert

The size of a uint8_t is 1 so a static_assert
to check that it is 0 will always be false.
The original intent was to have the assert only
trigger if the struct was instantiated but the
standard deems it ill formed if it can never be
true and allows compilers to reject it. Adopt a
different workaround that avoids this by allowing
the possibility of an evaluation to true.
Also fix a couple of warnings about lack of
typename

Merging this change closes openxla#7269

COPYBARA_INTEGRATE_REVIEW=openxla#7269 from elfringham:fix_assert e72164f
PiperOrigin-RevId: 585767871
Imported from GitHub PR openxla#7277

This is a follow-up PR for rocm-6.0 platform support.
I have also adapted two unit tests which were failing on specific architectures / platform version.

@xla-rotation: would you take a look, please?
Copybara import of the project:

--
12e6090 by Pavel Emeliyanenko <[email protected]>:

another fixes for rocm-6.0 platform

--
9c82a7a by Pavel Emeliyanenko <[email protected]>:

added checks for the failing tests

--
853e3a5 by Pavel Emeliyanenko <[email protected]>:

fixing cuda compile

--
2d29cc8 by Pavel Emeliyanenko <[email protected]>:

addressing reviewer comments

Merging this change closes openxla#7277

COPYBARA_INTEGRATE_REVIEW=openxla#7277 from ROCmSoftwarePlatform:ci_rocm_6.0_fixes_followup 2d29cc8
PiperOrigin-RevId: 585769317
PiperOrigin-RevId: 585800483
hawkinsp and others added 27 commits November 29, 2023 13:47
PiperOrigin-RevId: 586438695
PiperOrigin-RevId: 586447290
PiperOrigin-RevId: 586449725
… serial-resource conflicts.

This CL adds the new rule kLessSerialResourceConflict, which encourages picking the instruction whose serial resource conflict is smaller than the other alternative. The conflict is computed as the sum of the number of conflicting resources in flight. The new rule is placed after kLessStall, which means if a conflicting instruction creates less stall than a non-conflicting instruction, it will still be picked. The new rule is useful when there are enough non-resource-conflicting zero-stall alternatives that we can overlap the async collectives with.

PiperOrigin-RevId: 586452690
name                old cpu/op   new cpu/op   delta
BM_BufferArgX1      6.76ns ± 3%  6.72ns ± 1%    ~     (p=0.518 n=19+16)
BM_BufferArgX4      13.0ns ± 9%  12.9ns ± 8%    ~     (p=0.092 n=18+20)
BM_TupleOfI32Attrs  45.1ns ± 1%  43.3ns ± 1%  -3.90%  (p=0.000 n=16+17)

name                old time/op  new time/op  delta
BM_BufferArgX1      6.76ns ± 3%  6.72ns ± 1%    ~     (p=0.529 n=19+16)
BM_BufferArgX4      13.0ns ± 9%  12.9ns ± 8%    ~     (p=0.093 n=18+20)
BM_TupleOfI32Attrs  45.1ns ± 1%  43.3ns ± 1%  -3.90%  (p=0.000 n=16+17)

PiperOrigin-RevId: 586467144
…Executor to XLA:GPU level

This is a high level XLA implementation detail that should not leak deep into StreamExecutor

PiperOrigin-RevId: 586476378
PiperOrigin-RevId: 586483855
PiperOrigin-RevId: 586489430
Updates LLVM usage to match
[f688e0901213](llvm/llvm-project@f688e0901213)

PiperOrigin-RevId: 586494799
…ivePipeliner.

Fix an apparently missing callback if --xla_gpu_enable_pipelined_p2p is enabled.

PiperOrigin-RevId: 586507418
PiperOrigin-RevId: 586517769
@wbmc wbmc merged commit 6905291 into main Dec 13, 2023
2 of 3 checks passed
ApsarasX pushed a commit that referenced this pull request Apr 8, 2024
Currently we look for ptxas and nvlink in a few different places on the host machine, then we choose the first found binary without taking its version into account. If the chosen binary doesn't fulfill our version requirements we will later fail even if there was a suitable ptxas or nvlink in the search path in the first place.

This change makes it take the version of each binary into account when going through the search path. Unsuitable binaries will be discarded right away and the search continues until we are out of locations to check.

This should help with host environments that have multiple CUDA toolkits installed and should make ptxas and nvlink selection more robust.

The concreate changes:

1. `FindCudaExecutable` now also takes a minimum version and a list of forbidden (think buggy) versions that are supposed to be skipped.
2. `WarnIfBadPtxAsVersion` has been removed. It was checking for ptxas < 11.1 which is way older than our minimum supported version of 11.8 and was not doing anything given the check described in #3.
3. There was another version check for `ptxas` in `NVPTXCompiler::ChooseLinkingMethod` which was checking for `version(ptxas)` < 11.8. This has also been removed/replace by the version check described in #4.
4. Version checking for `ptxas` and `nvlink` has been consolidated into 2 methods `FindPtxAsExectuable` and `FindNvLinkExecutable`. These methods hard code the current minimum version (and the list of excluded versions) of each tool in one place. It's still not great but at least less spaghetti-like.

PiperOrigin-RevId: 618797392
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.