PIN to 2023_11_29 #4

wbmc · 2023-12-13T04:40:03Z

Pin version
OPENXLA_PIN_COMMIT=8744c9a94782cd7804f015e6d29df253437af3cb

PiperOrigin-RevId: 585379423

PiperOrigin-RevId: 585386076

PiperOrigin-RevId: 585568014

There's still room for improvement here. Ideally I think we probably want something at the level of individual reductions. Maybe we can move the ReductionCodegenState class to reduction.cc and move all functions that currently take it as an argument there. PiperOrigin-RevId: 585575236

PiperOrigin-RevId: 585594284

…upEmitter. This class doesn't need to be public. Also, it encapsulates part of the state that's needed to generate groups of reductions, so we might as well move the functions there. PiperOrigin-RevId: 585594330

Avoid unnecessary calls to can_fuse_ to save a significant amount of compile time. PiperOrigin-RevId: 585598642

Updates LLVM usage to match [5e5a22caf88a](llvm/llvm-project@5e5a22caf88a) PiperOrigin-RevId: 585609371

It's no longer used. The heuristic that used the user count was removed in cl/573215172. PiperOrigin-RevId: 585613833

…ASSIGN`. Previously, the tests assigned the wrapped map to a variable before calling `ASSERT_IS_OK` on the resulting variable, which deviates from the recommended pattern. PiperOrigin-RevId: 585618767

http://github.com/tensorflow/runtime/commit/58f2ec4dc891dc0bc0815a8c2d1caf196bfc13d5. PiperOrigin-RevId: 585619228

…digm

This intends to make it easier to implement new fusion strategies, such as "Separate multiple uses of nodes within one scope when they are incompatible in Triton GEMM fusion".

"Renames":
- AnalyzeForFusion -> GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible
- RequireSupportedInstruction -> GetPropagatedDimOrdersAndRequirements
- HandleInstruction -> GetPropagatedDimOrders
- RequireSupportedDimOrder -> GetRequirementsIfSupportedOrder
- RequireSupportedDimOrders -> GetRequirementsIfSupportedOrders
- DimOrderUpdates -> DimOrdersAndReqs

Notable logic changes:
- I split out the splittable_dimension_major_part_size from DotProperties to DotRequirements, because it's not really a property of the dot, but rather a requirement which can be imposed by the instructions of the fusion.
- I explicitly return an error if a dimension split would be needed for Softmax in GetRequirementsIfSupportedOrder.
- I don't check IsSupportedSplittableDimensionMajorPartSize in GetRequirementsIfSupportedOrder anymore; I just check that in CombineDimOrdersAndReqs after the propagation is done.

PiperOrigin-RevId: 585650283

…paradigm " I think that now it's perhaps possible that FusionContext::GetPropagatedDimOrdersAndRequirementsIfProfitablyFusible succeeds, but context.CombineDimOrdersAndReqs fails because of a "splittable_dimension_major_part_size" requirement. Also, `continue` is the same as `break` in this specific context, so I changed it to `break`. PiperOrigin-RevId: 585669068

…increase memory pressure much. PiperOrigin-RevId: 585669608

PiperOrigin-RevId: 585712342

…la#6973 PiperOrigin-RevId: 585717031

PiperOrigin-RevId: 585733291

PiperOrigin-RevId: 585744507

…-contracting dimensions. PiperOrigin-RevId: 585748647

…StrategyGroup to generate strategies for elementwise ops as well. PiperOrigin-RevId: 585749608

Some hlo computations were getting scheduling loops because of the alias checking. PiperOrigin-RevId: 585756812

Imported from GitHub PR openxla#7269

The size of a uint8_t is 1, so a static_assert checking that it is 0 makes no sense; fix it. Also fix a couple of warnings about a missing typename.

Copybara import of the project:

-- e72164f by Andrew Goodbody:

Fix an incorrect static_assert

The size of a uint8_t is 1, so a static_assert checking that it is 0 will always be false. The original intent was to have the assert trigger only if the struct was instantiated, but the standard deems the program ill-formed if the assertion can never be true, and allows compilers to reject it. Adopt a different workaround that avoids this by making the condition one that could, in principle, evaluate to true.

Also fix a couple of warnings about a missing typename.

Merging this change closes openxla#7269

COPYBARA_INTEGRATE_REVIEW=openxla#7269 from elfringham:fix_assert e72164f
PiperOrigin-RevId: 585767871
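A minimal C++ sketch of the problem and the usual workaround described in PR openxla#7269 follows. The names `dependent_false` and `ByteTraits` are hypothetical and this is not the code from the PR. A condition such as `sizeof(uint8_t) == 0` does not depend on the template parameter, so the compiler may evaluate and reject it even if the template is never instantiated; routing the condition through a type that depends on `T` defers the check until instantiation.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: always false in practice, but the compiler cannot
// prove that without knowing T, since someone could specialize it.
template <typename T>
struct dependent_false : std::false_type {};

// Primary template: only meant to be used through explicit specializations.
template <typename T>
struct ByteTraits {
  // Fires only if ByteTraits<T> is instantiated for an unsupported T.
  // Writing static_assert(sizeof(std::uint8_t) == 0, ...) here instead
  // would be a non-dependent condition that can never be true, which
  // compilers are allowed to reject at definition time.
  static_assert(dependent_false<T>::value,
                "ByteTraits is only defined for supported types");
};

// Supported case: this specialization never instantiates the assertion.
template <>
struct ByteTraits<std::uint8_t> {
  static constexpr int kBits = 8;
};
```

Using `ByteTraits<std::uint8_t>::kBits` compiles, while naming `ByteTraits<int>` in a way that requires instantiation triggers the assertion with its message.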

@xla-rotation

Imported from GitHub PR openxla#7277

This is a follow-up PR for rocm-6.0 platform support. I have also adapted two unit tests which were failing on specific architectures / platform versions. @xla-rotation: would you take a look, please?

Copybara import of the project:

-- 12e6090 by Pavel Emeliyanenko: another fixes for rocm-6.0 platform
-- 9c82a7a by Pavel Emeliyanenko: added checks for the failing tests
-- 853e3a5 by Pavel Emeliyanenko: fixing cuda compile
-- 2d29cc8 by Pavel Emeliyanenko: addressing reviewer comments

Merging this change closes openxla#7277

COPYBARA_INTEGRATE_REVIEW=openxla#7277 from ROCmSoftwarePlatform:ci_rocm_6.0_fixes_followup 2d29cc8
PiperOrigin-RevId: 585769317

PiperOrigin-RevId: 585793365

PiperOrigin-RevId: 585800483

PiperOrigin-RevId: 585818812

PiperOrigin-RevId: 585823405

PiperOrigin-RevId: 585825723

…xla#6973 PiperOrigin-RevId: 585826616

PiperOrigin-RevId: 585832869

PiperOrigin-RevId: 586438695

PiperOrigin-RevId: 586447290

PiperOrigin-RevId: 586449725

PiperOrigin-RevId: 586452165

… serial-resource conflicts. This CL adds the new rule kLessSerialResourceConflict, which encourages picking the instruction whose serial resource conflict is smaller than the other alternative. The conflict is computed as the sum of the number of conflicting resources in flight. The new rule is placed after kLessStall, which means if a conflicting instruction creates less stall than a non-conflicting instruction, it will still be picked. The new rule is useful when there are enough non-resource-conflicting zero-stall alternatives that we can overlap the async collectives with. PiperOrigin-RevId: 586452690
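The rule ordering above can be sketched as a two-level comparison. This is a hypothetical illustration, not the latency-hiding scheduler's code: `Candidate`, `PreferCandidate` and `SerialResourceConflict` are made-up names, the real scheduler applies many more rules, and reading "the sum of the number of conflicting resources in flight" as "sum, over the serial resources a candidate uses, of the ops already in flight on each" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical summary of a ready candidate (field names are illustrative).
struct Candidate {
  std::int64_t stall_cycles;              // estimated stall if picked now
  std::int64_t serial_resource_conflict;  // see SerialResourceConflict()
};

// Assumed reading of the conflict metric: for each serial resource the
// candidate uses, add the number of async ops already in flight on it.
std::int64_t SerialResourceConflict(
    const std::vector<int>& resources_used,
    const std::map<int, std::int64_t>& in_flight) {
  std::int64_t total = 0;
  for (int resource : resources_used) {
    auto it = in_flight.find(resource);
    if (it != in_flight.end()) total += it->second;
  }
  return total;
}

// Returns true if `a` should be scheduled before `b`. Rules run in priority
// order; a later rule only breaks ties left by the earlier ones.
bool PreferCandidate(const Candidate& a, const Candidate& b) {
  // Analogous to kLessStall: less stall wins outright, so a conflicting
  // candidate that stalls less is still picked.
  if (a.stall_cycles != b.stall_cycles) {
    return a.stall_cycles < b.stall_cycles;
  }
  // Analogous to kLessSerialResourceConflict: among equally stalling
  // candidates, prefer the smaller serial-resource conflict.
  if (a.serial_resource_conflict != b.serial_resource_conflict) {
    return a.serial_resource_conflict < b.serial_resource_conflict;
  }
  return false;  // Still tied: leave the decision to later rules.
}
```

Placing the conflict rule after the stall rule mirrors the description: the new rule only matters when several zero-stall alternatives exist.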

name                 old cpu/op    new cpu/op    delta
BM_BufferArgX1       6.76ns ± 3%   6.72ns ± 1%   ~       (p=0.518 n=19+16)
BM_BufferArgX4       13.0ns ± 9%   12.9ns ± 8%   ~       (p=0.092 n=18+20)
BM_TupleOfI32Attrs   45.1ns ± 1%   43.3ns ± 1%   -3.90%  (p=0.000 n=16+17)

name                 old time/op   new time/op   delta
BM_BufferArgX1       6.76ns ± 3%   6.72ns ± 1%   ~       (p=0.529 n=19+16)
BM_BufferArgX4       13.0ns ± 9%   12.9ns ± 8%   ~       (p=0.093 n=18+20)
BM_TupleOfI32Attrs   45.1ns ± 1%   43.3ns ± 1%   -3.90%  (p=0.000 n=16+17)

PiperOrigin-RevId: 586467144

PiperOrigin-RevId: 586471723

PiperOrigin-RevId: 586471778

PiperOrigin-RevId: 586474576

…Executor to XLA:GPU level This is a high-level XLA implementation detail that should not leak deep into StreamExecutor. PiperOrigin-RevId: 586476378

PiperOrigin-RevId: 586478058

PiperOrigin-RevId: 586480698

PiperOrigin-RevId: 586482573

PiperOrigin-RevId: 586483855

…opies. PiperOrigin-RevId: 586488071

PiperOrigin-RevId: 586489430

PiperOrigin-RevId: 586491456

Updates LLVM usage to match [f688e0901213](llvm/llvm-project@f688e0901213) PiperOrigin-RevId: 586494799

PiperOrigin-RevId: 586498567

…7360 PiperOrigin-RevId: 586502257

http://github.com/tensorflow/runtime/commit/8f915f25e8b17d2509bb6c7f199a45f2a5e6736c. PiperOrigin-RevId: 586502386

…ivePipeliner. Fix an apparently missing callback if --xla_gpu_enable_pipelined_p2p is enabled. PiperOrigin-RevId: 586507418

PiperOrigin-RevId: 586510253

PiperOrigin-RevId: 586512876

PiperOrigin-RevId: 586517769

PiperOrigin-RevId: 586519546

Currently we look for ptxas and nvlink in a few different places on the host machine, then choose the first binary found without taking its version into account. If the chosen binary doesn't fulfill our version requirements, we fail later, even if a suitable ptxas or nvlink existed elsewhere in the search path.

This change takes the version of each binary into account while going through the search path. Unsuitable binaries are discarded right away, and the search continues until we run out of locations to check. This should help in host environments that have multiple CUDA toolkits installed and should make ptxas and nvlink selection more robust.

The concrete changes:

1. `FindCudaExecutable` now also takes a minimum version and a list of forbidden (think: buggy) versions that are skipped.
2. `WarnIfBadPtxAsVersion` has been removed. It was checking for ptxas < 11.1, which is far older than our minimum supported version of 11.8, and it had no effect given the check described in #3.
3. There was another version check for `ptxas` in `NVPTXCompiler::ChooseLinkingMethod`, which checked for `version(ptxas)` < 11.8. This has also been removed and replaced by the version check described in #4.
4. Version checking for `ptxas` and `nvlink` has been consolidated into two methods, `FindPtxAsExectuable` and `FindNvLinkExecutable`. These methods hard-code the current minimum version (and the list of excluded versions) of each tool in one place. It's still not great, but at least less spaghetti-like.

PiperOrigin-RevId: 618797392
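The version-aware search described above can be sketched as follows. This is a hypothetical illustration, not the actual `FindCudaExecutable` signature: `FindToolWithVersion`, `Version` and `VersionProbe` are made-up names, and the probe callback stands in for running the binary and parsing its reported version. The key point is that an unsuitable binary causes the loop to continue instead of being returned.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical (major, minor, patch); std::array compares lexicographically.
using Version = std::array<int, 3>;

// Stand-in for "run <path> --version and parse it"; returns std::nullopt if
// the binary is missing or its version cannot be determined.
using VersionProbe =
    std::function<std::optional<Version>(const std::string& path)>;

// Returns the first path whose version is >= min_version and not excluded.
std::optional<std::string> FindToolWithVersion(
    const std::vector<std::string>& search_paths, const Version& min_version,
    const std::vector<Version>& excluded_versions,
    const VersionProbe& get_version) {
  for (const std::string& path : search_paths) {
    std::optional<Version> version = get_version(path);
    if (!version.has_value()) continue;      // Not found: try the next path.
    if (*version < min_version) continue;    // Too old: keep searching.
    if (std::find(excluded_versions.begin(), excluded_versions.end(),
                  *version) != excluded_versions.end()) {
      continue;                              // Known-bad release: skip it.
    }
    return path;                             // First suitable binary wins.
  }
  return std::nullopt;                       // Out of locations to check.
}
```

With a minimum of 11.8 (the minimum supported version mentioned above) and one excluded version, a search over three toolkits skips the too-old and the excluded binary and returns the third; the version numbers one would use to exercise this are purely illustrative.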

chsigg and others added 30 commits November 26, 2023 00:07

[XLA:GPU] NFC: use matchers in PriorityFusionTest.

aa20614

PiperOrigin-RevId: 585379423

[XLA:GPU] NFC: make IsReadCoalesced() slightly easier to read.

048e5e4

PiperOrigin-RevId: 585386076

Make cudnn_fused_conv_rewriter_test work in OSS.

0154c7b

PiperOrigin-RevId: 585568014

Enable tests that require v100 in OSS.

3249850

PiperOrigin-RevId: 585594284

[XLA:GPU] Add can_fuse cache.

12adbd1

Avoid unnecessary calls to can_fuse_ to save a significant amount of compile time. PiperOrigin-RevId: 585598642
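Memoizing an expensive pairwise check is one way to avoid repeated `can_fuse_` calls. The class below is a hypothetical sketch of that general technique, not the priority fusion pass's cache: instructions are reduced to integer ids, and the invalidation policy is an assumption about what any such cache needs once fusion rewrites an instruction.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical memoizing wrapper around an expensive fusibility check,
// keyed by (producer id, consumer id).
class CanFuseCache {
 public:
  using CheckFn = std::function<bool(int producer, int consumer)>;

  explicit CanFuseCache(CheckFn check) : check_(std::move(check)) {}

  bool CanFuse(int producer, int consumer) {
    const std::pair<int, int> key(producer, consumer);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;  // Hit: skip the check.
    const bool result = check_(producer, consumer);
    cache_.emplace(key, result);
    return result;
  }

  // Drops every entry that mentions `instruction`; needed after that
  // instruction changes (e.g. is fused), or stale answers are returned.
  void Invalidate(int instruction) {
    for (auto it = cache_.begin(); it != cache_.end();) {
      if (it->first.first == instruction || it->first.second == instruction) {
        it = cache_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  CheckFn check_;
  std::map<std::pair<int, int>, bool> cache_;
};
```

The compile-time saving comes from the hit path: repeated queries for an unchanged producer/consumer pair cost one map lookup instead of a full fusibility analysis.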

Integrate LLVM at llvm/llvm-project@5e5a22caf88a

3ccca51

Updates LLVM usage to match [5e5a22caf88a](llvm/llvm-project@5e5a22caf88a) PiperOrigin-RevId: 585609371

[XLA:GPU] (NFC) Remove producer_user_count_.

59b1d67

It's no longer used. The heuristic that used the user count was removed in cl/573215172. PiperOrigin-RevId: 585613833

[XLA:GPU][NFC] Clean up tile analysis tests to use `TF_ASSERT_OK_AND_…

bb07b21

…ASSIGN`. Previously, the tests assigned the wrapped map to a variable before calling `ASSERT_IS_OK` on the resulting variable, which deviates from the recommended pattern. PiperOrigin-RevId: 585618767

Update TFRT dependency to use revision

ab750b1

http://github.com/tensorflow/runtime/commit/58f2ec4dc891dc0bc0815a8c2d1caf196bfc13d5. PiperOrigin-RevId: 585619228

[XLA] Collective pipeliner goes through small reduces as they do not …

7506f69

…increase memory pressure much. PiperOrigin-RevId: 585669608

Adds back the solver parameter string.

5320016

PiperOrigin-RevId: 585712342

[stream_executor] Add Case conditional command to CommandBuffer openx…

c9e2046

…la#6973 PiperOrigin-RevId: 585717031

[stream_executor] Add For conditional command to CommandBuffer

b71268c

PiperOrigin-RevId: 585733291

[stream_executor] NFC: Do not leak internal stream executor header

dee190a

PiperOrigin-RevId: 585744507

[XLA:GPU] Triton GEMM: enable fusion of inputs concatenated along non…

2c5ce09

…-contracting dimensions. PiperOrigin-RevId: 585748647

Deduplicate some code by using a refactored portion of MaybeFollowIns…

fea3aa5

…StrategyGroup to generate strategies for elementwise ops as well. PiperOrigin-RevId: 585749608

Search for possible predecessors instead of only caching one hop.

48d695d

Some hlo computations were getting scheduling loops because of the alias checking. PiperOrigin-RevId: 585756812
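Why a one-hop check is not enough can be shown with a small reachability sketch. This is a hypothetical illustration of the general idea, not XLA's scheduler or alias-checking code: the graph is a plain predecessor list over integer node ids, and `CanAddOrderingEdge` is a made-up name for "adding this control edge would not create a cycle", i.e. a scheduling loop.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical DAG: preds[n] lists the direct predecessors of node n.
using PredecessorGraph = std::vector<std::vector<int>>;

// One-hop check: only looks at direct predecessors of `node`.
bool IsDirectPredecessor(const PredecessorGraph& preds, int ancestor,
                         int node) {
  for (int p : preds[node]) {
    if (p == ancestor) return true;
  }
  return false;
}

// Transitive check: depth-first search over all predecessors of `node`.
bool IsTransitivePredecessor(const PredecessorGraph& preds, int ancestor,
                             int node) {
  std::vector<int> stack = {node};
  std::unordered_set<int> visited = {node};
  while (!stack.empty()) {
    const int current = stack.back();
    stack.pop_back();
    for (int p : preds[current]) {
      if (p == ancestor) return true;
      if (visited.insert(p).second) stack.push_back(p);
    }
  }
  return false;
}

// Ordering `before` ahead of `after` is safe only if `after` is not already
// a (possibly indirect) predecessor of `before`; otherwise we form a cycle.
bool CanAddOrderingEdge(const PredecessorGraph& preds, int before,
                        int after) {
  return before != after && !IsTransitivePredecessor(preds, after, before);
}
```

For a chain 0 -> 1 -> 2, the one-hop check does not see 0 as a predecessor of 2, so it would wrongly allow ordering 2 before 0; the transitive search catches it.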

Add c-api support for GetCompiledMemoryStats.

7c23e8c

PiperOrigin-RevId: 585793365

Update visibility for tf.data.

ecae151

PiperOrigin-RevId: 585800483

[xla:gpu] NFC: move thunk->CMD conversion to runtime3 openxla#6528

889fa5c

PiperOrigin-RevId: 585818812

[stream_executor] NFC: Do not leak internal stream executor header

6d09f41

PiperOrigin-RevId: 585823405

[stream_executor] Do not use KernelArgs directly in GpuExecutor::Launch

6643428

PiperOrigin-RevId: 585825723

[stream_executor] Add While conditional command to CommandBuffer open…

e795dc3

…xla#6973 PiperOrigin-RevId: 585826616

[stream_executor] NFC: Do not leak internal stream executor header

537ef5e

PiperOrigin-RevId: 585832869

hawkinsp and others added 27 commits November 29, 2023 13:47

Reverts 69f26cf

f69ec88

PiperOrigin-RevId: 586438695

Remove obsolete TODO

3400945

PiperOrigin-RevId: 586447290

Reverts c4f0a9d

baf6958

PiperOrigin-RevId: 586449725

Remove tensorflow namespace from tsl/platform/status_matchers.h

a680ac3

PiperOrigin-RevId: 586452165

[XLA] Allow moving for HloSharding in xla::HloInstruction::set_sharding.

4bf4b1e

PiperOrigin-RevId: 586471723

Add EvalOrPattern to StablehloRefineShapes pass

6449702

PiperOrigin-RevId: 586471778

[XLA] Allow tuple_shardings move in the HloSharding ctor.

aab9273

PiperOrigin-RevId: 586474576

[xla:gpu] Move external allocation implementation details from Stream…

135b0ea

…Executor to XLA:GPU level This is a high-level XLA implementation detail that should not leak deep into StreamExecutor. PiperOrigin-RevId: 586476378

[XLA] Do not create temporary vector in HloSharding::GetSubSharding.

3521147

PiperOrigin-RevId: 586478058

[XLA] Remove an unnecessary HloSharding assignment.

d6a1cf3

PiperOrigin-RevId: 586480698

Use clang+NVCC compilers for all XLA GPU jobs.

4695163

PiperOrigin-RevId: 586482573

[XLA] Move HloSharding where possible.

3f03d1c

PiperOrigin-RevId: 586483855

[XLA] Change the prototype of ReturnImprovedShardingImpl to minimize c…

c5d313d

…opies. PiperOrigin-RevId: 586488071

[XLA] Optimize HloValue::ComputeUses().

ecfe268

PiperOrigin-RevId: 586489430

[xla:gpu] Own command buffer allocations at a Thunk level

da00f2b

PiperOrigin-RevId: 586491456

Integrate LLVM at llvm/llvm-project@f688e0901213

e86ce2e

Updates LLVM usage to match [f688e0901213](llvm/llvm-project@f688e0901213) PiperOrigin-RevId: 586494799

[stream_executor] Use new CUDA runtime API for TopK

c6f0a6e

PiperOrigin-RevId: 586498567

[xla:gpu] Add the AOT compilation pipeline for thunk runtime openxla#…

a36626c

…7360 PiperOrigin-RevId: 586502257

Update TFRT dependency to use revision

46d4355

http://github.com/tensorflow/runtime/commit/8f915f25e8b17d2509bb6c7f199a45f2a5e6736c. PiperOrigin-RevId: 586502386

[XLA:GPU] Improve errors if callbacks are not provided to the Collect…

6eed44a

…ivePipeliner. Fix an apparently missing callback if --xla_gpu_enable_pipelined_p2p is enabled. PiperOrigin-RevId: 586507418

Relocates the evaluation output into the core solver.

dbe129e

PiperOrigin-RevId: 586510253

Lower tensor.from_elements and shape.broadcast ops in ShapeLegalizeToHLO

59dea45

PiperOrigin-RevId: 586512876

Internal infrastructure change

91b8ddb

PiperOrigin-RevId: 586517769

[stream_executor][NFC] More doc for While and For command

58e6b42

PiperOrigin-RevId: 586519546

Merge commit '58e6b428e22e40c4100a7b66790fbe86dc9d7845' into HEAD

9e419d5

wbmc merged commit 6905291 into main Dec 13, 2023

github-actions bot added the kokoro:force-run label Dec 13, 2023


wbmc commented Dec 13, 2023
