[pull] master from tensorflow:master #338

Open

pull[bot] wants to merge 754 commits into master from tensorflow:master
Conversation

@pull pull bot commented Jan 9, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.1)

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jan 9, 2025
loislo and others added 29 commits January 21, 2025 01:38
…s BF16 arguments and algorithm=BF16_BF16_F32

We have a wide range of algorithms for dot, and the majority of them require F32 arguments. But the BF16_BF16_F32 one can actually accept BF16 arguments, because that does not affect the final precision of the dot.

Let's relax the check.
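A minimal sketch of why the relaxation is safe, assuming BF16_BF16_F32 semantics round each operand to bf16 before multiplying and accumulate products in f32 (truncation stands in here for the hardware's round-to-nearest-even; `round_to_bf16` and `dot_bf16_bf16_f32` are illustrative names, not XLA APIs):

```python
import struct

def round_to_bf16(x: float) -> float:
    # Reinterpret as float32 and drop the low 16 mantissa bits.
    # (Truncation; real hardware rounds to nearest-even.)
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

def dot_bf16_bf16_f32(a, b):
    # Operands are rounded to bf16; products accumulate in f32-or-wider
    # (Python float here).
    return sum(round_to_bf16(x) * round_to_bf16(y) for x, y in zip(a, b))

# Rounding to bf16 is idempotent, so arguments that are already BF16
# produce exactly the same result as F32 arguments would.
a, b = [0.1, 2.3, -4.5], [1.7, -0.2, 3.14]
a_bf16 = [round_to_bf16(x) for x in a]
b_bf16 = [round_to_bf16(x) for x in b]
assert dot_bf16_bf16_f32(a, b) == dot_bf16_bf16_f32(a_bf16, b_bf16)
```

Since the algorithm rounds operands to bf16 anyway, accepting bf16 inputs cannot change the result.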

PiperOrigin-RevId: 717795772
For ProducerConsumer fusions, we can have the case that a multi-output fusion
is created even if the producer fusion was not a multi-output fusion. This
happens if the fusion root of the producer is also used outside of the created
ProducerConsumer fusion, and we don't want to duplicate the producer.
This change adds the possibility to create a HloFusionAdaptor for a
ProducerConsumer fusion with extra outputs created for producer roots that are
used outside the ProducerConsumer fusion.
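The decision described above can be sketched as follows, assuming a hypothetical `extra_outputs` helper operating on instruction names (the real HloFusionAdaptor works on HLO instructions):

```python
def extra_outputs(producer_roots, fused_instructions):
    # A producer root whose users are not all absorbed into the new
    # producer-consumer fusion must become an extra fusion output, so
    # external users still see its value without duplicating the producer.
    return [root for root, users in producer_roots.items()
            if any(u not in fused_instructions for u in users)]

# The producer root "mul" also feeds "log", which stays outside the
# fusion, so the fused computation gains "mul" as an extra output.
roots = {"mul": {"exp", "log"}}
fused = {"mul", "exp"}
assert extra_outputs(roots, fused) == ["mul"]
```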

PiperOrigin-RevId: 717796118
Copies are considered unary elementwise ops. We need to make sure we don't try
to move a copy over another copy; otherwise the MoveCopyToUsers pass will not
converge to a fixed point.
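A toy model of the non-convergence, assuming the pass rewrites copy(unary(x)) to unary(copy(x)) (IR nodes are just nested tuples here; `sink_copy_once` is an illustrative name):

```python
def sink_copy_once(node):
    # node: ("op", operand) tuples with a string leaf.
    # Rewrite copy(unary(x)) -> unary(copy(x)), but never move a copy
    # over another copy: copy(copy(x)) would rewrite to itself forever,
    # so the pass could never reach a fixed point.
    if isinstance(node, tuple) and node[0] == "copy":
        inner = node[1]
        if isinstance(inner, tuple) and inner[0] != "copy":
            return (inner[0], ("copy", inner[1])), True
    return node, False

# A copy sinks below a non-copy unary op...
assert sink_copy_once(("copy", ("neg", "x"))) == (("neg", ("copy", "x")), True)
# ...but a copy-over-copy is left alone, so the rewrite terminates.
assert sink_copy_once(("copy", ("copy", "x"))) == (("copy", ("copy", "x")), False)
```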

PiperOrigin-RevId: 717824897
Follow up to '[XLA:GPU] Add RewritePatterns for binary elementwise ops in SimplifyAffinePass.'.

Fixed the pass not finding the parent module (as it was a module op to begin with) and defaulting to i64.

PiperOrigin-RevId: 717825878
Updates LLVM usage to match
[e2402615a5a7](llvm/llvm-project@e2402615a5a7)

PiperOrigin-RevId: 717835256
PiperOrigin-RevId: 717836550
We don't yet support scalable NCCL communicator initialization, so we should not generate more than one unique ID.

PiperOrigin-RevId: 717840923
Added a benchmark for popping a task from a queue

-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
BM_PopTask       2.63 ns         2.63 ns    265317289

PiperOrigin-RevId: 717846665
PiperOrigin-RevId: 717856803
We always use the intra-op device to run XLA:CPU kernels; stop pretending that we might have some other option (until we have a real alternative).

PiperOrigin-RevId: 717858535
A recent change to pass order changed the HLO, and the check started to fail.

Remove the check completely, as we don't really care which dimensions are used, only whether non-major batch works at all.

PiperOrigin-RevId: 717871376
This paves the way for sharing these passes with CPU.

Note that lower_tensors requires more work before CPU tests can pass.

PiperOrigin-RevId: 717896623
Still missing direct atomics (vs. cmpxchg), but we can add those later.

Note that the added test for non-gep loads would be an
error in non-CPU, hence the separate test file for CPU.

PiperOrigin-RevId: 717914337
The `client_polling_for_error_` variable previously claimed the following:

> Once set to true, the value will never change back to false, so no mutex is
> needed.

This is not true. Concurrent access to the boolean is a data race and must be
avoided with a mutex. Thankfully, the code was already protecting access to the
variable with a mutex. This CL adds some thread annotations to double check
this.
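The invariant the CL enforces, modeled in Python for illustration (the real code is C++ with absl mutexes and thread-safety annotations; `ClientState` and its methods are hypothetical names):

```python
import threading

class ClientState:
    """Every read and write of the polling flag goes through one mutex."""

    def __init__(self):
        self._mu = threading.Lock()
        self._client_polling_for_error = False  # guarded by _mu

    def start_polling(self):
        with self._mu:
            self._client_polling_for_error = True

    def is_polling(self):
        # Even a plain boolean read must take the lock: unsynchronized
        # concurrent access to a variable that another thread mutates
        # is a data race, regardless of whether it ever flips back.
        with self._mu:
            return self._client_polling_for_error

state = ClientState()
assert not state.is_polling()
state.start_polling()
assert state.is_polling()
```

Thread annotations such as `ABSL_GUARDED_BY` let the C++ compiler check this discipline statically instead of relying on comments.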

PiperOrigin-RevId: 717957732
Embedding pipelining requires at least two steps. However, because the inclusion of summary ops is expensive and only needed periodically, most users run a single training step when enabling summaries. This change detects when summaries are active and automatically disables pipelining (under the assumption that the user will only be running a single step).
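The gating logic can be sketched as a single predicate (`enable_pipelining` is an illustrative name, not the actual TensorFlow API):

```python
def enable_pipelining(num_steps: int, summaries_active: bool) -> bool:
    # Embedding pipelining needs at least two steps to overlap work.
    # Summary-enabled runs are assumed to execute a single step, so
    # pipelining is disabled automatically when summaries are active.
    if summaries_active:
        return False
    return num_steps >= 2

assert enable_pipelining(10, summaries_active=False)
assert not enable_pipelining(10, summaries_active=True)
assert not enable_pipelining(1, summaries_active=False)
```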

PiperOrigin-RevId: 717961226
There are far fewer of these than expected!

PiperOrigin-RevId: 717978053
… custom suffix (e.g. 2.19.0-rc1).

The old implementation didn't delete the `-` symbol and expected the wheel filename to be `tensorflow-2.19.0.rc1-cp310-cp310-linux_x86_64.whl` instead of `tensorflow-2.19.0rc1-cp310-cp310-linux_x86_64.whl`.
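A simplified sketch of the expected normalization, assuming PEP 440-style wheel naming where the `-` before a pre-release suffix is dropped (`wheel_filename` is an illustrative helper, not the build script's actual function):

```python
def wheel_filename(project: str, version: str, tags: str) -> str:
    # PEP 440 normalization drops the "-" before a pre-release suffix,
    # so "2.19.0-rc1" appears as "2.19.0rc1" in the wheel filename.
    return f"{project}-{version.replace('-', '')}-{tags}.whl"

assert (wheel_filename("tensorflow", "2.19.0-rc1", "cp310-cp310-linux_x86_64")
        == "tensorflow-2.19.0rc1-cp310-cp310-linux_x86_64.whl")
```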

PiperOrigin-RevId: 718020442
…rflow:tf_quantization_passes from //third_party/tensorflow/compiler/mlir:passes

PiperOrigin-RevId: 718028638
PiperOrigin-RevId: 718030758
qukhan and others added 30 commits January 24, 2025 11:57
…ocate.

All callers are migrated to StreamExecutor::CreateMemoryAllocator(MemoryType::kUnified).

PiperOrigin-RevId: 719391711
This is no longer used by JAX, and it never mattered for xla_extension.so, because we no longer build it in a CUDA-specific configuration and instead use CUDA plugins.

PiperOrigin-RevId: 719399140
- Add a C++ wrapper type for easier management.
- Make the compile options mandatory in `CompiledModel::Create`. This aligns
  the default value for hardware accelerator selection (previously, depending
  on how the options were specified, you would get either `None` or `Cpu`).

PiperOrigin-RevId: 719402464
PiperOrigin-RevId: 719410517
Reported in:
jax-ml/jax#26062

PiperOrigin-RevId: 719416696
This was considered legacy under the XLA Runtime transition, but XLA
Runtime is no more, so this isn't legacy any more.

While at it, remove a stale declaration of an XlaRuntime method
whose definition was removed long ago.

PiperOrigin-RevId: 719451829
Updating:
 - `env.h`
 - `env_time.h`
 - `errors.h`
 - `file_statistics.h`
 - `file_system.h`
 - `file_system_helper.h`
 - `logging.h`
 - `macros.h`
 - `status.h`
 - `status_matchers.h`
 - `status_to_from_proto.h`
 - `statusor.h`
 - `test.h`
 - `test_benchmark.h`
 - `threadpool.h`
 - `threadpool_async_executor.h`
 - `threadpool_interface.h`
 - `threadpool_options.h`
 - `types.h`

and associated targets.

PiperOrigin-RevId: 719457663
PiperOrigin-RevId: 719477321
… within the internal model api.

PiperOrigin-RevId: 719535624
PiperOrigin-RevId: 719553967
PiperOrigin-RevId: 719560240
PiperOrigin-RevId: 719567198
PiperOrigin-RevId: 719569599
PiperOrigin-RevId: 719570890
PiperOrigin-RevId: 719581903
PiperOrigin-RevId: 719608871
PiperOrigin-RevId: 719659554
PiperOrigin-RevId: 719659988
PiperOrigin-RevId: 719660012
PiperOrigin-RevId: 719676659