
Sync with master #2


Open · wants to merge 6,827 commits into base: trt_xp_factory_fix
Conversation

stevenlix (Owner)

No description provided.

WilBrady and others added 30 commits June 23, 2022 15:55
* Eager mode ArgMax support.

* Fix basic max and min functionality with a minor generator update. Note this does not cover the full max and min API scope.

* Add addmm test.
* fix mpi build for gcc8 or higher

* fix memory profile for partial graph run

* Revert "fix mpi build for gcc8 or higher"

This reverts commit fb60beb.

* remove debug code

* fix build

* fix build

* fix cpplint and python black format
* op changes

* review comments

* shape consolidation, test trigger, cleanup

* review comments
* Add nested function call tests

* Add overload for Specialize

* Pass symboltable to onnx shape inference

* Avoid renaming empty names

* Enable sequence_map tests which failed before this change
…11491)

* Using vectorized loads (float2) for fp16 to improve performance

* Fix a few warnings from cpplint

* Fix a few warnings from cpplint

* Use __float2half2_rn and fix some cpplint warnings

* Move some computations to LaunchFastGeluKernel

* Fix some Lint C++ warning

* Using vectorized loads (float4) for fp16 to improve performance

* Add a switch for whether to optimize FastGelu with float4 vectorization

* Switch to float4 memory access based on input_length in FastGelu

* Document how to choose the threshold between the float2 and float4 vectorized kernels

* Add FastGelu fp16 unit tests for bias_length = 2 and 8

* Make vectorized kernels generic with aligned_vector (see the sketch after this commit message)

* Unify the vectorized kernels with/without bias

* Refactor the code to suppress cpplint warnings

* Solve formatting issues

* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel

* Move fast_gelu_impl.h to rocm/bert

* Fix some Lint C++ warnings and code alignment
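For readers unfamiliar with the aligned_vector technique the list above refers to, a minimal CUDA-style sketch follows; the struct, kernel, and constants here are illustrative stand-ins, not ORT's actual fast_gelu_impl.

```cpp
#include <cuda_fp16.h>

// Illustrative aligned_vector: one aligned VecSize-wide load/store
// (e.g. 64 bits for half x 4) replaces VecSize scalar 16-bit accesses.
template <typename T, int VecSize>
struct alignas(sizeof(T) * VecSize) aligned_vector {
  T val[VecSize];
};

// tanh-approximation Gelu as commonly used by FastGelu kernels.
__device__ __forceinline__ float gelu(float x) {
  const float kAlpha = 0.7978845608f;  // sqrt(2/pi)
  const float kGamma = 0.044715f;
  return 0.5f * x * (1.0f + tanhf(kAlpha * x * (1.0f + kGamma * x * x)));
}

// Assumes the n % VecSize tail is handled by a scalar kernel (omitted),
// pointers aligned to the vector width, and bias_len % VecSize == 0.
template <int VecSize>
__global__ void FastGeluVec(const half* x, const half* bias, half* y,
                            int n, int bias_len) {
  using Vec = aligned_vector<half, VecSize>;
  int i = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
  if (i + VecSize > n) return;
  Vec vx = *reinterpret_cast<const Vec*>(x + i);
  Vec vb = *reinterpret_cast<const Vec*>(bias + (i % bias_len));
  Vec vy;
#pragma unroll
  for (int k = 0; k < VecSize; ++k) {
    vy.val[k] = __float2half(gelu(__half2float(vx.val[k]) +
                                  __half2float(vb.val[k])));
  }
  *reinterpret_cast<Vec*>(y + i) = vy;
}
```

The float2 vs. float4 switch in the commits above corresponds to instantiating VecSize = 2 or 4 depending on input length.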
* Register signal ops for op set 17

Note code is mostly being moved, not added. These ops were previously
only registered as Microsoft contrib ops and only built if
`BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx
standard op set in version 17.

Main components of this change:

* Move the kernels from the contrib_ops directory to the
  core directory.
* Add function bodies for ms experimental ops. This will allow
  old models that use the contrib ops to continue to function.
  All the function bodies consist of a single op (the
  new standard op), so performance overhead should be minimal.

Minor clean-up also in this change:

* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental
  ops. Tested with `build.sh --ms_experimental`
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher` (see the sketch below).
* Use `InlineVector` instead of `std::vector`.

Unblocks #11640
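On the `MLTypeCallDispatcher` cleanup: a minimal self-contained sketch of that dispatch pattern (the real utility lives in onnxruntime's core/framework and differs in detail), where a fold expression replaces a per-call-site switch over element types:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum class ElemType { kFloat, kDouble, kInt64 };

template <typename T> constexpr ElemType ElemTypeOf();
template <> constexpr ElemType ElemTypeOf<float>()        { return ElemType::kFloat; }
template <> constexpr ElemType ElemTypeOf<double>()       { return ElemType::kDouble; }
template <> constexpr ElemType ElemTypeOf<std::int64_t>() { return ElemType::kInt64; }

// Invokes Functor<T>::operator() for the matching runtime element type.
template <template <typename> class Functor, typename... Ts, typename... Args>
void Dispatch(ElemType t, const Args&... args) {
  const bool matched =
      ((t == ElemTypeOf<Ts>() ? (Functor<Ts>{}(args...), true) : false) || ...);
  if (!matched) throw std::runtime_error("unsupported element type");
}

template <typename T>
struct FillZero {  // example functor: zero a typed buffer
  void operator()(void* data, std::size_t count) const {
    T* p = static_cast<T*>(data);
    for (std::size_t i = 0; i < count; ++i) p[i] = T{};
  }
};
// Usage: Dispatch<FillZero, float, double, std::int64_t>(ElemType::kFloat, buf, n);
```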
Fix a couple of typos
Improve performance of BiasGelu on OneDNN execution provider

This modifies how BiasGelu is handled by the OneDNN execution provider
by executing the gelu_erf primitive as a post-op of the binary_add primitive.

Also fixes extra data copies made when running on GPU.

Signed-off-by: George Nash <[email protected]>
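What the BiasGelu fusion above looks like in oneDNN terms; a sketch against the oneDNN v3-style C++ API with placeholder memory descriptors, not the EP's actual code:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

// BiasGelu as one fused primitive: binary_add (x + bias) with gelu_erf
// appended as an eltwise post-op, so no intermediate tensor is written
// out between the add and the gelu.
binary::primitive_desc MakeBiasGeluPd(const engine& eng,
                                      const memory::desc& x_md,
                                      const memory::desc& bias_md,
                                      const memory::desc& y_md) {
  post_ops po;
  po.append_eltwise(algorithm::eltwise_gelu_erf, /*alpha=*/0.f, /*beta=*/0.f);
  primitive_attr attr;
  attr.set_post_ops(po);
  return binary::primitive_desc(eng, algorithm::binary_add,
                                x_md, bias_md, y_md, attr);
}
```

The fused primitive is then executed with DNNL_ARG_SRC_0 (input), DNNL_ARG_SRC_1 (bias), and DNNL_ARG_DST.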
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](caolan/async@v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [js/rn] upgrade dependencies for e2e test

* use JDK11 only for gradle

* expand variable
…l signal op definitions (#12006)

* fix winml tests

* remove legacy test

* switch idft -> dft+inverse attr

* upgrade opset 13->17 for signal ops tests
…ls. (#12008)

Add support for double tensor output in TestPreTrainedModels.
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
…ation lacking training_mode attribute (#12010)

FusedBatchNormalization: include the training_mode attribute
* create op from ep

* read input count from context

* create holder to host nodes

* fix typo

* cast type before comparison

* throw error on API fail

* silence warning from minimal build

* switch to unique_ptr with deleter to host nodes

* fix typo

* fix build err for minimal

* fix build err for minimal

* add UT for conv

* enable test on CUDA

* add comment

* fix typo

* use gsl::span and string view for Node constructor

* Added two APIs - CopyKernelInfo and ReleaseKernelInfo

* pass gsl::span by value

* switch to span<NodeArg* const> to allow for references to const containers (see the sketch after this commit message)

* fix typo

* fix reduced build err

* fix reduced build err

* refactoring node construction logic

* rename exceptions

* add input and output count as arguments for op creation

* refactor static member

* use ORT_CATCH instead of catch

* drop the try/catch

* add static value name map

* format input definition and set err code

* fix comments

* fix typo
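On the `span<NodeArg* const>` bullet above: a span of mutable pointers cannot be constructed from a const container, because the container only hands out `NodeArg* const*` data. A standalone illustration (NodeArg reduced to a stub):

```cpp
#include <gsl/span>
#include <vector>

struct NodeArg {};  // stub for illustration

void TakesSpan(gsl::span<NodeArg* const> args) { /* read-only view */ }

void Caller(const std::vector<NodeArg*>& node_args) {
  // gsl::span<NodeArg*> would not compile here: the const vector's
  // data() returns NodeArg* const*, so the span's element type must be
  // NodeArg* const (the pointers are immutable through the view).
  TakesSpan(gsl::span<NodeArg* const>(node_args));
  TakesSpan(node_args);  // the implicit container conversion also works
}
```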
* Pad fallback to CPU

* Added queryPad in operatorRegistration.cpp

* Acknowledged PR comments

* Used any_of

* used none_of instead of any_of

Co-authored-by: Sumit Agarwal <[email protected]>
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting the existing raw ONNX model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments into multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print with logger
(8) update the shape inference function to support external data
(9) when saving external data, show a warning if the onnx version < 1.12
[js/web] fix negative axes for unsqueeze
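The usual normalization behind a negative-axes fix for Unsqueeze, per the ONNX spec (negative axes count from the output rank, which is the input rank plus the number of inserted axes); a generic C++ sketch rather than the actual js/web code:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Normalize Unsqueeze axes: valid range is [-out_rank, out_rank - 1],
// where out_rank = in_rank + axes.size(); negatives wrap around out_rank.
std::vector<int64_t> NormalizeUnsqueezeAxes(std::vector<int64_t> axes,
                                            int64_t in_rank) {
  const int64_t out_rank = in_rank + static_cast<int64_t>(axes.size());
  for (auto& a : axes) {
    if (a < -out_rank || a >= out_rank)
      throw std::invalid_argument("axis out of range");
    if (a < 0) a += out_rank;  // e.g. axis -1 on a 3-D input becomes 3
  }
  return axes;
}
```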
Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5.
- [Release notes](https://github.com/electron/electron/releases)
- [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md)
- [Commits](electron/electron@v13.6.6...v15.5.5)

---
updated-dependencies:
- dependency-name: electron
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* implement infrastructure for the handshake mechanism; SHA-256 was selected as the first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made the checkHash method const

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: KJlaccHoeUM9l <[email protected]>
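How the hash gate described above can look, assuming ipp-crypto's single-call `ippsHashMessage_rmf` / `ippsHashMethod_SHA256` API; a sketch, not the TVM EP's actual code:

```cpp
#include <ippcp.h>
#include <cstring>
#include <vector>

// Gate loading of a precompiled module on a SHA-256 match, in the spirit
// of the commits above; error handling is reduced to a bool for brevity.
bool CheckModuleHash(const std::vector<Ipp8u>& module_bytes,
                     const std::vector<Ipp8u>& expected_digest) {
  Ipp8u digest[32] = {};  // SHA-256 produces 32 bytes
  IppStatus st = ippsHashMessage_rmf(module_bytes.data(),
                                     static_cast<int>(module_bytes.size()),
                                     digest, ippsHashMethod_SHA256());
  return st == ippStsNoErr && expected_digest.size() == sizeof(digest) &&
         std::memcmp(digest, expected_digest.data(), sizeof(digest)) == 0;
}
```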
…to_pad (#11984)

* Add a warning about a future computation change for ConvTranspose with auto_pad

* improve msg

* update TODO to make lint happy

* expand the warning contents and add a condition

* VALID was not affected

* move it into kernel registration

* parse auto_pad myself

* try to use conv_transpose_attrs_.auto_pad directly
With this patch, Resize is optimized when the input X is a 4D int8/uint8 tensor
and the mode is linear by:

* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation

It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves
Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.
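The "without floating-point computation" part typically means fixed-point interpolation weights. A hedged sketch of that idea for one output pixel (illustrative; not the kernel this change ships):

```cpp
#include <cstdint>

// Bilinear interpolation of 4 uint8 neighbors with 11-bit fixed-point
// weights (wx, wy in [0, 2048]); all arithmetic stays in integers.
inline uint8_t BilinearU8(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                          int32_t wx, int32_t wy) {  // weights scaled by 2048
  constexpr int32_t kOne = 1 << 11;
  int32_t top = tl * (kOne - wx) + tr * wx;      // horizontal lerp, x2048
  int32_t bot = bl * (kOne - wx) + br * wx;
  int32_t out = top * (kOne - wy) + bot * wy;    // vertical lerp, x2048^2
  return static_cast<uint8_t>((out + (1 << 21)) >> 22);  // round, rescale
}
```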
Fix Windows CPU build with VS2021
askhade and others added 29 commits August 3, 2022 15:15
* first draft

* plus fixes

* plus more links

* Plus updates per review

* plus more clarifications

* plus updates

* plus more nit fixes

* plus some additions
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
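A char-type-generic way to express the ".ort" suffix check from the first bullet, assuming the path arrives as a `std::basic_string<CharT>` with CharT being char or wchar_t; a sketch, the real InferenceSession code differs:

```cpp
#include <string>

// ".ort" suffix test that works for both std::string and std::wstring
// paths, comparing character-by-character after widening the literal.
template <typename CharT>
bool HasOrtSuffix(const std::basic_string<CharT>& path) {
  constexpr CharT suffix[] = {'.', 'o', 'r', 't'};
  constexpr size_t n = sizeof(suffix) / sizeof(suffix[0]);
  if (path.size() < n) return false;
  for (size_t i = 0; i < n; ++i)
    if (path[path.size() - n + i] != suffix[i]) return false;
  return true;
}
```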
Use InlinedVector in a thread pool (TP)
Store per thread parallel section in std::optional and avoid memory allocation
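One way to read the std::optional change above: per-thread parallel-section state lives in thread-local storage and is constructed in place with emplace/reset, so re-entering a section performs no heap allocation. A generic sketch, not the actual thread pool code:

```cpp
#include <cstdint>
#include <optional>

struct ParallelSection {  // illustrative per-thread bookkeeping
  explicit ParallelSection(uint64_t id) : section_id(id) {}
  uint64_t section_id;
  uint64_t tasks_dispatched = 0;
};

// Storage has thread-local static duration; emplace/reset construct and
// destroy in place, so there is no new/delete per parallel section.
thread_local std::optional<ParallelSection> g_section;

void EnterSection(uint64_t id) { g_section.emplace(id); }
void LeaveSection() { g_section.reset(); }
```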
* Split GemmBase and RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update BERT cases with batch size

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
* Rework some aspects of Graph::Resolve to reduce memory usage.
Fix Python Packaging CI

Co-authored-by: Ethan Tao <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
Fix onnxruntime_training.cmake missing linkage issue
…auto keyword (#12483)

* Work around a false positive error produced by clang

ROCm's hip clang complains "use 'template' keyword to treat 'Foo' as a dependent template name"
where Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
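For context on the diagnostic quoted above: inside a template, a member-template call on a dependent object normally does need the `template` keyword, as below; the commit sidesteps a false positive by spelling out the type instead of using auto. A generic illustration of the rule, not the ROCm code:

```cpp
template <typename T>
struct Holder {
  template <int N>
  int Get() const { return N; }
};

template <typename T>
int Use(const Holder<T>& h) {
  // 'h' has a dependent type, so without 'template' the parser reads
  // 'h.Get < 3' as a comparison and clang emits:
  //   use 'template' keyword to treat 'Get' as a dependent template name
  return h.template Get<3>();
}
```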
* set zero point to 0 if all values are 0.0

* fix bug: numpy.finfo in older numpy versions doesn't have smallest_subnormal

* check scale to make sure it is not subnormal
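The two guards above, combined in one place; a C++ sketch of asymmetric uint8 scale/zero-point computation (the quantization tool itself is Python, so this is purely illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

struct QuantParams { float scale; uint8_t zero_point; };

QuantParams ComputeQuantParams(float rmin, float rmax) {
  rmin = std::min(rmin, 0.0f);  // quantized range must include zero
  rmax = std::max(rmax, 0.0f);
  if (rmin == 0.0f && rmax == 0.0f)  // all values are 0.0:
    return {1.0f, 0};                // force zero_point = 0
  float scale = (rmax - rmin) / 255.0f;
  if (scale < std::numeric_limits<float>::min())  // subnormal guard:
    scale = std::numeric_limits<float>::min();    // clamp to smallest normal
  const float zp = std::round(-rmin / scale);
  return {scale, static_cast<uint8_t>(std::clamp(zp, 0.0f, 255.0f))};
}
```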
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34ac

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
* Working on JNI refactor for OnnxTensor.
* Simplifying the error handling logic in createTensor.
* Collapsing casting branches and migrating to the ONNX element type enum.
* Disable cpplint for JNI C files.
…12485)

* Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage.

Co-authored-by: Pranav Sharma <[email protected]>
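The shape of the change: release each initializer's protobuf payload as soon as its OrtValue exists, so both copies of the whole model never coexist. A sketch with reduced stand-in types (TensorProto, OrtValue, and the conversion helper here are illustrative):

```cpp
#include <map>
#include <string>

struct TensorProto {  // reduced stand-in for onnx::TensorProto
  std::string raw_data;
  void Clear() { std::string().swap(raw_data); }  // actually releases capacity
};
struct OrtValue { std::string buffer; };  // stand-in for the runtime tensor

// Hypothetical conversion: copy the payload into the runtime's own buffer.
OrtValue DeserializeTensorProto(const TensorProto& p) { return {p.raw_data}; }

void LoadInitializers(std::map<std::string, TensorProto>& protos,
                      std::map<std::string, OrtValue>& values) {
  for (auto& [name, proto] : protos) {
    values.emplace(name, DeserializeTensorProto(proto));
    proto.Clear();  // drop the proto copy immediately, so peak memory holds
                    // roughly one initializer twice rather than the whole model
  }
}
```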
…es. (#12490)

* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
* modifications for CUDA and ROCm

* fix bfloat16 ut

* change bf16 ut number

* fix opset version

* fix op kernel doc
* sce refactor

* refactor

* remove unnecessary memset
* Add CODEOWNERS entries for dependency files

* Fix team @-mentions
* Load checkpoint in cpp

* removed unused imports

* throw error on invalid name and change function name

* in-place model assignment, name change, and other review comments resolved

* name change on import

* Added unit test, resolved comments

* remove unused imports

* resolved comments

* refactoring to reduce memory allocation

* resolved extra comments

* changed file hierarchy and force-added the onnx model

* fixed the order of function arguments

* used gtest macros on test cases

Co-authored-by: Adam Louly <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
…el (#12474)

Python module for dumping activation tensors when running an ONNX model

This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
* support concatenation via aten::cat.out

* wrap dims

* rename vars in tests, test wrapped dims
faxu deleted the master branch August 10, 2022 00:47