
Sync with master #2


Open · wants to merge 6,827 commits into base: trt_xp_factory_fix
Conversation

stevenlix (Owner)

No description provided.

WilBrady and others added 30 commits June 23, 2022 15:55
* Eager mode ArgMax support.

* Fix basic max and min functionality with a minor generator update. Note this does not cover the full max and min API scope.

* Add addmm test.
* fix mpi build for gcc8 or higher

* fix memory profile for partial graph run

* Revert "fix mpi build for gcc8 or higher"

This reverts commit fb60beb.

* remove debug code

* fix build

* fix build

* fix cpplint and python black format
* op changes

* review comments

* shape consolidation, test trigger, cleanup

* review comments
* Add nested function call tests

* Add overload for Specialize

* Pass symboltable to onnx shape inference

* Avoid renaming empty names

* Enable sequence_map tests which failed before this change
…11491)

* Using vectorized loads (float2) for fp16 to improve performance

* Fix a few warnings from cpplint

* Fix a few warnings from cpplint

* Use __float2half2_rn and fix some cpplint warnings

* Move some computations to LaunchFastGeluKernel

* Fix some Lint C++ warning

* Using vectorized loads (float4) for fp16 to improve performance

* Add a switch for whether to optimize FastGelu with float4 vectorization

* Switch to float4 memory access based on input_length in FastGelu

* Document how to choose the threshold between the float2 and float4 vectorized kernels

* Add FastGelu fp16 unit tests for bias_length = 2 and 8

* Make vectorized kernels generic with aligned_vector (see the sketch after this commit message)

* Unify the vectorized kernels with/without bias

* Refactor the code to suppress cpplint warnings

* Solve formatting issues

* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel

* Move fast_gelu_impl.h to rocm/bert

* Fix some Lint C++ warnings and code alignment
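For readers unfamiliar with the aligned_vector technique the list above refers to, a minimal CUDA-style sketch follows; the struct, kernel, and constants here are illustrative stand-ins, not ORT's actual fast_gelu_impl.

```cpp
#include <cuda_fp16.h>

// Illustrative aligned_vector: one aligned VecSize-wide load/store
// (e.g. 64 bits for half x 4) replaces VecSize scalar 16-bit accesses.
template <typename T, int VecSize>
struct alignas(sizeof(T) * VecSize) aligned_vector {
  T val[VecSize];
};

// tanh-approximation Gelu as commonly used by FastGelu kernels.
__device__ __forceinline__ float gelu(float x) {
  const float kAlpha = 0.7978845608f;  // sqrt(2/pi)
  const float kGamma = 0.044715f;
  return 0.5f * x * (1.0f + tanhf(kAlpha * x * (1.0f + kGamma * x * x)));
}

// Assumes the n % VecSize tail is handled by a scalar kernel (omitted),
// pointers aligned to the vector width, and bias_len % VecSize == 0.
template <int VecSize>
__global__ void FastGeluVec(const half* x, const half* bias, half* y,
                            int n, int bias_len) {
  using Vec = aligned_vector<half, VecSize>;
  int i = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
  if (i + VecSize > n) return;
  Vec vx = *reinterpret_cast<const Vec*>(x + i);
  Vec vb = *reinterpret_cast<const Vec*>(bias + (i % bias_len));
  Vec vy;
#pragma unroll
  for (int k = 0; k < VecSize; ++k) {
    vy.val[k] = __float2half(gelu(__half2float(vx.val[k]) +
                                  __half2float(vb.val[k])));
  }
  *reinterpret_cast<Vec*>(y + i) = vy;
}
```

The float2 vs. float4 switch in the commits above corresponds to instantiating VecSize = 2 or 4 depending on input length.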
* Register signal ops for op set 17

Note code is mostly being moved, not added. These ops were previously
only registered as Microsoft contrib ops and only built if
`BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx
standard op set in version 17.

Main components of this change:

* Move the kernels from the contrib_ops directory to the
  core directory.
* Add function bodies for ms experimental ops. This will allow
  old models that use the contrib ops to continue to function.
  All the function bodies consist of a single op (the
  new standard op), so performance overhead should be minimal.

Minor clean-up also in this change:

* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental
  ops. Tested with `build.sh --ms_experimental`
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher` (see the sketch below).
* Use `InlineVector` instead of `std::vector`.

Unblocks #11640
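On the `MLTypeCallDispatcher` cleanup: a minimal self-contained sketch of that dispatch pattern (the real utility lives in onnxruntime's core/framework and differs in detail), where a fold expression replaces a per-call-site switch over element types:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

enum class ElemType { kFloat, kDouble, kInt64 };

template <typename T> constexpr ElemType ElemTypeOf();
template <> constexpr ElemType ElemTypeOf<float>()        { return ElemType::kFloat; }
template <> constexpr ElemType ElemTypeOf<double>()       { return ElemType::kDouble; }
template <> constexpr ElemType ElemTypeOf<std::int64_t>() { return ElemType::kInt64; }

// Invokes Functor<T>::operator() for the matching runtime element type.
template <template <typename> class Functor, typename... Ts, typename... Args>
void Dispatch(ElemType t, const Args&... args) {
  const bool matched =
      ((t == ElemTypeOf<Ts>() ? (Functor<Ts>{}(args...), true) : false) || ...);
  if (!matched) throw std::runtime_error("unsupported element type");
}

template <typename T>
struct FillZero {  // example functor: zero a typed buffer
  void operator()(void* data, std::size_t count) const {
    T* p = static_cast<T*>(data);
    for (std::size_t i = 0; i < count; ++i) p[i] = T{};
  }
};
// Usage: Dispatch<FillZero, float, double, std::int64_t>(ElemType::kFloat, buf, n);
```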
Fix a couple of typos
Improve performance of BiasGelu on OneDNN execution provider

This modifies how BiasGelu is handled by the OneDNN execution provider
by executing the gelu_erf primitive as a post-op of the binary_add primitive.

Also fixes extra data copies made when running on GPU.

Signed-off-by: George Nash <[email protected]>
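What the BiasGelu fusion above looks like in oneDNN terms; a sketch against the oneDNN v3-style C++ API with placeholder memory descriptors, not the EP's actual code:

```cpp
#include "dnnl.hpp"

using namespace dnnl;

// BiasGelu as one fused primitive: binary_add (x + bias) with gelu_erf
// appended as an eltwise post-op, so no intermediate tensor is written
// out between the add and the gelu.
binary::primitive_desc MakeBiasGeluPd(const engine& eng,
                                      const memory::desc& x_md,
                                      const memory::desc& bias_md,
                                      const memory::desc& y_md) {
  post_ops po;
  po.append_eltwise(algorithm::eltwise_gelu_erf, /*alpha=*/0.f, /*beta=*/0.f);
  primitive_attr attr;
  attr.set_post_ops(po);
  return binary::primitive_desc(eng, algorithm::binary_add,
                                x_md, bias_md, y_md, attr);
}
```

The fused primitive is then executed with DNNL_ARG_SRC_0 (input), DNNL_ARG_SRC_1 (bias), and DNNL_ARG_DST.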
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4.
- [Release notes](https://github.com/caolan/async/releases)
- [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md)
- [Commits](caolan/async@v2.6.3...v2.6.4)

---
updated-dependencies:
- dependency-name: async
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [js/rn] upgrade dependencies for e2e test

* use JDK11 only for gradle

* expand variable
…l signal op definitions (#12006)

* fix winml tests

* remove legacy test

* switch idft -> dft+inverse attr

* upgrade opset 13->17 for signal ops tests
…ls. (#12008)

Add support for double tensor output in TestPreTrainedModels.
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
…ation lacking training_mode attribute (#12010)

FusedBatchNormalization: include the training_mode attribute
* create op from ep

* read input count from context

* create holder to host nodes

* fix typo

* cast type before comparison

* throw error on API fail

* silence warning from minimal build

* switch to unique_ptr with deleter to host nodes

* fix typo

* fix build err for minimal

* fix build err for minimal

* add UT for conv

* enable test on CUDA

* add comment

* fix typo

* use gsl::span and string view for Node constructor

* Added two APIs - CopyKernelInfo and ReleaseKernelInfo

* pass gsl::span by value

* switch to span<NodeArg* const> to allow for references to const containers (see the sketch after this commit message)

* fix typo

* fix reduced build err

* fix reduced build err

* refactoring node construction logic

* rename exceptions

* add input and output count as arguments for op creation

* refactor static member

* use ORT_CATCH instead of catch

* drop the try/catch

* add static value name map

* format input definition and set err code

* fix comments

* fix typo
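On the `span<NodeArg* const>` bullet above: a span of mutable pointers cannot be constructed from a const container, because the container only hands out `NodeArg* const*` data. A standalone illustration (NodeArg reduced to a stub):

```cpp
#include <gsl/span>
#include <vector>

struct NodeArg {};  // stub for illustration

void TakesSpan(gsl::span<NodeArg* const> args) { /* read-only view */ }

void Caller(const std::vector<NodeArg*>& node_args) {
  // gsl::span<NodeArg*> would not compile here: the const vector's
  // data() returns NodeArg* const*, so the span's element type must be
  // NodeArg* const (the pointers are immutable through the view).
  TakesSpan(gsl::span<NodeArg* const>(node_args));
  TakesSpan(node_args);  // the implicit container conversion also works
}
```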
* Pad fallback to CPU

* Added queryPad in operatorRegistration.cpp

* Acknowledged PR comments

* Used any_of

* used none_of instead of any_of

Co-authored-by: Sumit Agarwal <[email protected]>
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting the existing raw ONNX model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments into multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print with logger
(8) update the shape inference function to support external data
(9) when saving external data, show a warning if the onnx version < 1.12
[js/web] fix negative axes for unsqueeze
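The usual normalization behind a negative-axes fix for Unsqueeze, per the ONNX spec (negative axes count from the output rank, which is the input rank plus the number of inserted axes); a generic C++ sketch rather than the actual js/web code:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Normalize Unsqueeze axes: valid range is [-out_rank, out_rank - 1],
// where out_rank = in_rank + axes.size(); negatives wrap around out_rank.
std::vector<int64_t> NormalizeUnsqueezeAxes(std::vector<int64_t> axes,
                                            int64_t in_rank) {
  const int64_t out_rank = in_rank + static_cast<int64_t>(axes.size());
  for (auto& a : axes) {
    if (a < -out_rank || a >= out_rank)
      throw std::invalid_argument("axis out of range");
    if (a < 0) a += out_rank;  // e.g. axis -1 on a 3-D input becomes 3
  }
  return axes;
}
```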
Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5.
- [Release notes](https://github.com/electron/electron/releases)
- [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md)
- [Commits](electron/electron@v13.6.6...v15.5.5)

---
updated-dependencies:
- dependency-name: electron
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* implement infrastructure for the handshake mechanism; SHA-256 was selected as the first hash algorithm

* check hash during compile in TVMso EP

* add IPP-CRYPTO to external dependencies for TVM EP

* made the checkHash method const

* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict

* implemented SHA-256 calculation using ipp-crypto library

* fix dependency for ipp-crypto

* add provider options for hash check

* update documentation for added provider options

* add hash check condition

* fix docs

* fix lint

* fix ORT_THROW

Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: KJlaccHoeUM9l <[email protected]>
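How the hash gate described above can look, assuming ipp-crypto's single-call `ippsHashMessage_rmf` / `ippsHashMethod_SHA256` API; a sketch, not the TVM EP's actual code:

```cpp
#include <ippcp.h>
#include <cstring>
#include <vector>

// Gate loading of a precompiled module on a SHA-256 match, in the spirit
// of the commits above; error handling is reduced to a bool for brevity.
bool CheckModuleHash(const std::vector<Ipp8u>& module_bytes,
                     const std::vector<Ipp8u>& expected_digest) {
  Ipp8u digest[32] = {};  // SHA-256 produces 32 bytes
  IppStatus st = ippsHashMessage_rmf(module_bytes.data(),
                                     static_cast<int>(module_bytes.size()),
                                     digest, ippsHashMethod_SHA256());
  return st == ippStsNoErr && expected_digest.size() == sizeof(digest) &&
         std::memcmp(digest, expected_digest.data(), sizeof(digest)) == 0;
}
```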
…to_pad (#11984)

* Add a warning about a future computation change for ConvTranspose with auto_pad

* improve msg

* update TODO to make lint happy

* expand the warning contents and add a condition

* VALID was not affected

* move it into kernel registration

* parse auto_pad myself

* try to use conv_transpose_attrs_.auto_pad directly
With this patch, Resize is optimized when the input X is a 4D int8/uint8 tensor
and the mode is linear by:

* Transforming NCHW Resize to NHWC variant
* Using the NHWC Resize kernel without floating-point computation

It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves
Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.
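The "without floating-point computation" part typically means fixed-point interpolation weights. A hedged sketch of that idea for one output pixel (illustrative; not the kernel this change ships):

```cpp
#include <cstdint>

// Bilinear interpolation of 4 uint8 neighbors with 11-bit fixed-point
// weights (wx, wy in [0, 2048]); all arithmetic stays in integers.
inline uint8_t BilinearU8(uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                          int32_t wx, int32_t wy) {  // weights scaled by 2048
  constexpr int32_t kOne = 1 << 11;
  int32_t top = tl * (kOne - wx) + tr * wx;      // horizontal lerp, x2048
  int32_t bot = bl * (kOne - wx) + br * wx;
  int32_t out = top * (kOne - wy) + bot * wy;    // vertical lerp, x2048^2
  return static_cast<uint8_t>((out + (1 << 21)) >> 22);  // round, rescale
}
```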
Fix Windows CPU build with VS2021
askhade and others added 29 commits August 3, 2022 15:15
* first draft

* plus fixes

* plus more links

* Plus updates per review

* plus more clarifications

* plus updates

* plus more nit fixes

* plus some additions
Fix comparison of path characters when checking for ".ort" suffix.

Some clean up of InferenceSession Load functions.
- Reduce duplication between std::string/std::wstring versions.
- Renaming for clarity.
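A char-type-generic way to express the ".ort" suffix check from the first bullet, assuming the path arrives as a `std::basic_string<CharT>` with CharT being char or wchar_t; a sketch, the real InferenceSession code differs:

```cpp
#include <string>

// ".ort" suffix test that works for both std::string and std::wstring
// paths, comparing character-by-character after widening the literal.
template <typename CharT>
bool HasOrtSuffix(const std::basic_string<CharT>& path) {
  constexpr CharT suffix[] = {'.', 'o', 'r', 't'};
  constexpr size_t n = sizeof(suffix) / sizeof(suffix[0]);
  if (path.size() < n) return false;
  for (size_t i = 0; i < n; ++i)
    if (path[path.size() - n + i] != suffix[i]) return false;
  return true;
}
```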
Use InlinedVector in a thread pool (TP)
Store per thread parallel section in std::optional and avoid memory allocation
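One way to read the std::optional change above: per-thread parallel-section state lives in thread-local storage and is constructed in place with emplace/reset, so re-entering a section performs no heap allocation. A generic sketch, not the actual thread pool code:

```cpp
#include <cstdint>
#include <optional>

struct ParallelSection {  // illustrative per-thread bookkeeping
  explicit ParallelSection(uint64_t id) : section_id(id) {}
  uint64_t section_id;
  uint64_t tasks_dispatched = 0;
};

// Storage has thread-local static duration; emplace/reset construct and
// destroy in place, so there is no new/delete per parallel section.
thread_local std::optional<ParallelSection> g_section;

void EnterSection(uint64_t id) { g_section.emplace(id); }
void LeaveSection() { g_section.reset(); }
```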
* Split GemmBase and RocBlasGemm

* Add composable kernel GEMM baseline

* Make linter happy

* Address review comment

* Update BERT cases with batch size

* Adjust includes to fix IWYU lint

* Only builds and links used ck kernels to improve building time

* Remove warmup run on SelectImpl

* Add comment to utility function

* Mute cpplint

* Make RocBlasGemm<T>::SelectImpl semantically correct

* Add reduced basic test cases for ck gemm

* More robust gemm testing

* Fix warnings

* Fix grammar
* Rework some aspects of Graph::Resolve to reduce memory usage.
Fix Python Packaging CI

Co-authored-by: Ethan Tao <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
* update ortmodule opset to 15

* update torch version

* fix ut

* fix ut

* rollback

* rollback for orttrainer
Fix onnxruntime_training.cmake missing linkage issue
…auto keyword (#12483)

* Work around a false positive error produced by clang

ROCm's hip clang complains "use 'template' keyword to treat 'Foo' as a dependent template name"
where Foo is not a dependent template name. Avoiding the auto keyword fixes the error
here.
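For context on the diagnostic quoted above: inside a template, a member-template call on a dependent object normally does need the `template` keyword, as below; the commit sidesteps a false positive by spelling out the type instead of using auto. A generic illustration of the rule, not the ROCm code:

```cpp
template <typename T>
struct Holder {
  template <int N>
  int Get() const { return N; }
};

template <typename T>
int Use(const Holder<T>& h) {
  // 'h' has a dependent type, so without 'template' the parser reads
  // 'h.Get < 3' as a comparison and clang emits:
  //   use 'template' keyword to treat 'Get' as a dependent template name
  return h.template Get<3>();
}
```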
* set zero point to 0 if all values are 0.0

* fix bug: numpy.finfo in older numpy versions doesn't have smallest_subnormal

* check scale to make sure it is not subnormal
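The two guards above, combined in one place; a C++ sketch of asymmetric uint8 scale/zero-point computation (the quantization tool itself is Python, so this is purely illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

struct QuantParams { float scale; uint8_t zero_point; };

QuantParams ComputeQuantParams(float rmin, float rmax) {
  rmin = std::min(rmin, 0.0f);  // quantized range must include zero
  rmax = std::max(rmax, 0.0f);
  if (rmin == 0.0f && rmax == 0.0f)  // all values are 0.0:
    return {1.0f, 0};                // force zero_point = 0
  float scale = (rmax - rmin) / 255.0f;
  if (scale < std::numeric_limits<float>::min())  // subnormal guard:
    scale = std::numeric_limits<float>::min();    // clamp to smallest normal
  const float zp = std::round(-rmin / scale);
  return {scale, static_cast<uint8_t>(std::clamp(zp, 0.0f, 255.0f))};
}
```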
* adding conditional variable again

* Adding split test cases in python

* Adding python cases for split

* Enable s8s8 split

* Optimize input

* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"

This reverts commit d5e34ac

* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)""

This reverts commit 3c1a330.

* format file

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Update c-api-linux-cpu.yml

* Reformat file

* Reformat file

* format file

* Optimize input

* Remove unused import

* Remove useless init

* Format split.py with black
* Working on JNI refactor for OnnxTensor.
* Simplifying the error handling logic in createTensor.
* Collapsing casting branches and migrating to the ONNX element type enum.
* Disable cpplint for JNI C files.
…12485)

* Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage.

Co-authored-by: Pranav Sharma <[email protected]>
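The shape of the change: release each initializer's protobuf payload as soon as its OrtValue exists, so both copies of the whole model never coexist. A sketch with reduced stand-in types (TensorProto, OrtValue, and the conversion helper here are illustrative):

```cpp
#include <map>
#include <string>

struct TensorProto {  // reduced stand-in for onnx::TensorProto
  std::string raw_data;
  void Clear() { std::string().swap(raw_data); }  // actually releases capacity
};
struct OrtValue { std::string buffer; };  // stand-in for the runtime tensor

// Hypothetical conversion: copy the payload into the runtime's own buffer.
OrtValue DeserializeTensorProto(const TensorProto& p) { return {p.raw_data}; }

void LoadInitializers(std::map<std::string, TensorProto>& protos,
                      std::map<std::string, OrtValue>& values) {
  for (auto& [name, proto] : protos) {
    values.emplace(name, DeserializeTensorProto(proto));
    proto.Clear();  // drop the proto copy immediately, so peak memory holds
                    // roughly one initializer twice rather than the whole model
  }
}
```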
…es. (#12490)

* improve the compilation speed when compiling for multiple architectures.

* formatting

* fix

* use 0 by default

* fix comments
* modifications for CUDA and ROCm

* fix bfloat16 ut

* change bf16 ut number

* fix opset version

* fix op kernel doc
* sce refactor

* refactor

* remove unnecessary memset
* Add CODEOWNERS entries for dependency files

* Fix team @-mentions
* Load checkpoint in cpp

* removed unused imports

* throw error on invalid name and change function name

* in-place model assignment, name change, and other review comments resolved

* name change on import

* Added unit test, resolved comments

* remove unused imports

* resolved comments

* refactoring to reduce memory allocation

* resolved extra comments

* changed file hierarchy and force-added the onnx model

* fixed the order of function arguments

* used gtest macros on test cases

Co-authored-by: Adam Louly <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
…el (#12474)

Python module for dumping activation tensors when running an ONNX model

This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
* support concatenation via aten::cat.out

* wrap dims

* rename vars in tests, test wrapped dims
faxu deleted the master branch August 10, 2022 00:47