forked from microsoft/onnxruntime
Sync with master #2
Open: stevenlix wants to merge 6,827 commits into stevenlix:trt_xp_factory_fix from microsoft:master
Conversation
* Eager mode ArgMax support. * Fix basic max and min functionality with a minor generator update. Note this does not cover the full max and min API scope. * Add addmm test.
* fix mpi build for gcc8 or higher * fix memory profile for partial graph run * Revert "fix mpi build for gcc8 or higher" This reverts commit fb60beb. * remove debug code * fix build * fix build * fix cpplint and python black format
* op changes * review comments * shape consolidation, test trigger, cleanup * review comments
* Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change
Provide better documentation
…11491)
* Using vectorized loads (float2) for fp16 to improve performance
* Fix a few warnings from cpplint
* Fix a few warnings from cpplint
* Use __float2half2_rn and fix some cpplint warnings
* Move some computations to LaunchFastGeluKernel
* Fix some Lint C++ warnings
* Using vectorized loads (float4) for fp16 to improve performance
* Switch whether to optimize FastGelu with float4 vectorization
* Switch to float4 memory access based on input_length in FastGelu
* Comment how to set the threshold of float2 and float4 vectorized kernels
* Add FastGelu fp16 unit tests for bias_length = 2 and 8
* Make vectorized kernels generic with aligned_vector (see the sketch below)
* Unify the vectorized kernels with/without bias
* Refactor the code to suppress cpplint warnings
* Solve formatting issues
* Remove cudaDeviceProp from FastGeluKernel and LaunchFastGeluKernel
* Move fast_gelu_impl.h to rocm/bert
* Fix some Lint C++ warnings and code alignment
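For reference, a minimal sketch of the aligned_vector load pattern, assuming hypothetical names and the tanh GeLU approximation; it is not the actual onnxruntime FastGelu kernel, and tail handling plus the input-length-based float2/float4 switch are omitted:

```cuda
// Illustrative only: fp16 vectorized loads/stores via an aligned_vector type.
// VecSize = 4 or 8 plays the role of the float2/float4 access widths.
#include <cuda_fp16.h>

template <typename T, int VecSize>
struct alignas(sizeof(T) * VecSize) aligned_vector {
  T val[VecSize];
};

template <int VecSize>
__global__ void FastGeluVecSketch(const half* input, const half* bias,
                                  half* output, int total, int bias_length) {
  using VecT = aligned_vector<half, VecSize>;
  int idx = (blockIdx.x * blockDim.x + threadIdx.x) * VecSize;
  if (idx + VecSize > total) return;  // the real kernel handles the tail separately

  VecT in = *reinterpret_cast<const VecT*>(input + idx);  // one wide load
  VecT out;
  for (int i = 0; i < VecSize; ++i) {
    float x = __half2float(in.val[i]) + __half2float(bias[(idx + i) % bias_length]);
    // tanh approximation of GeLU
    float y = 0.5f * x * (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
    out.val[i] = __float2half(y);
  }
  *reinterpret_cast<VecT*>(output + idx) = out;  // one wide store
}
```

The wide access only applies when the element count is divisible by the vector width, which appears to be why the commit gates float2 versus float4 access on input_length.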
* Register signal ops for op set 17. Note code is mostly being moved, not added. These ops were previously only registered as Microsoft contrib ops and only built if `BUILD_MS_EXPERIMENTAL_OPS=1`. They've been added to the ai.onnx standard op set in version 17.

Main components of this change:
* Move the kernels from the contrib_ops directory to the core directory.
* Add function bodies for ms experimental ops. This will allow old models that use the contrib ops to continue to function. All the function bodies consist of a single op (the new standard op), so performance overhead should be minimal.

Minor clean-up also in this change:
* De-duplicate get_scalar_value_from_tensor: put it in a new utils.h.
* Fix some bugs that caused compilation errors with the experimental ops. Tested with `build.sh --ms_experimental`.
* Fix some spelling errors and lint violations.
* Replace a couple of switch statements with `MLTypeCallDispatcher` (see the dispatch sketch below).
* Use `InlineVector` instead of `std::vector`.

Unblocks #11640
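For the switch-statement replacement mentioned above, here is a generic sketch of the dispatch-on-element-type pattern; it is not the actual `MLTypeCallDispatcher` API, just the shape of it with made-up tags and functor:

```cpp
// Centralize the per-type switch once and hand the work to a templated functor.
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>

enum class ElemType { kFloat, kDouble, kInt64 };

template <template <typename> class Fn, typename... Args>
auto Dispatch(ElemType t, Args&&... args) {
  switch (t) {
    case ElemType::kFloat:  return Fn<float>{}(std::forward<Args>(args)...);
    case ElemType::kDouble: return Fn<double>{}(std::forward<Args>(args)...);
    case ElemType::kInt64:  return Fn<int64_t>{}(std::forward<Args>(args)...);
  }
  throw std::invalid_argument("unsupported element type");
}

// Example functor: byte size of the element type.
template <typename T>
struct ElementSize {
  std::size_t operator()() const { return sizeof(T); }
};

// usage: std::size_t s = Dispatch<ElementSize>(ElemType::kFloat);  // 4
```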
Fix a couple of typos
Improve performance of BiasGelu on the oneDNN execution provider. This modifies how BiasGelu is handled by the oneDNN execution provider: the gelu_erf primitive is executed as a post-op of the binary_add primitive. Also fixes extra data copies made when running on GPU. Signed-off-by: George Nash <[email protected]>
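A hedged sketch of the fusion described, using the oneDNN C++ API; the append_eltwise signature below follows oneDNN 2.x (3.x drops the scale argument), and the real change lives in the onnxruntime oneDNN execution provider:

```cpp
// Sketch only: attach gelu_erf as a post-op of the binary_add primitive so
// BiasGelu runs as a single fused oneDNN primitive.
#include <dnnl.hpp>

dnnl::primitive_attr MakeBiasGeluAttr() {
  dnnl::post_ops ops;
  ops.append_eltwise(1.0f /*scale*/, dnnl::algorithm::eltwise_gelu_erf,
                     0.0f /*alpha*/, 0.0f /*beta*/);
  dnnl::primitive_attr attr;
  attr.set_post_ops(ops);
  return attr;  // pass to the binary (add) primitive descriptor
}
```

The returned attr is supplied when constructing the binary primitive descriptor for the add, so the GeLU is applied to the add's result without a second pass over memory.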
Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4. - [Release notes](https://github.com/caolan/async/releases) - [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md) - [Commits](caolan/async@v2.6.3...v2.6.4) --- updated-dependencies: - dependency-name: async dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [js/rn] upgrade dependencies for e2e test * use JDK11 only for gradle * expand variable
…l signal op definitions (#12006) * fix winml tests * remove legacy test * switch idft -> dft+inverse attr * upgrade opset 13->17 for signal ops tests
…ls. (#12008) Add support for double tensor output in TestPreTrainedModels.
Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).
…ation lacking training_mode attribute (#12010) FusedBatchNormalization now includes the training_mode attribute
* create op from ep
* read input count from context
* create holder to host nodes
* fix typo
* cast type before comparison
* throw error on API fail
* silence warning from minimal build
* switch to unique_ptr with deleter to host nodes
* fix typo
* fix build err for minimal
* fix build err for minimal
* add UT for conv
* enable test on CUDA
* add comment
* fix typo
* use gsl::span and string view for Node constructor
* Added two APIs - CopyKernelInfo and ReleaseKernelInfo
* pass gsl::span by value
* switch to span<NodeArg* const> to allow for reference to const containers
* fix typo
* fix reduced build err
* fix reduced build err
* refactoring node construction logic
* rename exceptions
* add input and output count as arguments for op creation
* refactor static member
* use ORT_CATCH instead of catch
* cancel try catch
* add static value name map
* format input definition and set err code
* fix comments
* fix typo
* Pad fallback to CPU * Added queryPad in operatorRegistration.cpp * Acknowledged PR comments * Used any_of * used none_of instead of any_of Co-authored-by: Sumit Agarwal <[email protected]>
(1) add --run_shape_inference to make shape inference optional
(2) add --vocab_mask to make the input optional
(3) add --overwrite in gpt2 convert_to_onnx to allow overwriting an existing raw onnx model exported from PyTorch
(4) save gpt2 model tensors to one external data file by default
(5) group convert_beam_search arguments into multiple groups
(6) make --decoder_onnx optional for gpt2 model
(7) replace print by logger
(8) update shape inference function to support external data
(9) when saving external data, show a warning if the onnx version is < 1.12
[js/web] fix negative axes for unsqueeze
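For context, a negative Unsqueeze axis is interpreted relative to the output rank (input rank plus the number of inserted axes). A minimal C++ sketch of the normalization, illustrative only since the actual fix is in the js/web TypeScript kernel:

```cpp
// Map a possibly negative unsqueeze axis into [0, output_rank).
#include <stdexcept>

inline int NormalizeUnsqueezeAxis(int axis, int output_rank) {
  if (axis < -output_rank || axis >= output_rank) {
    throw std::out_of_range("unsqueeze axis out of range");
  }
  return axis < 0 ? axis + output_rank : axis;
}

// e.g. for a rank-3 input with one inserted axis, output_rank is 4,
// so axis -1 normalizes to 3.
```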
Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5. - [Release notes](https://github.com/electron/electron/releases) - [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md) - [Commits](electron/electron@v13.6.6...v15.5.5) --- updated-dependencies: - dependency-name: electron dependency-type: direct:development ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Infrastructure for the handshake mechanism was implemented; SHA-256 was selected as the first hash algorithm
* check hash during compile in TVMso EP
* add IPP-CRYPTO to external dependencies for TVM EP
* made checkHash method constant
* removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict
* implemented SHA-256 calculation using the ipp-crypto library
* fix dependency for ipp-crypto
* add provider options for hash check
* update documentation for added provider options
* add hash check condition
* fix docs
* fix lint
* fix ORT_THROW
Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: KJlaccHoeUM9l <[email protected]>
…to_pad (#11984)
* Add warning about future computation change for ConvTranspose with auto_pad
* improve msg
* update TODO to make lint happy
* update more contents for warning and add if
* valid was not affected
* move it into kernel registration
* parse auto_pad myself
* try to use conv_transpose_attrs_.auto_pad directly
With this patch, it optimizes Resize when the input X is a 4D int8/uint8 tensor and the mode is linear by:
* Transforming NCHW Resize to the NHWC variant
* Using the NHWC Resize kernel without floating-point computation
It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.
Fix windows cpu build VS2021
* roialign opset16 * fix * fix
* first draft * plus fixes * plus more links * Plus updates per review * plus more clarifications * plus updates * plus more nit fixes * plus some additions
Fix comparison of path characters when checking for ".ort" suffix. Some clean up of InferenceSession Load functions. - Reduce duplication between std::string/std::wstring versions. - Renaming for clarity.
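A small illustration of the character-type-agnostic suffix comparison involved; `HasOrtSuffix` is a hypothetical helper, not the actual InferenceSession code:

```cpp
// Case-insensitive ".ort" suffix check that works for both std::string and
// std::wstring paths (ASCII-only lowering, which is all the suffix needs).
#include <algorithm>
#include <cstddef>
#include <string>

template <typename Char>
bool HasOrtSuffix(const std::basic_string<Char>& path) {
  const Char suffix[] = {Char('.'), Char('o'), Char('r'), Char('t')};
  const std::size_t n = sizeof(suffix) / sizeof(suffix[0]);
  if (path.size() < n) return false;
  auto lower_ascii = [](Char c) {
    return (c >= Char('A') && c <= Char('Z')) ? Char(c - Char('A') + Char('a')) : c;
  };
  return std::equal(path.end() - n, path.end(), suffix,
                    [&](Char a, Char b) { return lower_ascii(a) == lower_ascii(b); });
}

// HasOrtSuffix(std::wstring(L"model.ORT")) and HasOrtSuffix(std::string("model.ort"))
// both return true.
```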
…ut (#12408) * share quant param between tensors
Use InlinedVector in a TP. Store the per-thread parallel section in std::optional and avoid memory allocation.
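A rough illustration of the allocation-avoidance idea, with hypothetical types; the real change lives in the onnxruntime thread pool and uses InlinedVector rather than std::vector:

```cpp
// Keep the per-thread parallel-section state alive in std::optional so
// repeated sections reuse it instead of reallocating each time.
#include <cstddef>
#include <optional>
#include <vector>

struct ParallelSectionState {
  std::vector<int> scratch;  // stand-in for InlinedVector in the real code
};

class Worker {
 public:
  void BeginSection(std::size_t n) {
    if (!section_) section_.emplace();  // constructed once per thread
    section_->scratch.assign(n, 0);     // capacity is retained across sections
  }
  void EndSection() { /* state intentionally kept for the next section */ }

 private:
  std::optional<ParallelSectionState> section_;
};
```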
* Split GemmBase RocBlasGemm
* Add composable kernel GEMM baseline
* Make linter happy
* Address review comment
* Update bert cases with batchsize
* Adjust includes to fix IWYU lint
* Only builds and links used ck kernels to improve building time
* Remove warmup run on SelectImpl
* Add comment to utility function
* Mute cpplint
* Make RocBlasGemm<T>::SelectImpl semantically correct
* Add reduced basic test cases for ck gemm
* More robust gemm testing
* Fix warnings
* Fix grammar
* Rework some aspects of Graph::Resolve to reduce memory usage.
Fix Python Packaging CI. Co-authored-by: Ethan Tao <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
[delete] delete rocm4.3.1
* update ortmodule opset to 15 * update torch version * fix ut * fix ut * rollback * rollback for orttrainer
Fix onnxruntime_training.cmake missing linkage issue
…auto keyword (#12483) * Workaround a false positive error produced by clang. ROCm's hip clang complains that "use 'template' keyword to treat 'Foo' as a dependent template name" where Foo is not a dependent template name. Avoiding the use of the auto keyword fixes the error here.
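An illustrative before/after of the workaround, with made-up types; the point is only the shape of the diagnostic and that spelling the type instead of `auto` sidesteps it:

```cpp
// Sketch of the false positive and the workaround described above.
struct Provider {
  template <typename T>
  int Foo() { return static_cast<int>(sizeof(T)); }
};

template <typename T>
int Before(Provider& p) {
  auto v = p.Foo<T>();  // hip clang: "use 'template' keyword to treat 'Foo'
                        //  as a dependent template name" (false positive)
  return v;
}

template <typename T>
int After(Provider& p) {
  int v = p.Foo<T>();   // avoiding `auto` sidesteps the diagnostic
  return v;
}
```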
* set zero point to 0 if all values are 0.0 * fix bug: older versions of numpy.finfo don't have smallest_subnormal * check scale to make sure it is not subnormal (sketched below)
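The guards above, sketched in C++ for illustration; the actual change is in the Python quantization tooling, and the range math here is simplified asymmetric uint8 quantization:

```cpp
// Set zero_point to 0 when every value is exactly 0.0, and keep the computed
// scale out of the subnormal range.
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <vector>

void ComputeScaleAndZeroPoint(const std::vector<float>& values,
                              float& scale, int& zero_point) {
  float rmin = 0.0f, rmax = 0.0f;
  for (float v : values) { rmin = std::fmin(rmin, v); rmax = std::fmax(rmax, v); }

  if (rmin == 0.0f && rmax == 0.0f) {  // all values are 0.0
    scale = 1.0f;
    zero_point = 0;
    return;
  }

  scale = (rmax - rmin) / 255.0f;
  if (std::fpclassify(scale) == FP_SUBNORMAL) scale = FLT_MIN;  // avoid subnormal scale

  int zp = static_cast<int>(std::round(-rmin / scale));
  zero_point = std::min(255, std::max(0, zp));  // clamp to uint8 range
}
```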
Fix various warnings
* adding conditional variable again
* Adding split test cases in python
* Adding python cases for split
* Enable s8s8 split
* Optimize input
* Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)". This reverts commit d5e34ac.
* Revert "Revert "Remove git and python packages from the docker images used by Zip-Nuget-Java-Nodejs Packaging Pipeline (#11651)"". This reverts commit 3c1a330.
* format file
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Update c-api-linux-cpu.yml
* Reformat file
* Reformat file
* format file
* Optimize input
* Remove unused import
* Remove useless init
* Format split.py with black
Working on JNI refactor for OnnxTensor. Simplifying the error handling logic in createTensor. Collapsing casting branches and migrating to ONNX element type enum. Disable cpplint for JNI C files.
…12485) * Free initializer TensorProto instances as they're converted to OrtValue to reduce peak memory usage. Co-authored-by: Pranav Sharma <[email protected]>
…es. (#12490) * improve the compilation speed when compiling for multiple architectures. * formatting * fix * use 0 by default * fix comments
* mod for cuda and rocm * fix bfloat16 ut * change bf16 ut number * fix opset version * fix op kernel doc
* sce refactor * refactor * remove unnecessary memset
* Add Codeowners for dependency files * Fix team @s
* Load checkpoint in cpp
* removed unused imports
* throw error on invalid name and change function name
* in-place model assignment, name change and other comments resolved
* name change on import
* Added unit test, resolved comments
* remove unused imports
* resolved comments
* refactoring to reduce memory allocation
* resolved extra comments
* changed file hierarchy and force-added onnx model
* fixed order of function arguments
* used gtest macros in test cases
Co-authored-by: Adam Louly <[email protected]@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
…el (#12474) Python module for dumping activation tensors when running an ONNX model This is the first step towards a quantization debugging tool. We dump the activation tensors. Next step would be to compare them: original model vs quantized model (running with same input) to see where the difference becomes significant.
* support concatenation via aten::cat.out * wrap dims * rename vars in tests, test wrapped dims