merge master #13

yuguo-Jack · 2024-01-31T08:12:06Z

merge master

Co-authored-by: oneflow-ci-bot <[email protected]>

fix the error: AttributeError: 'UNet2DConditionModel' object has no attribute 'cpg'

…nal/cuda.cmake to avoid llvm15 link error (#10373)

Co-authored-by: oneflow-ci-bot <[email protected]>

Documentation: ![image](https://github.com/Oneflow-Inc/oneflow/assets/53533850/8899fdd0-2bec-44c3-9825-54ee19f775f7)

---------

Co-authored-by: oneflow-ci-bot <[email protected]>

Adaptation of the Huawei Ascend910b chip on OneFlow:

- Oneflow-Inc/OneTeam#2181

---------

Co-authored-by: Shenghang Tsai <[email protected]>

Co-authored-by: oneflow-ci-bot <[email protected]>

This PR implements the OneFlow Insight module. Related issue: Oneflow-Inc/OneTeam#2162

To profile CUDA kernel execution time or analyze bottlenecks, we usually run NVIDIA's nsys command to generate a profile file (the earlier .qdrep, now .nsys-rep) and then visualize and inspect it in NVIDIA's GUI tool, Nsight Systems. Along with the profile file, nsys also generates platform-independent data, recorded in a .sqlite file. The OneFlow Insight module parses that .sqlite file and generates a JSON file in the Google Chrome Trace Event format. Chrome or Edge can then load and render this JSON file directly via `chrome://tracing/` or `edge://tracing/` for visual analysis. The result looks like this:

<img width="1320" alt="image" src="https://github.com/Oneflow-Inc/oneflow/assets/28823622/cbfab9bc-47bd-474c-8f39-e145348db17d">

---------

Co-authored-by: oneflow-ci-bot <[email protected]>
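As a rough illustration of the conversion described above, the following sketch turns rows of kernel timings stored in SQLite into a Chrome Trace Event JSON document. It is not the OneFlow Insight implementation: the table name `kernels` and its columns are hypothetical stand-ins, and the real schema of the .sqlite file written by nsys is different and should be looked up before adapting this. Only the output side (complete events with `"ph": "X"` and microsecond `ts`/`dur`) follows the Trace Event format.

```python
import json
import sqlite3


def kernels_to_chrome_trace(conn):
    """Convert rows of a (hypothetical) kernel-timing table into a Trace Event dict."""
    events = []
    rows = conn.execute(
        "SELECT name, start_ns, end_ns, device_id, stream_id "
        "FROM kernels ORDER BY start_ns"
    )
    for name, start_ns, end_ns, device_id, stream_id in rows:
        events.append(
            {
                "name": name,
                "ph": "X",  # "complete" event: a start time plus a duration
                "ts": start_ns / 1000.0,  # Trace Event timestamps are in microseconds
                "dur": (end_ns - start_ns) / 1000.0,
                "pid": device_id,  # assumption: one "process" row per device
                "tid": stream_id,  # assumption: one "thread" row per stream
            }
        )
    return {"traceEvents": events}


def make_demo_db():
    """Build an in-memory database with the hypothetical schema and two kernels."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kernels (name TEXT, start_ns INTEGER, end_ns INTEGER, "
        "device_id INTEGER, stream_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO kernels VALUES (?, ?, ?, ?, ?)",
        [("matmul_kernel", 1000, 5000, 0, 7), ("relu_kernel", 6000, 6500, 0, 7)],
    )
    return conn


if __name__ == "__main__":
    trace = kernels_to_chrome_trace(make_demo_db())
    print(json.dumps(trace, indent=2))
```

Saving the printed JSON to a file and loading it in `chrome://tracing/` should show the two demo kernels on a single device/stream track.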

Co-authored-by: Houjiang Chen <[email protected]>

Running `python3 -c "import timm"` produces the following output:

```
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:19: UserWarning: The oneflow.jit interface is just to align the torch.jit interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:134: UserWarning: The oneflow.jit.Final interface is just to align the torch.jit.Final interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:31: UserWarning: The oneflow.jit.script interface is just to align the torch.jit.script interface and has no practical significance.
  warnings.warn(
/home/wangyi/miniconda3/envs/py10/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'load_library is not implemented'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:49: UserWarning: The oneflow.jit.unused interface is just to align the torch.jit.unused interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/onnx/symbolic_helper.py:21: UserWarning: The oneflow.onnx.parse_args interface is just to align the torch.onnx.parse_args interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:57: UserWarning: The oneflow.jit._script_if_tracing interface is just to align the torch.jit._script_if_tracing interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/onnx/__init__.py:26: UserWarning: The oneflow.onnx.register_custom_op_symbolic interface is just to align the torch.onnx.register_custom_op_symbolic interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:65: UserWarning: The oneflow.jit._overload_method interface is just to align the torch.jit._overload_method interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:134: UserWarning: The oneflow.jit.Final interface is just to align the torch.jit.Final interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:31: UserWarning: The oneflow.jit.script interface is just to align the torch.jit.script interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:38: UserWarning: The oneflow.jit.ignore interface is just to align the torch.jit.ignore interface and has no practical significance.
  warnings.warn(
/data/home/wangyi/workspace/oneflow-public/python/oneflow/jit/__init__.py:140: UserWarning: The oneflow.jit.interface interface is just to align the torch.jit.interface interface and has no practical significance.
  warnings.warn(
```

* register mem_get_info
* add unittest
* refine unittest
* Update env.cpp
* format
* format
* format
* format
* rename GetMemInfo to CudaGetMemoInfo

Refactor the mock_torch module

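close #10397

Here it is enough to handle is_grads_batched during argument checking; there is no need to intrude into the AutogradEngine. The actual backward computation performs the broadcast on its own. If the result is wrong, the cause is incomplete broadcast support in the operator.

---------

Co-authored-by: wyg1997 <[email protected]>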
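A minimal sketch of what "handling is_grads_batched only in the argument check" could look like, written in plain Python over shape tuples. The function name `check_grad_shape` is hypothetical and this is not the code from the PR; it only assumes the convention used by `torch.autograd.grad`, where a batched gradient carries one extra leading batch dimension in front of the output's shape.

```python
def check_grad_shape(output_shape, grad_shape, is_grads_batched=False):
    """Validate a gradient's shape against its output's shape.

    Returns the batch size when is_grads_batched is True, otherwise None.
    Hypothetical helper for illustration only.
    """
    output_shape = tuple(output_shape)
    grad_shape = tuple(grad_shape)
    if is_grads_batched:
        if len(grad_shape) == 0:
            raise ValueError("batched grad must have a leading batch dimension")
        # Drop the leading batch dimension before comparing.
        batch_size, per_sample_shape = grad_shape[0], grad_shape[1:]
    else:
        batch_size, per_sample_shape = None, grad_shape
    if per_sample_shape != output_shape:
        raise ValueError(
            f"grad shape {per_sample_shape} does not match output shape {output_shape}"
        )
    return batch_size


if __name__ == "__main__":
    print(check_grad_shape((3, 4), (3, 4)))
    print(check_grad_shape((3, 4), (5, 3, 4), is_grads_batched=True))
```

With the check isolated like this, the backward pass itself stays unchanged and relies on each operator broadcasting over the extra dimension, which is the division of labor the description argues for.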

PR Oneflow-Inc/oneflow#10395 needlessly added a warning to jit/\_\_init__.py, which made the warning appear on every import oneflow. This change removes it.

fix #10405 Co-authored-by: wyg1997 <[email protected]>

…_compiler.compile_from_torch (#10408) resubmit for PR #10404 --------- Co-authored-by: Yinggang Wang <[email protected]>

Before this commit, the error message is: **inconsistent tensor size, expected all tensor to have the same number of elements, but got 640000 and 742400**

After this commit, the error message is: **inconsistent tensor size, expected all tensor to have the same number of elements, but got (1,256,50,50) and (1,256,50,58)**
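The change in the message can be sketched in plain Python: the check still compares element counts, but the error reports the full shapes. The helper names `format_shape` and `check_same_numel` are hypothetical, and this is an illustration of the idea rather than the actual OneFlow code.

```python
from math import prod


def format_shape(shape):
    """Render a shape as "(1,256,50,50)", without spaces after the commas."""
    return "(" + ",".join(str(dim) for dim in shape) + ")"


def check_same_numel(shapes):
    """Raise if the given shapes do not all have the same number of elements."""
    first = shapes[0]
    for shape in shapes[1:]:
        if prod(shape) != prod(first):
            # Report shapes, which are easier to debug than bare element counts.
            raise ValueError(
                "inconsistent tensor size, expected all tensor to have the same "
                f"number of elements, but got {format_shape(first)} and {format_shape(shape)}"
            )


if __name__ == "__main__":
    try:
        check_same_numel([(1, 256, 50, 50), (1, 256, 50, 58)])
    except ValueError as err:
        print(err)
```

The two example shapes have 640000 and 742400 elements respectively, matching the numbers in the old message.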

…of the host compiler. (#10415)

## Work on this branch

Add the jvp, jacobian, hessian, hvp, and vhp interfaces to functional.py, and add test code to test_autograd_functional.py.

## Code changes

1. \_construct_standard_basis_for(): because fill_ does not support non-contiguous tensors, the fill_ assignment is replaced with loop assignment. (Oneflow-Inc/oneflow#10394)
2. \_construct_standard_basis_for(): annotating the parameter as tensors: Tuple[torch.Tensor, ...] causes a circular import of torch, so the annotation is dropped and the parameter is written as plain tensors.
3. _grad_preprocess: because oneflow.torch has no is_sparse attribute, new tensors are created only via res.append(inp.view_as(inp)).
4. _jacfwd: remove the code for the vectorize case.
5. Add jvp, jacobian, hessian, hvp, and vhp test code to test_autograd_functional.py.

## Added interfaces

![1704862723128](https://github.com/Oneflow-Inc/oneflow/assets/49504565/8ca9cfa6-0f1c-425b-baa1-036335e3d3e1)

---------

Co-authored-by: Wang Yi <[email protected]> Co-authored-by: oneflow-ci-bot <[email protected]>
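To make item 1 of the change list concrete, here is a plain-Python sketch of building a standard basis by loop assignment instead of filling a diagonal view in place. It uses nested lists rather than tensors, the name `construct_standard_basis` is hypothetical, and the layout (one chunk per input, with the ones shifted down by the element count of the preceding inputs) is an assumption modeled on how the corresponding PyTorch helper behaves, not a copy of the OneFlow code.

```python
def construct_standard_basis(numels):
    """Build one (total x numel) chunk per input; together they form an identity.

    numels: element counts of the inputs, e.g. [2, 1].
    """
    total = sum(numels)
    chunks = []
    row_start = 0  # first row of this chunk's diagonal block
    for numel in numels:
        chunk = [[0.0] * numel for _ in range(total)]
        # Loop assignment: set each diagonal entry individually, which avoids
        # an in-place fill on a non-contiguous diagonal view.
        for i in range(numel):
            chunk[row_start + i][i] = 1.0
        chunks.append(chunk)
        row_start += numel
    return chunks


if __name__ == "__main__":
    for chunk in construct_standard_basis([2, 1]):
        print(chunk)
```

Placing the chunks side by side gives a `total x total` identity matrix, so each row selects exactly one scalar element across all inputs.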

Co-authored-by: oneflow-ci-bot <[email protected]>

This can slightly reduce compile time.

…rocm

strint and others added 30 commits September 27, 2023 02:48

Fix pb type hint error (#10336)

84ef72f

Feat graph load to new device (#10335)

dea3f43

Co-authored-by: oneflow-ci-bot <[email protected]>

set default cpg to None (#10343)

553c55e

fix the error: AttributeError: 'UNet2DConditionModel' object has no attribute 'cpg'

Priv release (#10347)

b809137

Add cu12 release (#10348)

a5041b8

Fix typo in release workflow (#10349)

3d4df94

Support building ext in private release (#10350)

46b8232

Fix concurrency in priv release (#10351)

0717248

Add --disable-ignore-error in OSS action (#10360)

bbae1cd

Use community branch in priv-release (#10361)

44b4ff3

Use dedicated workflow for community (#10363)

8f4f5d5

Fix concurrency for priv/community (#10369)

bdfddbc

Community fix cu12 pkg size (#10368)

9e4e295

Remove py3.7 in release (#10371)

bc08398

Use dynamic link in cn/cpu.cmake, international/cpu.cmake, internatio…

5d87afc

…nal/cuda.cmake to avoid llvm15 link error (#10373)

Community build py311 (#10377)

21b51a6

Skip Conv cases failed (#10383)

6f53149

Co-authored-by: oneflow-ci-bot <[email protected]>

autograd.functional.vjp (#10356)

559f5ec

Documentation: ![image](https://github.com/Oneflow-Inc/oneflow/assets/53533850/8899fdd0-2bec-44c3-9825-54ee19f775f7)

---------

Co-authored-by: oneflow-ci-bot <[email protected]>

Support cuda 12.x (#10367)

a6feab0

Support Huawei Ascend910b chip (#10386)

b11f102

Adaptation of the Huawei Ascend910b chip on OneFlow:

- Oneflow-Inc/OneTeam#2181

---------

Co-authored-by: Shenghang Tsai <[email protected]>

Update CMake cache files for new community builds (#10388)

c4c687f

Cache global generator (#10387)

92788cb

Co-authored-by: oneflow-ci-bot <[email protected]>

Fix community uploads (#10389)

a0f2122

fix py38 and support python3.11 (#10391)

18538f6

Co-authored-by: Houjiang Chen <[email protected]>

add cuda.mem_get_info (#82) (#10398)

af0807a

* register mem_get_info
* add unittest
* refine unittest
* Update env.cpp
* format
* format
* format
* format
* rename GetMemInfo to CudaGetMemoInfo

refine_mock_torch (#10396)

d16fa88

Refactor the mock_torch module

Update default input of branch to main, in the community build (#10400)

6ed4991

jackalcooper and others added 15 commits January 10, 2024 17:16

Backport changes auto-install CUDA packages (#10402)

3bf53a0

remove jit warning in __init__ (#10403)

76b7560

PR Oneflow-Inc/oneflow#10395 needlessly added a warning to jit/\_\_init__.py, which made the warning appear on every import oneflow. This change removes it.

Autograd.grad support out_grads with list of None (#10406)

02a4366

fix #10405 Co-authored-by: wyg1997 <[email protected]>

Add proxy env in CI (#10407)

66c671f

move onediff.infer_compiler.oneflow_compile to onefow.framework.infer…

2e331e1

…_compiler.compile_from_torch (#10408) resubmit for PR #10404 --------- Co-authored-by: Yinggang Wang <[email protected]>

Change default value for ONEFLOW_MLIR_PREFER_NHWC (#10410)

9afbc9b

Fix bug when compiling faster_rcnn's backbone (#10414)

0320ed0

Decrease default cuda architectures to build on cause the limitation …

06c9ead

…of the host compiler. (#10415)

Fixed nightly version for pro, instead adding commit hash (#10416)

dafc2b1

add functional group norm and dtype device param (#10417)

bf9ae02

Co-authored-by: oneflow-ci-bot <[email protected]>

Dump MLIR only in debug mode (#10422)

8f055f3

This can slightly reduce compile time.

change all_dynamic to dynamic (#10423)

8250384

Merge branch 'master' of https://github.com/Oneflow-Inc/oneflow into …

600f86a

…rocm

yuguo-Jack requested review from jackalcooper and liujuncheng as code owners January 31, 2024 08:12

Ldpe2G merged commit 2f8fb3b into Oneflow-Inc:rocm Jan 31, 2024
13 of 14 checks passed
