add fused get target_offsets #9536

hhhfccz · 2022-12-05T13:10:55Z

@BBuf fused_get_target_offsets

I'll also commit the renames of those kernels to this PR shortly.

github-actions · 2022-12-06T05:54:29Z

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.0ms (= 14003.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.6ms (= 16361.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.6ms / 140.0ms)

OneFlow resnet50 time: 85.0ms (= 8500.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.1ms (= 10111.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.1ms / 85.0ms)

OneFlow resnet50 time: 57.7ms (= 11533.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15535.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.7ms)

OneFlow resnet50 time: 43.9ms (= 8773.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.8ms (= 15158.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 75.8ms / 43.9ms)

OneFlow resnet50 time: 40.0ms (= 8000.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.5ms (= 14098.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 70.5ms / 40.0ms)
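
For readers checking the numbers above: each per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch per-iteration time divided by the OneFlow one, so values above 1 mean OneFlow is faster. The sketch below reproduces the first row (batch size 16). The function names are hypothetical and this is not the CI bot's actual script, only the arithmetic the report prints.

```python
# Minimal sketch of the arithmetic behind the speed report.
# Function names are illustrative assumptions, not part of the OneFlow CI.

def per_iteration_ms(total_ms, iterations):
    """Average time of one iteration in milliseconds."""
    return total_ms / iterations


def relative_speed(pytorch_total_ms, oneflow_total_ms, iterations):
    """PyTorch time divided by OneFlow time; > 1 means OneFlow is faster."""
    pytorch_ms = per_iteration_ms(pytorch_total_ms, iterations)
    oneflow_ms = per_iteration_ms(oneflow_total_ms, iterations)
    return pytorch_ms / oneflow_ms


# Totals taken from the input_shape=[16, 3, 224, 224] row above.
oneflow_ms = per_iteration_ms(14003.7, 100)
pytorch_ms = per_iteration_ms(16361.4, 100)
print(f"OneFlow: {oneflow_ms:.1f}ms, PyTorch: {pytorch_ms:.1f}ms")
# prints "OneFlow: 140.0ms, PyTorch: 163.6ms"
print(f"Relative speed: {relative_speed(16361.4, 14003.7, 100):.2f}")
# prints "Relative speed: 1.17"
```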

github-actions · 2022-12-06T06:01:40Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

oneflow/core/functional/functional_api.yaml

oneflow/user/kernels/fused_get_bounding_boxes_coord_kernel.cu

BBuf · 2022-12-14T14:27:09Z

Accuracy is being verified in one-yolov5, in Oneflow-Inc/one-yolov5#99.

github-actions · 2022-12-14T16:07:40Z

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.5ms (= 13953.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.8ms (= 15982.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 159.8ms / 139.5ms)

OneFlow resnet50 time: 84.5ms (= 8451.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 106.8ms (= 10676.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 106.8ms / 84.5ms)

OneFlow resnet50 time: 57.5ms (= 11504.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.0ms (= 17602.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.0ms / 57.5ms)

OneFlow resnet50 time: 43.9ms (= 8770.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.7ms (= 15949.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.82 (= 79.7ms / 43.9ms)

OneFlow resnet50 time: 39.1ms (= 7811.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13601.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.1ms)

github-actions · 2022-12-14T16:12:52Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

github-actions · 2022-12-15T08:49:22Z

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14012.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.2ms (= 16120.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 161.2ms / 140.1ms)

OneFlow resnet50 time: 85.2ms (= 8515.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.5ms (= 10252.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.5ms / 85.2ms)

OneFlow resnet50 time: 57.6ms (= 11510.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15568.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.8ms / 57.6ms)

OneFlow resnet50 time: 43.9ms (= 8771.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.8ms (= 14162.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 70.8ms / 43.9ms)

OneFlow resnet50 time: 41.5ms (= 8303.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13860.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 69.3ms / 41.5ms)

github-actions · 2022-12-15T09:02:00Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

python/oneflow/test/modules/test_fused_yolov5_get_target_offsets.py

oneflow/user/ops/fused_yolov5_get_target_offsets_op.cpp

oneflow/user/kernels/fused_yolov5_get_target_offsets_kernel.cu

github-actions · 2022-12-16T11:23:00Z

Speed stats:

GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.6ms (= 13963.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16055.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.6ms / 139.6ms)

OneFlow resnet50 time: 85.0ms (= 8500.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.5ms (= 11251.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 112.5ms / 85.0ms)

OneFlow resnet50 time: 57.8ms (= 11559.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.6ms (= 17717.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.6ms / 57.8ms)

OneFlow resnet50 time: 45.0ms (= 9009.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14246.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 71.2ms / 45.0ms)

OneFlow resnet50 time: 41.0ms (= 8209.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13633.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 68.2ms / 41.0ms)

github-actions · 2022-12-16T11:34:54Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

ccssu · 2023-01-03T02:58:38Z

Preface

Error log: log-01-01-15-20.txt
Version: oneflow-0.8.1+cu117.git.f59f6dacbe

A build compiled from add_fused_get_target_offsets reports an error while training one-yolov5.
There is also a memory leak.

1. Error message

Error log: log-01-01-15-20.txt

Error message

File "/data/dataset/fengwen/package/oneflow/oneflow/user/kernels/stateful_opkernel.cpp", line 985, in Compute
    compute_ctx->stream()->GetAsyncError()
Error Type: oneflow.ErrorProto.runtime_error
*** Check failure stack trace: ***
    @     0x7f8a22f933d3  google::LogMessage::Fail()
    @     0x7f8a22f959c4  google::LogMessage::SendToLog()
    @     0x7f8a22f92ebf  google::LogMessage::Flush()
    @     0x7f8a22f95fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f8a2e2c709c  oneflow::one::StatefulOpKernel::Compute()
    @     0x7f8a2c408fe2  oneflow::vm::OpCallInstructionUtil::Compute()
    @     0x7f8a2c405170  _ZZN7oneflow2vm23OpCallInstructionPolicy7ComputeEPNS0_11InstructionEENKUlPKcE_clES5_.isra.0.constprop.0
    @     0x7f8a2c405691  oneflow::vm::OpCallInstructionPolicy::Compute()
    @     0x7f8a2c402399  oneflow::vm::EpStreamPolicyBase::Run()
    @     0x7f8a2c40ce6d  oneflow::vm::StreamPolicy::RunIf()
    @     0x7f8a2c4112a5  oneflow::vm::ThreadCtx::TryReceiveAndRun()
    @     0x7f8a2c412dad  oneflow::(anonymous namespace)::WorkerLoop()
    @     0x7f8a2c4134a9  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7oneflow2vm9ThreadCtxERKSt8functionIFvS6_EEES6_ZNS3_14VirtualMachine15CreateThreadCtxENS3_6SymbolINS3_6DeviceEEENS3_10StreamTypeEmEUlS6_E3_EEEEE6_M_runEv
    @     0x7f8a22e67de4  (unknown)
    @     0x7f8a4abfd609  start_thread
    @     0x7f8a4ab22133  clone
Traceback (most recent call last):
  File "/data/dataset/fengwen/miniconda3/envs/python3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/dataset/fengwen/miniconda3/envs/python3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 240, in <module>
    main()
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 228, in main
    sigkill_handler(signal.SIGTERM, None)
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 196, in sigkill_handler
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['/data/dataset/fengwen/miniconda3/envs/python3.8/bin/python', '-u', 'train.py', '--data', 'data/coco.yaml', '--weights', ' ', '--cfg', 'models/yolov5n.yaml', '--batch', '128', '--bbox_iou_optim', '--multi_tensor_optimizer', '--build_targets_optim']' died with <Signals.SIGABRT: 6>.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Killing subprocess 2504434
Killing subprocess 2504435
Killing subprocess 2504436
Killing subprocess 2504437
F20230101 20:01:31.427902 2504469 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 2 lost
*** Check failure stack trace: ***
    @     0x7fc8ad9803d3  google::LogMessage::Fail()
    @     0x7fc8ad9829c4  google::LogMessage::SendToLog()
    @     0x7fc8ad97febf  google::LogMessage::Flush()
    @     0x7fc8ad982fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc8b22e1e13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7fc8ad854de4  (unknown)
    @     0x7fc8d55ea609  start_thread
    @     0x7fc8d550f133  clone
F20230101 20:01:32.427429 2504468 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 2 lost
*** Check failure stack trace: ***
    @     0x7faf5e9503d3  google::LogMessage::Fail()
    @     0x7faf5e9529c4  google::LogMessage::SendToLog()
    @     0x7faf5e94febf  google::LogMessage::Flush()
    @     0x7faf5e952fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7faf632b1e13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7faf5e824de4  (unknown)
    @     0x7faf865ba609  start_thread
    @     0x7faf864df133  clone
F20230101 20:01:34.464609 2504466 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 0 lost
*** Check failure stack trace: ***
    @     0x7f2c0c39e3d3  google::LogMessage::Fail()
    @     0x7f2c0c3a09c4  google::LogMessage::SendToLog()
    @     0x7f2c0c39debf  google::LogMessage::Flush()
    @     0x7f2c0c3a0fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2c10cffe13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7f2c0c272de4  (unknown)
    @     0x7f2c34008609  start_thread
    @     0x7f2c33f2d133  clone

2. Memory leak

Version: oneflow-0.8.1+cu117.git.f59f6dacbe
wandb training log: https://wandb.ai/wearmheart/YOLOv5/runs/25ue7t6a/system?workspace=user-wearmheart

Version information

Machine: oneflow 27-root
YOLOv5 v1.1.0-18-ga217cd39 Python-3.8.13 oneflow-0.8.1+cu117.git.f59f6dacbe

add fused get target_offsets

311eea2

hhhfccz added feature op labels Dec 5, 2022

hhhfccz requested review from hjchen2, BBuf, jackalcooper, daquexian and liujuncheng as code owners December 5, 2022 13:10

hhhfccz added 2 commits December 6, 2022 02:27

fix: rename to yolov5

4270247

format

8eb2e9d

hhhfccz requested a review from oneflow-ci-bot December 6, 2022 02:30

Merge branch 'master' into fused_get_target_offsets

84e2ec7

BBuf mentioned this pull request Dec 14, 2022

add build_targets_optim Oneflow-Inc/one-yolov5#99

Open

BBuf reviewed Dec 14, 2022

oneflow/core/functional/functional_api.yaml (outdated, resolved)

BBuf reviewed Dec 14, 2022

oneflow/user/kernels/fused_get_bounding_boxes_coord_kernel.cu (outdated, resolved)

hhhfccz added 2 commits December 15, 2022 05:47

rename to yolov5

dab7fc6

Merge branch 'fused_get_target_offsets' of https://github.com/Oneflow…

a17a047

…-Inc/oneflow into fused_get_target_offsets

BBuf reviewed Dec 16, 2022

python/oneflow/test/modules/test_fused_yolov5_get_target_offsets.py (outdated, resolved)

BBuf reviewed Dec 16, 2022

oneflow/user/ops/fused_yolov5_get_target_offsets_op.cpp (outdated, resolved)

BBuf reviewed Dec 16, 2022

oneflow/user/ops/fused_yolov5_get_target_offsets_op.cpp (resolved)

BBuf reviewed Dec 16, 2022

oneflow/user/kernels/fused_yolov5_get_target_offsets_kernel.cu (resolved)

fix: init for first row

f59f6da

hhhfccz removed the request for review from oneflow-ci-bot December 16, 2022 06:55

hhhfccz requested a review from oneflow-ci-bot December 16, 2022 06:55
