add fused get target_offsets #9536

Open
wants to merge 7 commits into
base: master

Conversation

hhhfccz
Contributor

@hhhfccz hhhfccz commented Dec 5, 2022

@BBuf fused_get_target_offsets

I will also commit the renaming of those kernels to this PR later.

@github-actions
Contributor

github-actions bot commented Dec 6, 2022

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.0ms (= 14003.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.6ms (= 16361.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.6ms / 140.0ms)

OneFlow resnet50 time: 85.0ms (= 8500.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.1ms (= 10111.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 101.1ms / 85.0ms)

OneFlow resnet50 time: 57.7ms (= 11533.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15535.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.7ms / 57.7ms)

OneFlow resnet50 time: 43.9ms (= 8773.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.8ms (= 15158.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 75.8ms / 43.9ms)

OneFlow resnet50 time: 40.0ms (= 8000.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.5ms (= 14098.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 70.5ms / 40.0ms)
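
For reference, the figures in each speed report follow from simple arithmetic: the per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch per-iteration time divided by the OneFlow one. A minimal Python sketch that reproduces the first row of the report above; the function names are hypothetical and are not taken from the CI script:

```python
def per_iter_ms(total_ms: float, iters: int) -> float:
    """Average time per iteration, in milliseconds."""
    return total_ms / iters


def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """A ratio above 1 means OneFlow took less time per iteration."""
    return pytorch_ms / oneflow_ms


# Numbers from the input_shape=[16, 3, 224, 224] row of the report above.
oneflow_ms = round(per_iter_ms(14003.7, 100), 1)
pytorch_ms = round(per_iter_ms(16361.4, 100), 1)
ratio = round(relative_speed(pytorch_ms, oneflow_ms), 2)
print(oneflow_ms, pytorch_ms, ratio)  # 140.0 163.6 1.17
```

The same calculation applies to the later reports, which differ only in the measured totals.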

@github-actions
Contributor

github-actions bot commented Dec 6, 2022

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

@BBuf
Contributor

BBuf commented Dec 14, 2022

Accuracy is being verified in one-yolov5, in Oneflow-Inc/one-yolov5#99.

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.5ms (= 13953.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.8ms (= 15982.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 159.8ms / 139.5ms)

OneFlow resnet50 time: 84.5ms (= 8451.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 106.8ms (= 10676.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 106.8ms / 84.5ms)

OneFlow resnet50 time: 57.5ms (= 11504.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.0ms (= 17602.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.0ms / 57.5ms)

OneFlow resnet50 time: 43.9ms (= 8770.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.7ms (= 15949.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.82 (= 79.7ms / 43.9ms)

OneFlow resnet50 time: 39.1ms (= 7811.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.0ms (= 13601.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.74 (= 68.0ms / 39.1ms)

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14012.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.2ms (= 16120.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 161.2ms / 140.1ms)

OneFlow resnet50 time: 85.2ms (= 8515.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.5ms (= 10252.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.20 (= 102.5ms / 85.2ms)

OneFlow resnet50 time: 57.6ms (= 11510.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15568.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 77.8ms / 57.6ms)

OneFlow resnet50 time: 43.9ms (= 8771.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.8ms (= 14162.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 70.8ms / 43.9ms)

OneFlow resnet50 time: 41.5ms (= 8303.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.3ms (= 13860.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 69.3ms / 41.5ms)

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

@hhhfccz hhhfccz removed the request for review from oneflow-ci-bot December 16, 2022 06:55
@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.6ms (= 13963.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.6ms (= 16055.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.6ms / 139.6ms)

OneFlow resnet50 time: 85.0ms (= 8500.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 112.5ms (= 11251.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.32 (= 112.5ms / 85.0ms)

OneFlow resnet50 time: 57.8ms (= 11559.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 88.6ms (= 17717.2ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.53 (= 88.6ms / 57.8ms)

OneFlow resnet50 time: 45.0ms (= 9009.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.2ms (= 14246.2ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 71.2ms / 45.0ms)

OneFlow resnet50 time: 41.0ms (= 8209.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.2ms (= 13633.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.66 (= 68.2ms / 41.0ms)

@github-actions
Contributor

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9536/

@ccssu
Contributor

ccssu commented Jan 3, 2023

Preface

Error log: log-01-01-15-20.txt
Version: oneflow-0.8.1+cu117.git.f59f6dacbe

  1. Training one-yolov5 with a build of add_fused_get_target_offsets fails with an error
  2. There is a memory leak

1. Error message

Error log: log-01-01-15-20.txt

Error message
File "/data/dataset/fengwen/package/oneflow/oneflow/user/kernels/stateful_opkernel.cpp", line 985, in Compute
    compute_ctx->stream()->GetAsyncError()
Error Type: oneflow.ErrorProto.runtime_error
*** Check failure stack trace: ***
    @     0x7f8a22f933d3  google::LogMessage::Fail()
    @     0x7f8a22f959c4  google::LogMessage::SendToLog()
    @     0x7f8a22f92ebf  google::LogMessage::Flush()
    @     0x7f8a22f95fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f8a2e2c709c  oneflow::one::StatefulOpKernel::Compute()
    @     0x7f8a2c408fe2  oneflow::vm::OpCallInstructionUtil::Compute()
    @     0x7f8a2c405170  _ZZN7oneflow2vm23OpCallInstructionPolicy7ComputeEPNS0_11InstructionEENKUlPKcE_clES5_.isra.0.constprop.0
    @     0x7f8a2c405691  oneflow::vm::OpCallInstructionPolicy::Compute()
    @     0x7f8a2c402399  oneflow::vm::EpStreamPolicyBase::Run()
    @     0x7f8a2c40ce6d  oneflow::vm::StreamPolicy::RunIf()
    @     0x7f8a2c4112a5  oneflow::vm::ThreadCtx::TryReceiveAndRun()
    @     0x7f8a2c412dad  oneflow::(anonymous namespace)::WorkerLoop()
    @     0x7f8a2c4134a9  _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvPN7oneflow2vm9ThreadCtxERKSt8functionIFvS6_EEES6_ZNS3_14VirtualMachine15CreateThreadCtxENS3_6SymbolINS3_6DeviceEEENS3_10StreamTypeEmEUlS6_E3_EEEEE6_M_runEv
    @     0x7f8a22e67de4  (unknown)
    @     0x7f8a4abfd609  start_thread
    @     0x7f8a4ab22133  clone
Traceback (most recent call last):
  File "/data/dataset/fengwen/miniconda3/envs/python3.8/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/dataset/fengwen/miniconda3/envs/python3.8/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 240, in <module>
    main()
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 228, in main
    sigkill_handler(signal.SIGTERM, None)
  File "/data/dataset/fengwen/package/oneflow/python/oneflow/distributed/launch.py", line 196, in sigkill_handler
    raise subprocess.CalledProcessError(
subprocess.CalledProcessError: Command '['/data/dataset/fengwen/miniconda3/envs/python3.8/bin/python', '-u', 'train.py', '--data', 'data/coco.yaml', '--weights', ' ', '--cfg', 'models/yolov5n.yaml', '--batch', '128', '--bbox_iou_optim', '--multi_tensor_optimizer', '--build_targets_optim']' died with <Signals.SIGABRT: 6>.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
Killing subprocess 2504434
Killing subprocess 2504435
Killing subprocess 2504436
Killing subprocess 2504437
F20230101 20:01:31.427902 2504469 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 2 lost
*** Check failure stack trace: ***
    @     0x7fc8ad9803d3  google::LogMessage::Fail()
    @     0x7fc8ad9829c4  google::LogMessage::SendToLog()
    @     0x7fc8ad97febf  google::LogMessage::Flush()
    @     0x7fc8ad982fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fc8b22e1e13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7fc8ad854de4  (unknown)
    @     0x7fc8d55ea609  start_thread
    @     0x7fc8d550f133  clone
F20230101 20:01:32.427429 2504468 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 2 lost
*** Check failure stack trace: ***
    @     0x7faf5e9503d3  google::LogMessage::Fail()
    @     0x7faf5e9529c4  google::LogMessage::SendToLog()
    @     0x7faf5e94febf  google::LogMessage::Flush()
    @     0x7faf5e952fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7faf632b1e13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7faf5e824de4  (unknown)
    @     0x7faf865ba609  start_thread
    @     0x7faf864df133  clone
F20230101 20:01:34.464609 2504466 ctrl_client.cpp:53] Check failed: rpc_client_.GetStubAt(i)->CallMethod<CtrlMethod::kLoadServer>( &client_ctx, request, &response).error_code() == grpc::StatusCode::OK (14 vs. 0) Machine 0 lost
*** Check failure stack trace: ***
    @     0x7f2c0c39e3d3  google::LogMessage::Fail()
    @     0x7f2c0c3a09c4  google::LogMessage::SendToLog()
    @     0x7f2c0c39debf  google::LogMessage::Flush()
    @     0x7f2c0c3a0fbf  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f2c10cffe13  _ZZN7oneflow14GrpcCtrlClientC4ERKNS_10ProcessCtxEENKUlvE_clEv.isra.0
    @     0x7f2c0c272de4  (unknown)
    @     0x7f2c34008609  start_thread
    @     0x7f2c33f2d133  clone
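
The `died with <Signals.SIGABRT: 6>` in the launcher traceback above is how Python's `subprocess` module reports a child that was terminated by a signal: on POSIX the child's return code is the negated signal number. The abort here most likely comes from the failed check in `StatefulOpKernel::Compute` shown in the first stack trace, although that is an inference from the log and not something it states. A minimal sketch, assuming a POSIX system, that reproduces the same kind of exit with a stand-in child process in place of the real training command:

```python
import signal
import subprocess
import sys

# Stand-in child (not the real train.py): os.abort() raises SIGABRT,
# which is also what a fatal glog check does by default.
child = [sys.executable, "-c", "import os; os.abort()"]
proc = subprocess.run(child, stderr=subprocess.DEVNULL)

# On POSIX, a negative return code -N means "terminated by signal N".
sig = signal.Signals(-proc.returncode)
print(proc.returncode, sig.name)
```

The later `Machine 2 lost` and `Machine 0 lost` failures come from the other ranks' control clients and appear to be a consequence of the launcher killing the remaining subprocesses, so the first stack trace is the one to investigate.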

2. Memory leak

Version: oneflow-0.8.1+cu117.git.f59f6dacbe
wandb training log: https://wandb.ai/wearmheart/YOLOv5/runs/25ue7t6a/system?workspace=user-wearmheart
image

Version information

  • Machine: oneflow 27-root
  • YOLOv5 v1.1.0-18-ga217cd39 Python-3.8.13 oneflow-0.8.1+cu117.git.f59f6dacbe
