
only inferring a single image frame once for a huge performance boost… #301

Closed
wants to merge 1 commit

Conversation

tomlankhorst (Contributor)

Author: @zoombinis

When I first used darknet_ros, I discovered that a single image frame from a video camera was being redundantly fed to inference up to 70 times on my NVIDIA Quadro RTX 3000!

To avoid wasting processing resources, this PR simply keeps track of the sequence ID of the last consumed image frame so that the node never runs inference on the same frame twice.
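
A minimal sketch of the idea (hypothetical: isNewFrame is illustrative and modeled on the darknet_ros buffer code, not taken from the actual patch):

#include <cstdint>

// Skip inference when the incoming frame carries the same sequence number
// as the last frame we consumed.
std::uint32_t prevSeq_ = 0;

bool isNewFrame(std::uint32_t seq) {
  if (seq == prevSeq_) {
    return false;  // same frame as last time: skip the redundant inference
  }
  prevSeq_ = seq;  // remember the frame we are about to consume
  return true;
}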

This resolves these issues:
#150
ZhiangChen/target_mapping#8

tomlankhorst (Contributor, Author)

@zoombinis Some tests fail: https://github.com/leggedrobotics/darknet_ros/pull/301/checks?check_run_id=1893822361
The build status shows 'Success', but only because the GitHub Action hasn't registered the failure yet.

…. Before, the same image would redundantly run inference multiple times, hogging resources
tomlankhorst force-pushed the feature/zoombinis-infer-single-image branch from 365b80b to 02f2170 on February 13, 2021, 15:01
ghost commented Feb 15, 2021

@tomlankhorst Should the unit test be setting a header on the image message (specifying the timestamp and, especially, the index)?

This change relies on the image having an index, which it always should, so the failures could just be due to the test setup. I'm away from my development environment for a while; sorry I can't verify right now.
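
For reference, a minimal sketch of such a test setup (hypothetical: makeTestFrame is illustrative and assumes ROS1 with cv_bridge, not the actual darknet_ros test code):

#include <cv_bridge/cv_bridge.h>
#include <opencv2/core/core.hpp>
#include <ros/ros.h>
#include <sensor_msgs/Image.h>

// Build a test frame whose header carries a consecutive sequence number and a
// timestamp, so the node's "have I seen this frame?" check has something to compare.
sensor_msgs::ImagePtr makeTestFrame(uint32_t seq) {
  cv_bridge::CvImage frame;
  frame.header.seq = seq;                // consecutive frame index
  frame.header.stamp = ros::Time::now(); // timestamp
  frame.header.frame_id = "camera";
  frame.encoding = "bgr8";
  frame.image = cv::Mat::zeros(480, 640, CV_8UC3);
  return frame.toImageMsg();
}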

Zorbing left a comment


The proposed changes do improve performance, but the results are no longer accurate all the time.

Because of the multithreading, the buffer works as follows (see the sketch after this list):

  • (buffIndex_ + 3) % 3 (i.e., simply buffIndex_) is the slot currently being fetched in fetch_thread
  • (buffIndex_ + 2) % 3 is the slot currently being processed in detect_thread
  • (buffIndex_ + 1) % 3 holds a completely processed frame, with detections, from the previous loop iteration
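
To make the rotation concrete, here is a hypothetical stand-alone illustration (not the actual darknet_ros code) of how the slot roles shift each iteration:

#include <cstdio>

int main() {
  int buffIndex_ = 0;
  for (int iter = 0; iter < 4; ++iter) {
    buffIndex_ = (buffIndex_ + 1) % 3;
    std::printf("iter %d: fetch slot %d, detect slot %d, publish slot %d\n",
                iter,
                (buffIndex_ + 3) % 3,    // being fetched (== buffIndex_)
                (buffIndex_ + 2) % 3,    // being detected
                (buffIndex_ + 1) % 3);   // processed last iteration, ready to publish
  }
  return 0;
}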

When a new image has been fetched ("new" as in carrying a new sequence number) that needs to be processed by the detection thread, the last image is still the old one ("old" as in carrying the old sequence number), which did not get any detections. When prevSeq_ != headerBuff_[(buffIndex_ + 2) % 3].seq yields true, the publishInThread function uses information from the previous iteration (from buff_[(buffIndex_ + 1) % 3]). Therefore it takes three iterations before an image is fully processed and its detections are published:

  1. fetching the image
  2. detecting objects
  3. publishing the bounding boxes and the detection image

Either the publishing is done right after detect_thread.join(), or it has to wait until the next iteration, as suggested in the following changes:

// Fix0: track the sequence number (consecutively increasing) of the last std_msgs::Header consumed for detection
std::uint32_t prevSeq_ = 0;
bool newImageForDetection = false;
bool hasDetectionsReady = false;
while (!demoDone_)
{
  buffIndex_ = (buffIndex_ + 1) % 3;
  // Fix1: check this isn't an image already seen
  newImageForDetection = (prevSeq_ != headerBuff_[(buffIndex_ + 2) % 3].seq);

  fetch_thread = std::thread(&YoloObjectDetector::fetchInThread, this);
  if (newImageForDetection)
  {
    // Fix2: only detect if this is an image we haven't seen before
    detect_thread = std::thread(&YoloObjectDetector::detectInThread, this);
  }

  // only publish bounding boxes if detection has been done in the last iteration
  if (hasDetectionsReady)
  {
    if (!demoPrefix_)
    {
      fps_ = 1. / (what_time_is_it_now() - demoTime_);
      demoTime_ = what_time_is_it_now();
      if (viewImage_)
      {
        displayInThread(0);
      }
      else
      {
        generate_image(buff_[(buffIndex_ + 1) % 3], ipl_);
      }
      publishInThread();
    }
    else
    {
      char name[256];
      sprintf(name, "%s_%08d", demoPrefix_, count);
      save_image(buff_[(buffIndex_ + 1) % 3], name);
      ++count;
    }
    // state that the image has been published
    hasDetectionsReady = false;
  }

  fetch_thread.join();
  if (newImageForDetection)
  {
    // Fix3: record the new sequence number to avoid detecting the same image more than once
    prevSeq_ = headerBuff_[(buffIndex_ + 2) % 3].seq;
    // Fix4: a detection was started this iteration, so wait for the thread to finish so it can be destroyed safely
    detect_thread.join();
    // only after the detect thread is joined, set this flag to true
    hasDetectionsReady = true;
  }
  if (!isNodeRunning())
  {
    demoDone_ = true;
  }
}

ghost commented Mar 21, 2021

@tomlankhorst I got the tests passing and even included @Zorbing's improvement.

As predicted, the issue was due to missing header information. The tests now correctly pass that information through. Here is my commit diff:

feature/zoombinis-infer-single-image...zoombinis:feature/zoombinis-infer-single-image

However, there's still some inaccuracy (<300%) since we're no longer iterating detection on the same image. Looking at the output, it's not unacceptable by some standards; good enough for me, at least :)

(Screenshots: good1, good2)

Zorbing commented Mar 22, 2021

I don't think you can change the source branch of a PR (changing the target branch works, though), so you'll probably have to create a new PR, link it, and close this one.

ghost mentioned this pull request on Mar 22, 2021
ghost commented Mar 22, 2021

@tomlankhorst I don't have the ability to close this PR, but here is a new one: #305

tomlankhorst (Contributor, Author)

Thanks for the new PR.
