Host-Device Unnecessary Copying #5483

Answered by abadams
WillianJunior asked this question in Q&A

I don't understand why you have B2.copy_to_device just before inspecting B2 on CPU. Is that supposed to be B2.copy_to_host?
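
To make the concern concrete, here is a toy model of dirty-flag tracking between a host copy and a device copy. It is only a sketch: `ToyBuffer`, `fake_gpu_fill`, and `read_after_gpu_write` are hypothetical names, the "device" is just a second `std::vector`, and the assumption is that each copy call moves data only when the opposite side is marked dirty. It is not Halide's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a buffer with host and device storage.
// Illustration only; this is not Halide::Buffer.
struct ToyBuffer {
    std::vector<int> host;
    std::vector<int> device;    // stands in for GPU memory
    bool host_dirty = false;    // host holds newer data than device
    bool device_dirty = false;  // device holds newer data than host
    int copies = 0;             // transfers actually performed

    explicit ToyBuffer(std::size_t n) : host(n, 0), device(n, 0) {}

    // Host -> device, only if the host side has changed.
    void copy_to_device() {
        if (!host_dirty) return;
        device = host;
        host_dirty = false;
        ++copies;
    }

    // Device -> host, only if the device side has changed.
    void copy_to_host() {
        if (!device_dirty) return;
        host = device;
        device_dirty = false;
        ++copies;
    }
};

// Simulates a GPU kernel writing `value` into every element.
void fake_gpu_fill(ToyBuffer &b, int value) {
    for (int &v : b.device) v = value;
    b.device_dirty = true;
}

// Writes on the "device", calls one of the two copies, then reads on the host.
int read_after_gpu_write(bool use_copy_to_host) {
    ToyBuffer b(4);
    fake_gpu_fill(b, 7);
    if (use_copy_to_host) {
        b.copy_to_host();
    } else {
        b.copy_to_device();  // no-op here: the host is not dirty
    }
    return b.host[0];
}
```

In this model, `read_after_gpu_write(true)` sees the value written on the device, while `read_after_gpu_write(false)` reads the stale host contents, because copying to the device does not bring results back to the host.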

I'm not seeing any unusual copies when I attempt to reproduce this. Can you post a full repro? Here's what I have:

#include "Halide.h"

using namespace Halide;

int main(int argc, char **argv) {
    Target target("host-cuda-debug");

    Buffer<int> B1(128, 128);

    Func f, g;
    Var x, y;

    f(x, y) = undef<int>();
    f(x, y) = x + y;

    g(x, y) = 0;
    g(x, y) += B1(x, y);

    Var xi, yi;
    f.compute_root()
        .update()
        .gpu_tile(x, y, xi, yi, 16, 16);
    g.compute_root()
        .gpu_tile(x, y, xi, yi, 16, 16)
        .update()
        .g…

Answer selected by WillianJunior