Xnnpack backend support #159

chenghuaWang · 2024-10-11T06:58:46Z

!!!Do not merge until xnnpack backend llama is runnable!!!

How to use xnnpack backend in mllm

The xnnpack backend in MLLM provides a wrapper function, wrap2xnn, that converts a standard CPU-based MLLM module into one that runs on the xnnpack backend. It takes inputs_nums, outputs_nums, and any further arguments needed to construct the wrapped module, which is LinearModule in the example below.

E.g.:

class LinearModule : public Module {
    Layer linear;

public:
    LinearModule() {
        linear = Linear(1024, 2048, true, "linear");
    }

    vector<Tensor> Forward(vector<Tensor> inputs, vector<std::any> args) override {
        auto x = inputs[0];
        auto out = linear(x);
        return {out};
    }
};

TEST(XpLinearTest, LinearModule) {
    mllm::xnnpack::Log::log_level = mllm::xnnpack::Log::ERROR;

    auto model = ::mllm::xnnpack::wrap2xnn<LinearModule>(1, 1);
    model.setNoLoadWeightsDtype(DataType::MLLM_TYPE_F32);

    EXPECT_EQ(Backend::global_backends[MLLM_XNNPACK] != nullptr, true);

    Tensor x(1, 1, 256, 1024, Backend::global_backends[MLLM_XNNPACK], true);
    x.setTtype(TensorType::INPUT_TENSOR);

    for (int i = 0; i < 256 * 1024; ++i) {
        *(x.hostPtr<float>() + i) = 1024.f;
    }

    auto out = model({x})[0];

    for (int i = 0; i < 256 * 2048; ++i) {
        EXPECT_EQ(*(out.hostPtr<float>() + i) < 1e-18, true);
    }

    out.printShape();
}

Unlike the dynamic graph mode in MLLM, xnnpack operates on a static graph, so a mechanism is needed to convert a dynamic graph into a static one. The xnnpack backend wrapper in MLLM adds several layers around the LinearModule to register the external input and external output tensors. The final wrapped module has the following structure, shown as pseudocode:

Layer: Direct(Direct::ExternalInput)
Module: LinearModule()
Layer: Direct(Direct::ExternalOutput)
Layer: Dispatch()
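
The layering above can be illustrated with a small, self-contained sketch. It is not the MLLM implementation: ToyGraph, ToyScale, ToyWrapped and toy_wrap are hypothetical names invented for this example, and the "graph" is just a list of deferred closures. It only shows the general idea that the inner module records nodes while the wrapper marks the external tensors and then dispatches the recorded graph, and that extra arguments can be forwarded to the inner module's constructor.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; none of these are MLLM or XNNPACK APIs.
struct ToyGraph {
    std::vector<std::function<void()>> nodes; // recorded work, not yet executed
    int external_inputs = 0;
    int external_outputs = 0;

    // Execute every recorded node in order, then reset the graph.
    void run() {
        for (auto &node : nodes) node();
        nodes.clear();
    }
};

// A "dynamic" module: forward() only records a node, it computes nothing.
struct ToyScale {
    float factor;
    explicit ToyScale(float f) : factor(f) {}

    std::vector<float> &forward(ToyGraph &g, std::vector<float> &x) {
        g.nodes.push_back([&x, f = factor] {
            for (auto &v : x) v *= f; // in place, for brevity
        });
        return x;
    }
};

// Wrapper mirroring the four steps of the pseudocode above.
template <typename M>
class ToyWrapped {
public:
    template <typename... Args>
    ToyWrapped(int inputs_nums, int outputs_nums, Args &&...args)
        : inputs_nums_(inputs_nums), outputs_nums_(outputs_nums),
          inner_(std::forward<Args>(args)...) {
        assert(inputs_nums > 0 && outputs_nums > 0);
    }

    std::vector<float> &operator()(std::vector<float> &x) {
        graph_.external_inputs += inputs_nums_;   // Direct(ExternalInput)
        auto &out = inner_.forward(graph_, x);    // Module: records nodes only
        graph_.external_outputs += outputs_nums_; // Direct(ExternalOutput)
        graph_.run();                             // Dispatch()
        return out;
    }

    const ToyGraph &graph() const { return graph_; }

private:
    int inputs_nums_;
    int outputs_nums_;
    M inner_;
    ToyGraph graph_;
};

// Counterpart of a wrap function: trailing arguments go to M's constructor.
template <typename M, typename... Args>
ToyWrapped<M> toy_wrap(int inputs_nums, int outputs_nums, Args &&...args) {
    return ToyWrapped<M>(inputs_nums, outputs_nums, std::forward<Args>(args)...);
}
```

With this sketch, toy_wrap<ToyScale>(1, 1, 2.0f) builds a wrapped module whose call operator doubles a vector in place, but only once the final dispatch step runs the recorded node.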

You can find more use cases in https://github.com/chenghuaWang/mllm/blob/main/test/xnnpack/

How are the operators in MLLM's xnnpack backend implemented?

Take the XpAdd operation as an example:

XpAdd's reshape function is identical to that of CPUAdd. The main differences lie in the setUp and execute functions.

When execute is called, XpAdd adds a static graph node to the xnnpack subgraph. XpAdd does nothing during the setUp phase, because setUp is where the XpDirect Op determines whether each Tensor is an external input, an external output, or a regular tensor.

ErrorCode XpAdd::execute(vector<shared_ptr<Tensor>> inputs, vector<shared_ptr<Tensor>> outputs) {
    auto xpb = (XnnpackBackend *)backend();
    tryDefineAllXpTensors(xpb, inputs);
    tryDefineAllXpTensors(xpb, outputs);

    // define xnnpack op.
    auto status = xnn_define_binary(
        xpb->getXnnSubgraph(),
        xnn_binary_add,
        nullptr,
        inputs[0]->uuid(),
        inputs[1]->uuid(),
        outputs[0]->uuid(),
        0);

    if (status != xnn_status_success) {
        Log::error("XpAdd::execute Error");
        exit(-1);
    }

    return MLLM_NO_ERROR;
}
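
One way to read the ordering constraint described above is sketched below. This is an assumption about the intent, not the actual MLLM code: ToyTensor, ToySubgraph, ToyDirect and ToyAdd are hypothetical stand-ins, and tryDefine is only loosely modelled on the tryDefineAllXpTensors call in the snippet. The point is that a tensor is defined in the subgraph at most once, so if an op defined it during setUp, before the direct op had assigned its role, it would be registered as a regular tensor.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical toy types for illustration; not the MLLM or XNNPACK API.
enum class Role { Regular, ExternalInput, ExternalOutput };

struct ToyTensor {
    uint32_t uuid;
    Role role = Role::Regular;
    explicit ToyTensor(uint32_t id) : uuid(id) {}
};

struct ToySubgraph {
    std::unordered_map<uint32_t, Role> defined;     // uuid -> role at definition time
    std::vector<std::array<uint32_t, 3>> add_nodes; // {lhs, rhs, out}

    // Idempotent: the first definition wins, later calls are no-ops.
    void tryDefine(const ToyTensor &t) {
        defined.emplace(t.uuid, t.role);
    }
};

// Decides a tensor's role during setUp.
struct ToyDirect {
    Role role;
    void setUp(ToyTensor &t) const { t.role = role; }
};

struct ToyAdd {
    // Intentionally empty: roles may not be final yet at this point.
    void setUp(ToyTensor &, ToyTensor &, ToyTensor &) const {}

    // By execute time every role is known, so defining tensors is safe.
    void execute(ToySubgraph &g, const ToyTensor &a, const ToyTensor &b,
                 const ToyTensor &out) const {
        g.tryDefine(a);
        g.tryDefine(b);
        g.tryDefine(out);
        assert(g.defined.count(out.uuid) == 1);
        g.add_nodes.push_back({a.uuid, b.uuid, out.uuid});
    }
};
```

In this sketch, running every setUp first and every execute afterwards means the subgraph records the roles chosen by ToyDirect, regardless of the order in which the individual ops were set up.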


chenghuaWang added 30 commits October 9, 2024 09:03

feat: XpOps, XpDirect.

59cdc15

feat: Xnnpack Add Example

8fbd7df

feat: mllm frontend -> xnn static graph

7024eab

fix: Add Example Done.

a3f271c

feat: xnnpack wrap

9c59cd0

fix: include path, update xnnpack to latest

c4bad7b

feat: xnn backend element wise op function

0a6f706

feat: xnn weight register and linear op

f156f35

fix: XpLinear error with NoLoadWeightsDtype

efb3fa4

feat: xnnpack matmul rope

5781312

feat: fix redefine tensor in xnnpack bug

72dfb7a

feat: add relu and rope bug fix

4ad8d31

feat: xnnpack GELU, Softmax, SiLU impl

ad3b148

feat: rms norm, tranpose

886eba7

feat: kvcache, still buggy, rfc.

b37ad92

feat: update 3rd party packages

90164da

feat: xp kvcache fix.

b0883bb

fix: github action main.yml

05f167b

feat: !!!SDPA!!! (Support B, H, S, D) layout

69e80f9

feat: XpSDPA torch impl for check mllm's correctness

a7471be

fix: XpRoPE, add view func

4a95180

fix: rope test example

a34240f

fix: transpose xnn example bugs find

5b4bd71

fix: xnnpack uuid register bug

314ca42

fix: xnnpack uuid register bug

bea70c1

fix: rope xnnpack test error

063f37f

feat: matmul xnnpack, failed at stl malloc.

3b75a25

fix: xnnpack illegal memory r/w by using valgrind.

2945423

fix: xnnpack attention impl bug

d778974

feat: XpEmbedding op

dbb642b

chenghuaWang and others added 10 commits October 29, 2024 03:20

feat: QWen version 1.5 0.5B and 1.8B xnnpack backend.

6b185ec

Merge branch 'main' into main

1bc39c2

fix: xnnpack backend rope

5e9345e

feat: reduce memory load time in xnnpack

cc27a58

fix: xnnpack qwen example token backend

11cfa6f

Merge branch 'UbiquitousLearning:main' into main

8c0c300

fix: use_layername_2_tensorname

f77622f

fix: MLLM_BUILD_XNNPACK OFF error and redundant xnnpack exec targets

d3d9211

Merge branch 'main' of https://github.com/chenghuaWang/mllm

2cd51cf

Merge branch 'main' into main

172bbd9

yirongjie self-requested a review October 29, 2024 16:14

chenghuaWang and others added 17 commits October 30, 2024 02:17

fix: remove memory test due to previous workflow.yaml remove it from MLLM_TEST.

757905e

fix: mask init error in xnnpack

c5302e0

Merge branch 'UbiquitousLearning:main' into main

7f38bdf

Merge branch 'main' into main

20393cc

Merge branch 'main' into main

b43180e

fix: merge tokenize error

da3c295

fix: tokenizer apply

9eb944c

fix: change HardSwish to original Swish function

779207b

fix: mask bug in xnnpack

458a1e3

fix: tokenizer.tokenize

fd1a449

fix: remove unused

1d62462

update: move Xp*Test to MLLM_TEST

211ad56

Merge branch 'main' of https://github.com/chenghuaWang/mllm

5ceadc3

fix: test bug

ddddab3

fix: xnnpack test setup

919520f

fix: XpTest Error

99f2ebc

fix: set xnn default threads to 4

b829c24

yirongjie approved these changes Oct 31, 2024


yirongjie merged commit fb4203a into UbiquitousLearning:main Oct 31, 2024
1 check passed
