Replies: 10 comments
-
Hey, this is the MXNet Label Bot.
-
FP16 on CPU is not supported yet (FYI, OpenMathLib/OpenBLAS#694). But the inconsistent error message is indeed a problem.
-
@pengzhao-intel: Thanks for the quick response :) @mxnet-label-bot add [question]
-
@marekjg I can reproduce this issue on my side when
-
@pengzhao-intel I know it is not supported. The problem, from the user's point of view with the cpp api and a cpu device, is that everything seems to work just fine (the output array is created).
-
Yes, it's a problem. Will fix it.
-
Let's see why LOG(FATAL) doesn't work here :)
-
@sandeep-krishnamurthy we had somebody who worked on exception handling for single- and multi-threaded execution, but I can't recall who. Can you assist here?
-
FYI, when commenting out this line in dmlc/logging.h, the error stack trace is printed normally for
-
@marcoabreu I worked on exception handling for the backend. Having said that, I am not very familiar with the CPP frontend language binding, and this issue looks specific to the CPP binding: the error code is not checked and the exception is not rethrown in the frontend language binding. cc @leleamol who worked on the CPP language binding.
-
Description
When running inference in the cpp api with the naive engine, the computation is not performed, but the output array is still initialized and no exception is raised. Without the naive engine, an exception is thrown (which is expected). The python api raises an exception in both cases, so the problem seems to be related to the CPP api. It applies to a Dense layer but also happened with a model containing RNN layers, so my guess is that it fails whenever GEMM is used.
Environment info (Required)
Build info (Required if built from source)
gcc
MXNet commit hash:
57927a9
Build config:
cmake -DUSE_CUDA=OFF -DUSE_CPP_PACKAGE=1 -GNinja .. && ninja
Error Message:
With NaiveEngine:
Without NaiveEngine:
Minimum reproducible example and steps to reproduce
Create the model on a machine with a GPU (linear-symbol.json and linear-0000.params files) with the following script:
Run the inference with the following cpp program:
Steps to reproduce
MXNET_ENGINE_TYPE="NaiveEngine" ./test
and observe no error.
./test
and observe the error with a message that the fp16 fully connected layer is not supported on CPU.