Enable load_from_bytes of model and ExtScorer #3331

bernardohenz · 2020-09-22T12:28:14Z

This PR allows the loading of model and ExtScorer through array of bytes (string)

Some observations:

For this implementation to work, I had to use D_GLIBCXX_USE_CXX11_ABI=1 when compiling the lib
I've added a new argument (--init_from_bytes) in the client for testing the loading
I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with DS_CreateModel for instance)
I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.

community-tc-integration · 2020-09-22T12:28:30Z

No Taskcluster jobs started for this pull request

The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.

lissyx · 2020-09-22T14:00:03Z

For this implementation to work, I had to use D_GLIBCXX_USE_CXX11_ABI=1 when compiling the lib

this will be problematic for compatibility with some systems

I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with DS_CreateModel for instance)

Please introduce a new DS_CreateModelFromBuffer for example ?

I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.

I really don't see how this feature can work well with the tensorflow infra without forcing allocating huge strings

lissyx · 2020-09-22T14:02:34Z

native_client/tfmodelstate.cc

-                             MemmappedFileSystem::kMemmappedPackageDefaultGraphDef,
-                             &graph_def_);
+  if (init_from_bytes){
+    // Need some help


Does it makes any sense at all?

lissyx · 2020-09-22T14:03:07Z

native_client/tflitemodelstate.cc

-    std::cerr << "Error at reading model file " << model_path << std::endl;
-    return DS_ERR_FAIL_INIT_MMAP;
+  if (init_from_bytes){
+    fbmodel_ = tflite::FlatBufferModel::BuildFromBuffer(model_string.c_str(), model_string.size_t);


How does that works, performance-wise?

When I see this, much alloc: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

I haven't performed any benchmark, but I haven't noticed any slow down (for loading the pb model). When loading the Scorer, the test seems a bit slower due to reading/converting more than 900MB.
When running for our Language Model (size<20MB), it runs smoothly

Ok, if you use a 20MB scorer, yeah, it might not be too bad, but I guess it can also depend on use device you target. My fear is people using that without understanding the consequences and then complaining about memory usage.

lissyx · 2020-09-22T14:08:27Z

@bernardohenz I fail to get the usecase where this can be useful for mmap-able file formats (PBMM, TFLite), given how much this is going to force TensorFlow to allocate some memory.

reuben · 2020-09-22T14:10:57Z

@lissyx #3152 is one example

lissyx · 2020-09-22T14:12:21Z

@lissyx #3152 is one example

Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

lissyx · 2020-09-22T14:29:07Z

I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with DS_CreateModel for instance)

Please introduce a new DS_CreateModelFromBuffer for example ?

Also, when you introduce that, please try and expose it in as much bindings as you can.

reuben · 2020-09-22T14:37:03Z

Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

That's a good point, maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

lissyx · 2020-09-23T04:18:57Z

Does it? From looking at the implem, I'm unsure this can work without trashing memory of the device: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

That's a good point, maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

That's much more invasive and risky, except if we can upstream that, but we know how long it can take. Depending on @bernardohenz use-case, maybe it makes sense to just expose that for non protobuf mmap files ? or even only for TFLite models ?

lissyx · 2020-09-23T04:22:28Z

This PR allows the loading of model and ExtScorer through array of bytes (string)

Some observations:

1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib

2. I've added a new argument (`--init_from_bytes`) in the client for testing the loading

3. I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)

4. I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.

@bernardohenz I see you have code already for TFLite, can you describe what kind of help you need?

bernardohenz · 2020-09-24T12:07:52Z

@lissyx I will be addressing the implementation of new methods for the API.
The points I should need some help is the implementation of loading from buffer of pdmm and tflite models. If I understood correctly, they would be much slower than loading from path, but still it's good to have this option than not having any option at all (we could put a message warning that this loading is slower for the sake of clarification).
Although I had implemented the code for tflite (just because I've found the tflite::FlatBufferModel::BuildFromBuffer), I haven't tested it yet, I have never compiled with tflite and I will try that in the next week.

I have no answer for using D_GLIBCXX_USE_CXX11_ABI=1 when compiling. When set to 0, the libs ended up having segmentation fault. I still haven't figured out what D_GLIBCXX_USE_CXX11_ABI does differently on what I've implemented. If you have any thoughts about this, it would be much appreciated.

lissyx · 2020-09-24T12:13:35Z

If I understood correctly, they would be much slower than loading from path, but still it's good to have this option than not having any option at all (we could put a message warning that this loading is slower for the sake of clarification).

I want to make sure we don't put ourselves in a position where people will misuse and have unrealistic expectations

When set to 0, the libs ended up having segmentation fault. I still haven't figured out what D_GLIBCXX_USE_CXX11_ABI does differently on what I've implemented. If you have any thoughts about this, it would be much appreciated.

This does impact how C++ code handles strings, so maybe it depends on your env and in your case it makes sense? If you can share more of a stack, for example.

lissyx · 2020-09-24T12:17:29Z

@bernardohenz can you:

check tflite code path on your side
force-push without D_GLIBCXX_USE_CXX11_ABI

I could take a look and run CI on that, so we can cross-check

lissyx · 2020-09-24T13:30:52Z

If I understood correctly, they would be much slower than loading from path, but still it's good to have this option than not having any option at all (we could put a message warning that this loading is slower for the sake of clarification).

I want to make sure we don't put ourselves in a position where people will misuse and have unrealistic expectations

Maybe this should be warned about in the docs: "using this feature can have negative impact on the amount of memory consumed by your application"

lissyx · 2020-09-28T08:21:00Z

@bernardohenz Your PR needs rebasing, it seems.

lissyx · 2020-09-28T10:55:06Z

@bernardohenz Please don't change file's mode from 644 to 755.

lissyx · 2020-09-28T10:58:05Z

native_client/ctcdecode/scorer.cpp

@@ -69,45 +71,60 @@ void Scorer::setup_char_map()
  }
 }

-int Scorer::load_lm(const std::string& lm_path)
+int Scorer::load_lm(const std::string& lm_string, bool load_from_bytes)


Either provide a default value, split the implementation in two load_lm or update all call sites: this PR does not build because of load_lm calls in generate_scorer_package.cpp

I've provided a default value (false) for the boolean

lissyx · 2020-09-28T10:59:25Z

native_client/ctcdecode/scorer.cpp

 }

-int Scorer::load_trie(std::ifstream& fin, const std::string& file_path)
+int Scorer::load_trie(std::stringstream& fin, const std::string& file_path, bool load_from_bytes)


stringstream is compatible with ifstream ?

As far as I tested yes

lissyx · 2020-09-28T11:01:21Z

native_client/deepspeech.cc

+}
+
+int
+DS_CreateModel_(const std::string &aModelString,


Not sure how this is implemented, we go straight from const char* to std::string &

native_client/deepspeech.h

lissyx · 2020-09-28T11:02:08Z

native_client/deepspeech.h

+ * @return Zero on success, non-zero on failure.
+ */
+DEEPSPEECH_EXPORT
+int DS_CreateModelFromBuffer(const std::string &aModelBuffer,


std::string here makes that C++ limited, the API is C

I'll be changing all the strings to char *. But I'll have to pass the buffer size as parameter (I was having problems when a buffer encountered the \0 in the middle of a buffer)

Thats perfectly fine.

lissyx · 2020-09-28T11:02:32Z

native_client/deepspeech.h

+ * @return Zero on success, non-zero on failure (invalid arguments).
+ */
+DEEPSPEECH_EXPORT
+int DS_EnableExternalScorerFromBuffer(ModelState* aCtx,


Same, this is limiting to C++ callers when we expose C API.

native_client/deepspeech.h

lissyx · 2020-09-28T11:03:09Z

native_client/kenlm/lm/bhiksha.cc

@@ -17,9 +17,13 @@ DontBhiksha::DontBhiksha(const void * /*base*/, uint64_t /*max_offset*/, uint64_
 const uint8_t kArrayBhikshaVersion = 0;

 // TODO: put this in binary file header instead when I change the binary file format again.
-void ArrayBhiksha::UpdateConfigFromBinary(const BinaryFormat &file, uint64_t offset, Config &config) {
+void ArrayBhiksha::UpdateConfigFromBinary(const BinaryFormat &file, uint64_t offset, Config &config, bool load_from_bytes) {


Shouldn't this be upstreamed ?

lissyx · 2020-09-28T15:05:57Z

@bernardohenz You should now be able to run PRs yourself.

bernardohenz · 2020-09-28T16:05:04Z

I've fixed the API functions exposition, tested the loading for TFLite.

I've performed a small time benchmark for running the client (both for loading a .b and a .tflite):

Loading a dummy .pb

Inputs	Time for running ./deepspeech Avg from 20 calls
model output_graph.pb	0.54
model output_graph.pb --init_from_bytes	0.55
model output_graph.pb --scorer deepspeech-0.8.1-models.scorer	1.45
model output_graph.pb --scorer deepspeech-0.8.1-models.scorer --init_from_bytes	4.56

Loading a .tflite

Inputs	Time for running ./deepspeech Avg from 20 calls
model deepspeech-0.8.2-models.tflite	1.72
model deepspeech-0.8.2-models.tflite --init_from_bytes	1.85
model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer	2.50
model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer --init_from_bytes	5.66

lissyx · 2020-09-29T08:41:36Z

@bernardohenz There are still failures related to load_lm() call sites on https://community-tc.services.mozilla.com/tasks/groups/VhPPLNsKSzeyZDg5kxjxqQ

lissyx · 2020-09-29T12:31:56Z

I've performed a small time benchmark for running the client (both for loading a .b and a .tflite):

Just for the sake of completeness, can you add figures for before your changes?

I've fixed the API functions exposition, tested the loading for TFLite.

Good, so now it's just about:

exposing through all our bindings,
gettings builds green,
and adding some test coverage,

Right?

lissyx · 2020-09-30T15:48:44Z

@bernardohenz I see some merges, can you make sure this branch is clean of merges ?

bernardohenz · 2020-09-30T15:52:26Z

@lissyx yep, I'm trying to squash the commits, sorry.

reuben · 2020-10-05T10:50:41Z

Just writing down some notes from my talk with Bernardo last week on the status of this PR:

We fixed out a segfault due to passing an address of a temporary
Bernardo is going to do add some timing calls in the scorer loading code to figure out if the slowdown is coming from the extra copy on the client.cc side (in which case it doesn't matter) or if the slowdown is inside libdeepspeech, in which case it needs to be addressed.
Then we need to make sure all the bindings get covered and there's test coverage as mentioned by @lissyx above

…FromBuffer

lissyx reviewed Sep 22, 2020

View reviewed changes

lissyx reviewed Sep 28, 2020

View reviewed changes

native_client/deepspeech.h Outdated Show resolved Hide resolved

lissyx reviewed Sep 28, 2020

View reviewed changes

native_client/deepspeech.h Outdated Show resolved Hide resolved

lissyx reviewed Sep 28, 2020

View reviewed changes

bernardohenz force-pushed the loading_from_bytes branch from 4726e4a to 8113f98 Compare September 28, 2020 14:29

lissyx closed this Sep 28, 2020

lissyx reopened this Sep 28, 2020

bernardohenz force-pushed the loading_from_bytes branch from e42c7cb to 5a5ef46 Compare September 30, 2020 16:00

danielefernandes and others added 9 commits October 20, 2020 16:27

Loading model from both path or array of bytes

06c23fe

Loading ExtScorer from array of bytes

dc8553b

removing debug info

d6a4e37

Exposing methods DS_CreateModelFromBuffer and DS_EnableExternalScorer…

9671041

…FromBuffer

Loading tflite buffer from buffer working

dd73ec8

Fixing API functions exposition

9faeb51

load_trie with default value for load_from_bytes

71e19bc

Seting a default value for load_from_bytes in load_lm

3b9d2b0

-Change string to char* in the API

df40e22

bernardohenz force-pushed the loading_from_bytes branch from 5a5ef46 to df40e22 Compare October 20, 2020 19:28

lissyx mentioned this pull request Jan 21, 2021

File descriptor support for Android/Java/JNI #3507

Open

Enable load_from_bytes of model and ExtScorer #3331

Are you sure you want to change the base?

Enable load_from_bytes of model and ExtScorer #3331

Conversation

bernardohenz commented Sep 22, 2020

community-tc-integration bot commented Sep 22, 2020

lissyx commented Sep 22, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lissyx commented Sep 22, 2020 • edited Loading

reuben commented Sep 22, 2020

lissyx commented Sep 22, 2020

lissyx commented Sep 22, 2020

reuben commented Sep 22, 2020

lissyx commented Sep 23, 2020

lissyx commented Sep 23, 2020

bernardohenz commented Sep 24, 2020

lissyx commented Sep 24, 2020

lissyx commented Sep 24, 2020

lissyx commented Sep 24, 2020

lissyx commented Sep 28, 2020

lissyx commented Sep 28, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lissyx commented Sep 28, 2020

bernardohenz commented Sep 28, 2020

lissyx commented Sep 29, 2020

lissyx commented Sep 29, 2020

lissyx commented Sep 30, 2020

bernardohenz commented Sep 30, 2020

reuben commented Oct 5, 2020

lissyx commented Sep 22, 2020 •

edited

Loading

lissyx commented Sep 22, 2020 •

edited

Loading