
Enable load_from_bytes of model and ExtScorer #3331

Open
wants to merge 9 commits into base: master
Conversation

@bernardohenz (Contributor)

This PR allows loading the model and ExtScorer from an array of bytes (string).

Some observations:

  1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib
  2. I've added a new argument (`--init_from_bytes`) in the client for testing the loading
  3. I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)
  4. I only tested loading `.pb` models. I will need some help for implementing the loading of memmapped and tflite models.

@community-tc-integration

No Taskcluster jobs started for this pull request
The `allowPullRequests` configuration for this repository (in `.taskcluster.yml` on the
default branch) does not allow starting tasks for this pull request.

@lissyx (Collaborator) commented Sep 22, 2020

> For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib

This will be problematic for compatibility with some systems.

> I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)

Please introduce a new `DS_CreateModelFromBuffer`, for example?

> I only tested loading `.pb` models. I will need some help for implementing the loading of memmapped and tflite models.

I really don't see how this feature can work well with the TensorFlow infra without forcing the allocation of huge strings.

```cpp
        MemmappedFileSystem::kMemmappedPackageDefaultGraphDef,
        &graph_def_);
  if (init_from_bytes) {
    // Need some help
```

Collaborator

Does it make any sense at all?

```cpp
      std::cerr << "Error at reading model file " << model_path << std::endl;
      return DS_ERR_FAIL_INIT_MMAP;
  if (init_from_bytes) {
    fbmodel_ = tflite::FlatBufferModel::BuildFromBuffer(model_string.c_str(), model_string.size());
```

Collaborator

How does that work, performance-wise?


Contributor Author

I haven't performed any benchmark, but I haven't noticed any slowdown (for loading the .pb model). When loading the scorer, the test seems a bit slower due to reading/converting more than 900 MB. When running with our language model (size < 20 MB), it runs smoothly.

Collaborator

Ok, if you use a 20 MB scorer, it might not be too bad, but I guess it can also depend on the device you target. My fear is people using that without understanding the consequences and then complaining about memory usage.

@lissyx (Collaborator) commented Sep 22, 2020

@bernardohenz I fail to see the use case where this can be useful for mmap-able file formats (PBMM, TFLite), given how much memory this is going to force TensorFlow to allocate.

@reuben (Contributor) commented Sep 22, 2020

@lissyx #3152 is one example

@lissyx (Collaborator) commented Sep 22, 2020

> @lissyx #3152 is one example

Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

@lissyx (Collaborator) commented Sep 22, 2020

> I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)
>
> Please introduce a new `DS_CreateModelFromBuffer`, for example?

Also, when you introduce that, please try and expose it in as many bindings as you can.

@reuben (Contributor) commented Sep 22, 2020

> Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115

That's a good point; maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

@lissyx (Collaborator) commented Sep 23, 2020

> Does it? From looking at the implementation, I'm unsure this can work without trashing the device's memory: https://github.com/tensorflow/tensorflow/blob/r2.3/tensorflow/lite/model_builder.cc#L104-L115
>
> That's a good point; maybe we need to modify TensorFlow to take ownership of the pointer, or assume it'll outlive the session.

That's much more invasive and risky, except if we can upstream it, but we know how long that can take. Depending on @bernardohenz's use case, maybe it makes sense to just expose that for formats other than mmapped protobuf? Or even only for TFLite models?

@lissyx (Collaborator) commented Sep 23, 2020

> This PR allows the loading of model and ExtScorer through array of bytes (string)
>
> Some observations:
>
> 1. For this implementation to work, I had to use `D_GLIBCXX_USE_CXX11_ABI=1` when compiling the lib
> 2. I've added a new argument (`--init_from_bytes`) in the client for testing the loading
> 3. I do not know how you guys wanna handle the naming of methods, and if you will want to keep backwards compatibility (with `DS_CreateModel` for instance)
> 4. I only tested loading .pb models. I will need some help for implementing the loading of memmapped and tflite models.

@bernardohenz I see you have code already for TFLite, can you describe what kind of help you need?

@bernardohenz (Contributor Author)

@lissyx I will be addressing the implementation of new methods for the API.
The point where I need some help is the implementation of loading pbmm and tflite models from a buffer. If I understood correctly, they would be much slower than loading from a path, but it's still good to have this option rather than no option at all (we could put a warning message that this loading is slower, for the sake of clarification).
Although I had implemented the code for tflite (just because I found `tflite::FlatBufferModel::BuildFromBuffer`), I haven't tested it yet; I have never compiled with tflite and will try that next week.

I have no answer for using `D_GLIBCXX_USE_CXX11_ABI=1` when compiling. When set to 0, the libs ended up with a segmentation fault. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` does differently in what I've implemented. If you have any thoughts about this, they would be much appreciated.

@lissyx (Collaborator) commented Sep 24, 2020

> If I understood correctly, they would be much slower than loading from a path, but it's still good to have this option rather than no option at all (we could put a warning message that this loading is slower, for the sake of clarification).

I want to make sure we don't put ourselves in a position where people misuse it and have unrealistic expectations.

> When set to 0, the libs ended up with a segmentation fault. I still haven't figured out what `D_GLIBCXX_USE_CXX11_ABI` does differently in what I've implemented. If you have any thoughts about this, they would be much appreciated.

This does impact how C++ code handles strings, so maybe it depends on your env and in your case it makes sense? It would help if you could share more of a stack trace, for example.

@lissyx (Collaborator) commented Sep 24, 2020

@bernardohenz can you:

  • check the tflite code path on your side
  • force-push without `D_GLIBCXX_USE_CXX11_ABI`

I could take a look and run CI on that, so we can cross-check.

@lissyx (Collaborator) commented Sep 24, 2020

> If I understood correctly, they would be much slower than loading from a path, but it's still good to have this option rather than no option at all (we could put a warning message that this loading is slower, for the sake of clarification).
>
> I want to make sure we don't put ourselves in a position where people misuse it and have unrealistic expectations.

Maybe this should be warned about in the docs: "using this feature can have a negative impact on the amount of memory consumed by your application".

@lissyx (Collaborator) commented Sep 28, 2020

@bernardohenz Your PR needs rebasing, it seems.

@lissyx (Collaborator) commented Sep 28, 2020

@bernardohenz Please don't change the file mode from 644 to 755.

```diff
@@ -69,45 +71,60 @@ void Scorer::setup_char_map()
   }
 }

-int Scorer::load_lm(const std::string& lm_path)
+int Scorer::load_lm(const std::string& lm_string, bool load_from_bytes)
```
Collaborator

Either provide a default value, split the implementation into two `load_lm` overloads, or update all call sites: this PR does not build because of the `load_lm` calls in generate_scorer_package.cpp.

Contributor Author

I've provided a default value (false) for the boolean

```diff
 }

-int Scorer::load_trie(std::ifstream& fin, const std::string& file_path)
+int Scorer::load_trie(std::stringstream& fin, const std::string& file_path, bool load_from_bytes)
```
Collaborator

Is `stringstream` compatible with `ifstream`?

Contributor Author

As far as I tested, yes.

```cpp
}

int
DS_CreateModel_(const std::string &aModelString,
```
Collaborator

Not sure how this is implemented; we go straight from `const char*` to `std::string&`.

native_client/deepspeech.h (outdated)
```cpp
 * @return Zero on success, non-zero on failure.
 */
DEEPSPEECH_EXPORT
int DS_CreateModelFromBuffer(const std::string &aModelBuffer,
```
Collaborator

`std::string` here makes this C++-only; the API is C.

Contributor Author

I'll be changing all the strings to `char *`, but I'll have to pass the buffer size as a parameter (I was having problems when a buffer contained a `\0` in the middle).

Collaborator

That's perfectly fine.

```cpp
 * @return Zero on success, non-zero on failure (invalid arguments).
 */
DEEPSPEECH_EXPORT
int DS_EnableExternalScorerFromBuffer(ModelState* aCtx,
```
Collaborator

Same; this is limiting to C++ callers when we expose a C API.

native_client/deepspeech.h (outdated)
```diff
@@ -17,9 +17,13 @@ DontBhiksha::DontBhiksha(const void * /*base*/, uint64_t /*max_offset*/, uint64_
 const uint8_t kArrayBhikshaVersion = 0;

 // TODO: put this in binary file header instead when I change the binary file format again.
-void ArrayBhiksha::UpdateConfigFromBinary(const BinaryFormat &file, uint64_t offset, Config &config) {
+void ArrayBhiksha::UpdateConfigFromBinary(const BinaryFormat &file, uint64_t offset, Config &config, bool load_from_bytes) {
```
Collaborator

Shouldn't this be upstreamed?

@lissyx (Collaborator) commented Sep 28, 2020

@bernardohenz You should now be able to run PRs yourself.

@lissyx lissyx closed this Sep 28, 2020
@lissyx lissyx reopened this Sep 28, 2020
@bernardohenz (Contributor Author)

I've fixed the exposure of the API functions and tested the loading for TFLite.

I've performed a small timing benchmark running the client (both for loading a .pb and a .tflite):

Loading a dummy .pb

| Inputs | Time for running ./deepspeech (avg from 20 calls) |
| --- | --- |
| model output_graph.pb | 0.54 |
| model output_graph.pb --init_from_bytes | 0.55 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer | 1.45 |
| model output_graph.pb --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 4.56 |

Loading a .tflite

| Inputs | Time for running ./deepspeech (avg from 20 calls) |
| --- | --- |
| model deepspeech-0.8.2-models.tflite | 1.72 |
| model deepspeech-0.8.2-models.tflite --init_from_bytes | 1.85 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer | 2.50 |
| model deepspeech-0.8.2-models.tflite --scorer deepspeech-0.8.1-models.scorer --init_from_bytes | 5.66 |

@lissyx (Collaborator) commented Sep 29, 2020

@bernardohenz There are still failures related to load_lm() call sites on https://community-tc.services.mozilla.com/tasks/groups/VhPPLNsKSzeyZDg5kxjxqQ

@lissyx (Collaborator) commented Sep 29, 2020

> I've performed a small timing benchmark running the client (both for loading a .pb and a .tflite):

Just for the sake of completeness, can you add figures from before your changes?

> I've fixed the exposure of the API functions and tested the loading for TFLite.

Good, so now it's just about:

  • exposing it through all our bindings,
  • getting builds green,
  • and adding some test coverage.

Right?

@lissyx (Collaborator) commented Sep 30, 2020

@bernardohenz I see some merge commits; can you make sure this branch is clean of merges?

@bernardohenz (Contributor Author)

@lissyx yep, I'm trying to squash the commits, sorry.

@reuben (Contributor) commented Oct 5, 2020

Just writing down some notes from my talk with Bernardo last week on the status of this PR:

  • We fixed a segfault due to passing the address of a temporary
  • Bernardo is going to add some timing calls in the scorer loading code to figure out whether the slowdown comes from the extra copy on the client.cc side (in which case it doesn't matter) or from inside libdeepspeech, in which case it needs to be addressed.
  • Then we need to make sure all the bindings are covered and there's test coverage, as mentioned by @lissyx above
