Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigating GSon #8668

Closed
Arthur-Milchior opened this issue Apr 23, 2021 · 31 comments
Closed

Investigating GSon #8668

Arthur-Milchior opened this issue Apr 23, 2021 · 31 comments

Comments

@Arthur-Milchior
Copy link
Member

As far as I remember, some of the long tasks, such as loading the collection when the application start, takes a few second - on big collection - processing JSon. If I remember correctly, one of the bottleneck was that each JSon get operation uses a try/catch, which is is an expansive operation. The only trouble is that the JSONException are checked exception and that we don't want to throws them everywhere in the codebase.

I think it might be interesting to check GSon, another library for JSon. I suspect we would win time some precious seconds. Indeed:

  • Its exceptions are not checked
  • it can convert between JSONObjects/array and java Map/List. This would avoid the entire overhead of the JSON library

Most of the change can be automatized, but that's not entirely trivial. Indeed, the functions do not have the same name even if it has the same signature than in current implementation. And if we want to save this second of loading time, it's not ideal to create an intermediate class just to change the function name.

It may be useless if most of the work moves from java to rust, those try/catch will disappear with it anyway

@lazy-codebear
Copy link

Can I work on this issue? Is your meaning is using GSon library to replace now library.

@Arthur-Milchior
Copy link
Member Author

You can work on it.

I don't know whether gson will replace current JSON library. What I want is to make the test and see whether it will improve loading time at all, and potentially other long operations. Even creating a note, when the note type is complex, or changing a note type, can be improved when the JSON library is efficient, as accessing template is required often

You can test over http://milchior.fr/anki/ which is my real life collection and take a dozen of seconds to load. That seemed silly (at least to me), but each second saved is actually really great for the confort of use of the app.

@lazy-codebear
Copy link

I will try to do it. It is the first time I working on anki. How does it work when import a deck?

@Arthur-Milchior
Copy link
Member Author

It's not a deck but an entire collection. You'll need to use the desktop version to import it. And you should create a new profile, as otherwise it would remove the current content of your collection.
But anyway, you can probably do tests first on your usual collection or on a test collection, and once you are happy with the code change you made, test on a big collection

@TarekkMA
Copy link
Contributor

TarekkMA commented May 8, 2021

Jackson is supposed to be much faster than gson.

@mikehardy
Copy link
Member

Script a set of tests that exercise both cases on the same set of easily-available test data, then post benchmarks that others can attempt to reproduce, or it's just talk ;-)

@TarekkMA
Copy link
Contributor

TarekkMA commented May 9, 2021

Script a set of tests that exercise both cases on the same set of easily-available test data, then post benchmarks that others can attempt to reproduce, or it's just talk ;-)

Challenge Accepted 😃 , should this benchmark be done in android context or in the JVM environment is enoght?

@mikehardy
Copy link
Member

Users only care about the performance they see in the real world on their real device, right? It has to be a practical test, like of the Anking decks or I think Arthur has his decks up for testing on https://milchior.fr but I'm not sure where?

@mikehardy
Copy link
Member

Sorry - meant to actually tag @Arthur-Milchior - Arthur - do you have your decks available somewhere for people looking to stress test Anki things? I seem to recall you did, if not sorry for the bother

@TarekkMA
Copy link
Contributor

TarekkMA commented May 9, 2021

I think this is the link: http://milchior.fr/anki/.

I found this repo with some benchmarks. https://github.com/fabienrenaud/java-json-benchmark, and Jackson is among the top. Jackson provides databinding which will be super helpful to covert models and configs from json to java classes easily.

@TarekkMA
Copy link
Contributor

TarekkMA commented May 10, 2021

Benchmark Results

device: Huawei P30

Using Arthur's collection (1.3 GB) and this repo https://github.com/TarekkMA/android_json_benchmarks

decks json (500 KB)

benchmark:    10,774,479 ns DecksBenchmark.jackson
benchmark:    24,029,167 ns DecksBenchmark.old
benchmark:    11,436,459 ns DecksBenchmark.gson
benchmark:    18,019,271 ns DecksBenchmark.moshi
benchmark:     7,314,061 ns DecksBenchmark.jackson_streaming
benchmark:    10,473,437 ns DecksBenchmark.jackson_jr

models json (34.1 MB 🤯 )

benchmark:   143,337,500 ns ModelsBenchmark.jackson
benchmark:   534,700,000 ns ModelsBenchmark.old
benchmark:   121,797,396 ns ModelsBenchmark.gson
benchmark:   277,864,062 ns ModelsBenchmark.moshi
benchmark:   140,212,501 ns ModelsBenchmark.jackson_streaming
benchmark:   140,522,396 ns ModelsBenchmark.jackson_jr

Gson

I found this issue about gson [Question] : is gson still maintained? #1821

jackson_jr

Supposed to be a more lightweight version of Jackson data binding, but it does lack some features.

using default config I couldn't parse Map<Long, OBJ> from a json unlike other parsers so I stelled on Map<String, OBJ>. another issue is I couldn't parse primitive array.

moshi

slowest of them all

jackson

have lots of features I only used data binding because it's the easiest. also they have set of annotations that can change how serialization works.

jackson streaming

Slightly faster than data binding but not very convenient to write. I think the memory usage may be better in streaming.

@mikehardy
Copy link
Member

Nice!

GSON definitely out as unmaintained - good find on that issue. Ingesting unmaintained repos is always a risk, and always best avoided as you see doing real benchmarks takes time and are needed for performance areas

jackson_jr already has identified problems, it's out

moshi seems slow and jackson apppears to support other things that have interest (data binding), so between those I'd go with jackson

What's the APK size difference? Relatively easy to examine by turning minify on for debug build and comparing output...it does not appear to contain native libraries so it should not be too bad but there is nothing like knowing instead of guessing :-)

@TarekkMA
Copy link
Contributor

gson was already added as a dependency, so I replaced it with Jackson and tried to my best to use jackson the same as gson so that the minifer would only remove unused stuff.

Gson (streaming) was used here and here
Gson (data binding) was used here

Gson (previous version)

Screen Shot 2021-05-10 at 6 59 18 PM

Jackson

Screen Shot 2021-05-10 at 6 59 05 PM

Additional notes

from my research jackson seems to be very large library in term of method count and in the number of modules/extensions that can be added to it

@TarekkMA
Copy link
Contributor

I have used jackson streaming and added the result of the benchmark to my previous comment.

jackson databinding includes streaming (core) and annotation package. if we want to be save size we can do everything using streaming rather than data binding. it will be faster than databinding, but very verbose and tedious.

@mikehardy
Copy link
Member

That's quite the performance increase... It looks worth it to me

@TarekkMA
Copy link
Contributor

TarekkMA commented May 10, 2021

If we go with streaming only the size would be exactly the same as the previous version with gson, but will have to write everything by hand and no more one-liners

Streaming API example form my benchmark: (The same would go for writing the json back using JsonGenerator)

https://github.com/TarekkMA/android_json_benchmarks/blob/c090f901aa369e5839bf38dc85744fcd371d587b/benchmark/src/androidTest/java/com/example/benchmark/json_parse/models/N_Deck.java#L29-L99

Should we switch to streaming only?

@TarekkMA
Copy link
Contributor

TarekkMA commented May 10, 2021

Another candidate is https://github.com/ngs-doo/dsl-json , will try to get some benchmarks on it

Edit: mmm dsl-json is about 250 ms which is comparable to moshi, maybe I'm doing something wrong. but anyways the library not very active on GitHub.

@TarekkMA
Copy link
Contributor

I guess jackson is the way to go, but should we go streaming or databinding 🤔

@mikehardy
Copy link
Member

less code I think - not having the JSON schema embedded in our parsing code like that seems like a win

@Arthur-Milchior
Copy link
Member Author

I assume "old" meant what we use right now, right?
If so, clearly, I'll love to have your update!

Thanks for finding my deck so quickly. Please don't hesitate to ping me on discord if needed. That would have been realy quick to answer, and would be a shame to block you. It's hard to know what to answer first on github.

@Arthur-Milchior
Copy link
Member Author

Interesting. I didn't even knew we used gson. This issue was referring to reading Decks/Models from the collection. But clearly, this is interesting too.

Note that

                default:
                    jp.skipChildren();

would not be acceptable. If there are properties we don't know, we should keep them. Indeed, when we recreate the string and put it in the database, we want to put all properties we didn't change as they used to be

@TarekkMA
Copy link
Contributor

TarekkMA commented May 11, 2021

Interesting. I didn't even knew we used gson. This issue was referring to reading Decks/Models from the collection. But clearly, this is interesting too.

Note that

                default:
                    jp.skipChildren();

would not be acceptable. If there are properties we don't know, we should keep them. Indeed, when we recreate the string and put it in the database, we want to put all properties we didn't change as they used to be

You are right, that is just for the benchmarking code since the streaming API is very verbose I only handled some elements not all.

we will be using the databinding API which will have lots of cool tricks and things we could use like the following snippint of code:

The following code will serialize and deserialize unknow fields into/from extraProperties map

public class Model {

    private Map<String, Object> extraProperties = new HashMap<>();

    @JsonAnyGetter
    public Map<String, Object> getExtraProperties() {
        return extraProperties;
    }

    @JsonAnySetter
    public void setExtraProperty(String name, Object value) {
        extraProperties.put(name, value);
    }
}

@Arthur-Milchior
Copy link
Member Author

Great.

I'd clearly be curious, if you have the courage to go into git history, can you find and explain why we used two different libraries?
I can't figure out whether the authors explained why they had distinct needs.

Great to see that the extra properties where taken into account. Thanks.

@TarekkMA
Copy link
Contributor

Gson is used from 2013, I don't know the exact reason why but maybe the authors needed to do json streaming with gson as seen in

JsonReader jr = new JsonReader(new FileReader(mediaMapFile));
jr.beginObject();
String name;
String num;
while (jr.hasNext()) {
num = jr.nextName();
name = jr.nextString();
nameToNum.put(name, num);
numToName.put(num, name);
}
jr.endObject();
jr.close();

If I found an interesting git message I will be sure to put it here

@Arthur-Milchior
Copy link
Member Author

Replying to #8799 (comment)

@Arthur-Milchior should I convert org.json.JSON*** to use jackson?

I will try to convert one usage after the other each in a PR.

If it's faster, then please go for it. But if you can replace our own JSON library, that would be even better. It's only role is to catch the checked exception and replace them by unchecked exception. It simplifies a lot the remaining of the code, but there is a small cost in efficiency.

@TarekkMA TarekkMA mentioned this issue May 12, 2021
7 tasks
@TarekkMA
Copy link
Contributor

@Arthur-Milchior jackson has a tree model that can replace current json classes. but I think the goal is the serilize/deserilize json from/to a type safe and expressive class instead of using string keys everywhere.

#8810 I tried migrating Model as a demo for what will happen if we used databinding. along with tests that will hopefully clear how it's being used.

@TarekkMA
Copy link
Contributor

@Arthur-Milchior

#8817 I have just finished converting the old JSONObject/JSONArray to utilize Jackson tree model API.

And the results are very very amazing,

Benchmarks for parsing 34MB of models

Current Appreach with org.json.**** (535.8 ms)

            mModels = new HashMap<>();
            JSONObject modelarray = new JSONObject(modelsJson);
            JSONArray ids = modelarray.names();
            if (ids != null) {
                for (String id : ids.stringIterable()) {
                    Model o = new Model(modelarray.getJSONObject(id));
                    mModels.put(o.getLong("id"), o);
                }
            }

Current Approach with the new JSON objects (249 ms)

            mModels = new HashMap<>();
            TMJSONObject modelarray = new TMJSONObject(modelsJson);
            TMJSONArray ids = modelarray.names();
            if (ids != null) {
                for (String id : ids.stringIterable()) {
                    TMModel o = new TMModel(modelarray.getJSONObject(id));
                    mModels.put(o.getLong("id"), o);
                }
            }

Letting Jackson deserialize directly to the map (240 ms)

            mModels = objectMapper.readValue(modelsJson, new TypeReference<Map<Long, TMModel>>() {});

@Arthur-Milchior
Copy link
Member Author

To be honest, I'm fine with the 9ms cost of keeping current approach compared to using directly a map :p

Yeah, you're right, that's amazing.

I'm kind of sorry for you, because there used to me changes that can save dozens of seconds, but most are gone, and now you have to deal with changes that same 300ms. I really like ensuring that the application is as fluid as possible (and I suspect it will also uses less memory for temporary variables), and it makes the app more pleasurable. But you missed last year progress, that was more spectacular: https://www.reddit.com/r/Anki/comments/iu24ph/ankidroid_one_year_of_change_in_term_of_speed/

@TarekkMA
Copy link
Contributor

Wow the difference you made is incridible, I hope I will be able to make the app more pleasurable for the end-user to use and for the developer to contribute and understand/test the code.

@mikehardy
Copy link
Member

I think you already have!

@github-actions
Copy link
Contributor

Hello 👋, this issue has been opened for more than 2 months with no activity on it. If the issue is still here, please keep in mind that we need community support and help to fix it! Just comment something like still searching for solutions and if you found one, please open a pull request! You have 7 days until this gets closed automatically

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants