Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequential IndexWriter performance in concurrent environments. #110

Open
bongohrtech opened this issue Jan 13, 2020 · 4 comments
Open

Sequential IndexWriter performance in concurrent environments. #110

bongohrtech opened this issue Jan 13, 2020 · 4 comments

Comments

@bongohrtech
Copy link
Owner

When creating Lucene.Net indices in parallel, sequential-like performance is experienced. Profiling 8 concurrent IndexWriter instances writing in parallel shows that WeakIdentityMap::IdentityWeakReference::Equals spends most time garbage collecting (94.91%) and TokenStream::AssertFinal (87.09% garbage collecting) in my preliminary tests (see screenshots).

The WeakIdentityMap implementation uses an IdentityWeakReference as key, which is implemented as a class. By inspection of this class, it is merely a System.Runtime.InteropServices.GCHandle wrapper as can be seen in the mono project, manually wrapping of this struct in a struct rather than a class - will eliminate some of the immense amounts of garbage collection.

JIRA link - [LUCENENET-640] created by matt2843
@bongohrtech
Copy link
Owner Author

bongohrtech commented Jan 14, 2020

Hi Mathais,

Thanks for the report and the PR.

Correct me if I am wrong, but wouldn't a better fix for this to be to replace WeakIdentityMap with the thread-safe ConditionalWeakTable? We may still need to utilize IdentityWeakReference, but it would need to be a class since ConditionalWeakTable has a class constraint on TKey.

Do note that Microsoft didn't expose the enumerator or the AddOrUpdate method of ConditionalWeakTable until .NET Standard 2.1. However, Lucene requires one or the other in every (other) place where ConditionalWeakTable would be useful (specifically, as a replacement for Lucene.Net.Support.WeakDictionary). An effort to port ConditionalWeakTable from .NET Standard 2.1 back to .NET Standard 2.0 (GH-636) is currently underway, but stalled on this J2N branch. Unfortunately, it depends on unmanaged resources that somehow need to be re-mapped or embedded in order to make it functional. Perhaps there is also a way to cut through at a lower level and make a ConditionalWeakIdentityTable that could be used as a direct replacement for WeakIdentityMap instead of using IdentityWeakReference.

If you could take a look at using ConditionalWeakTable to solve this issue, it would be much appreciated. Since there appears to be one place where the enumerator is required here, the best approach would be to first check for compatibility on .NET Standard 2.1 and if that works, help us to complete the port of ConditionalWeakTable for .NET Framework 4.5 and .NET Standard 2.0 by submitting a PR to the J2N project so the same fix can also be applied to those platforms.

by nightowl888

@bongohrtech
Copy link
Owner Author

Hi Shad Storhaug,

I think Ayende Rahien faced same issue, 

take a look here

by eladmarg

@bongohrtech
Copy link
Owner Author

bongohrtech commented Feb 7, 2020

Elad,

Well, this particular issue has been addressed by swapping ConditionalWeakTable for WeakIdentityMap. However, we are still using WeakDictionary in several places, most notably in the FieldCacheImpl (see GH-610). This is due to the lack of enumerator support before .NET Standard 2.1 primarily.

That said, the article that describes the problem specifically mentions it is a problem with Lucene.NET. Thanks for the link. Hopefully, he is planning to submit the fix back to us after he thoroughly tests it.

by nightowl888

@bongohrtech
Copy link
Owner Author

We are using an old version of Lucene at the moment, so I don't think that a PR from my current codebase would be very helpful.

The PRs for the changes in Lucene 3.0.3 are here:

ravendb#10

ravendb#11

 

Feel free to make use of them as you see fit.

This is currently under testing, we saw GC time reduced from GEN 0 ~0.9 seconds to 0.125 seconds. GEN 2 from 0.9 to 0.5.

by ayende

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant