-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The PartialSort operator can be improved #840
Comments
@theodorzoulias - Nice analysis. Two things: 1. In the future, when doing benchmarks, recommend that you use BenchmarkDotNet; 2. If you want to write a PR for this on my fork SuperLinq, I'll gladly take it in. If not, I'll rely on this analysis to update the code myself shortly. |
@viceroypenguin thanks for the quick response. I know about the BenchmarkDotNet, and I've used it, once. I realized that I don't have the patience to wait for half an hour for a simple benchmark, nor I like the idea of my PC getting tortured while I am rolling my thumbs, so I gave up with it. I am not aiming to precise measurements anyway. Getting a rough idea of how something performs is usually enough for me. Which obviously makes me not the right person for writing PRs for this high-profile project. 😃 |
No worries! I'll write the changes myself then. Thanks again for the analysis! :) |
@theodorzoulias - For the record, |
After testing with BDN, it looks like the original version in MoreLINQ, and the modified in SuperLinq (to have a stable sort) are generally the fastest versions. Please see here for the benchmark results and the code used. Once N gets up to about 1000, then the LINQ version does get faster. |
@viceroypenguin ohh, the public static IEnumerable<T> PartialSort<T>(
this IEnumerable<T> source,
int count,
IComparer<T> comparer = default,
OrderByDirection direction = OrderByDirection.Ascending)
{
ArgumentNullException.ThrowIfNull(source);
if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
comparer ??= Comparer<T>.Default;
if (direction == OrderByDirection.Descending)
{
var ascendingComparer = comparer;
comparer = Comparer<T>.Create((x, y) => ascendingComparer.Compare(y, x));
}
SortedSet<(T Item, long Index)> bottom = new(Comparer<(T Item, long Index)>.Create((x, y) =>
{
int result = comparer.Compare(x.Item, y.Item);
if (result == 0) result = Comparer<long>.Default.Compare(x.Index, y.Index);
return result;
}));
long index = 0;
foreach (var item in source)
{
index++;
if (bottom.Count < count) { bottom.Add((item, index)); continue; }
if (comparer.Compare(item, bottom.Max.Item) > 0) continue;
bool removed = bottom.Remove(bottom.Max);
bool added = bottom.Add((item, index));
Debug.Assert(removed && added);
}
foreach (var entry in bottom) yield return entry.Item;
} |
@viceroypenguin in order to magnify the effect of doing more comparisons than necessary, you could use long |
Good call checking with a longer key comparison. Please check the gist now with updated code and results. You might specifically review the changes I made to |
Fixed in Superlinq (viceroypenguin/SuperLinq@32f0241) |
@viceroypenguin Nice! Btw I don't know if you are allowed to use .NET 6 components in the library. If you do, the public static IEnumerable<T> PartialSort<T>(
this IEnumerable<T> source,
int count,
IComparer<T> comparer = default,
OrderByDirection direction = OrderByDirection.Ascending)
{
ArgumentNullException.ThrowIfNull(source);
if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
comparer ??= Comparer<T>.Default;
if (direction == OrderByDirection.Ascending)
{
var originalComparer = comparer;
comparer = Comparer<T>.Create((x, y) => originalComparer.Compare(y, x));
}
PriorityQueue<bool, (T Item, long Index)> top = new(Comparer<(T Item, long Index)>.Create((x, y) =>
{
int result = comparer.Compare(x.Item, y.Item);
if (result == 0) result = Comparer<long>.Default.Compare(y.Index, x.Index);
return result;
}));
long index = 0;
foreach (var item in source)
{
if (top.Count < count)
top.Enqueue(default, (item, index++));
else
top.EnqueueDequeue(default, (item, index++));
}
List<T> topList = new(top.Count);
while (top.TryDequeue(out _, out var entry)) topList.Add(entry.Item);
for (int i = topList.Count - 1; i >= 0; i--) yield return topList[i];
} It's not perfect because the The |
Surprisingly, I found that |
@theodorzoulias FYI: SuperLinq 4.0.0 has been fully released with this improvement. |
My first benchmark: public class PartialSortVsNative
{
private const int N = 1_000_000;
private readonly List<int> data = new(N);
public PartialSortVsNative()
{
var random = new Random();
for (int i = 0; i < N; i++)
data.Add(random.Next());
}
[Benchmark(Baseline = true)]
public List<int> Native() => data.OrderBy(x => x).Take(5).ToList();
[Benchmark]
public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, 5).ToList();
[Benchmark]
public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, 5).ToList();
} results in
|
With added randomness for the take count: public class PartialSortVsNative
{
private const int N = 1_000_000;
private readonly List<int> data = new(N);
private readonly int TakeCount;
public PartialSortVsNative()
{
var random = new Random();
for (int i = 0; i < N; i++)
{
data.Add(random.Next());
}
TakeCount = random.Next(N);
}
[Benchmark(Baseline = true)]
public List<int> Native() => data.OrderBy(x => x).Take(TakeCount).ToList();
[Benchmark]
public List<int> MoreLinqPartialSort() => MoreLinq.MoreEnumerable.PartialSort(data, TakeCount).ToList();
[Benchmark]
public List<int> SuperLinqPartialSort() => SuperLinq.SuperEnumerable.PartialSort(data, TakeCount).ToList();
}
|
Couple things:
Updating your code accordingly, I get the following results:
|
PS: You can also look at my old benchmarks here |
@MisinformedDNA I did some more research, and found a place where we were unnecessarily adding performance delays. Please see viceroypenguin/SuperLinq#67 for the fix, which is currently being released with SuperLinq v4.4.0. SuperLinq should now be as fast as MoreLinq in the int case, and faster in the string case. |
@MisinformedDNA my expectation is that MoreLinq's |
Hi! I ran some tests with the
PartialSort
operator, and it seems that it uses thecomparer
more frequently than necessary. For most items it should be enough to check only if the item is smaller that the smallest of the top items, but instead it performs a binary-search operation invariably for each item.I have posted my findings on StackOverflow, in this answer. It seems that the implementation of the
PartialSort
could be optimized.The text was updated successfully, but these errors were encountered: