Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Interleave to prevent eager MoveNext calls #696

Merged
merged 23 commits into from
Dec 12, 2019

Conversation

Orace
Copy link
Contributor

@Orace Orace commented Nov 14, 2019

This PR addresses #694

Copy link
Contributor Author

@Orace Orace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private extension method is useless here

MoreLinq/Interleave.cs Outdated Show resolved Hide resolved
Orace and others added 3 commits November 14, 2019 10:30
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TestInterleaveDoNoCallMoveNextEagerly doesn't prove what it intends to. Both your sample sequences are one element long and you just throw for the retrieval of the element in the second sequence. That doesn't test for MoveNext being called eagerly. That would have looked as follows:

[Test]
public void TestInterleaveDoNoCallMoveNextEagerly()
{
    void Code()
    {
        var sequenceA = Enumerable.Range(1, 1);
        var sequenceB = MoreEnumerable.From(() => 2, () => throw new TestException());

        sequenceA.Interleave(sequenceB).Take(1).Consume();
    }

    Assert.DoesNotThrow(Code);
}

And it succeeds with the current implementation (without the changes in this PR).

@atifaziz
Copy link
Member

I fixed your test in 61e4921, used the origin Interleave implementation in b790e53, even merged the simpler yield loop with c569479, and as you can see, all tests pass:

Testing netcoreapp3.0 (Release)...
NUnitLite 3.12.0 (.NET Standard 2.0)
Copyright (c) 2019 Charlie Poole, Rob Prouse
Runtime Environment
   OS Version: Microsoft Windows 10.0.17763
  CLR Version: 3.0.0
Test Files
    C:/projects/morelinq/MoreLinq.Test/bin/Release/netcoreapp3.0/MoreLinq.Test.dll
Test Discovery
  Start time: 2019-11-22 10:39:37Z
    End time: 2019-11-22 10:39:37Z
    Duration: 0.355 seconds
Tests Not Run
1) Explicit : MoreLinq.Test.MemoizeTest.MemoizeIsThreadSafe
2) Ignored : MoreLinq.Test.PartialSortByTests.PartialSortByIsStable
TODO
3) Ignored : MoreLinq.Test.PartialSortTests.PartialSortByIsStable
TODO
4) Explicit : MoreLinq.Test.RandomSubsetTest.TestRandomSubsetIsUnbiased
Run Settings
    Number of Test Workers: 2
    Work Directory: C:\projects\morelinq
    Internal Trace: Off
Test Run Summary
  Overall result: Warning
  Test Count: 2165, Passed: 2161, Failed: 0, Warnings: 0, Inconclusive: 0, Skipped: 4
    Skipped Tests - Ignored: 2, Explicit: 2, Other: 0
  Start time: 2019-11-22 10:39:37Z
    End time: 2019-11-22 10:39:40Z
    Duration: 2.465 seconds
Results (nunit3) saved as C:\projects\morelinq\TestResult.xml

@Orace
Copy link
Contributor Author

Orace commented Nov 22, 2019

Both your sample sequences are one element long and you just throw for the retrieval of the element in the second sequence.

Exactly, the code asks for one element and this element came from the first sequence:

sequenceA.Interleave(sequenceB).Take(1).Consume();

It throw for the retrieval of the first element in the second sequence because Interleave should not eagerly retrieve for it. It's the second element of the result sequence and since the code don't ask for it, Interleave shouldn't retrieve it.

@atifaziz
Copy link
Member

Right, I've reverted your originally proposed test and fix with 8a972a9 and will review.

MoreLinq/Interleave.cs Outdated Show resolved Hide resolved
try
{
// First pass. create enumerators.
foreach (var sequence in sequences)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have Acquire for this so I would take advantage of it here rather than roll out a duplicate implementation that needs separate testing.

Copy link
Contributor Author

@Orace Orace Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIU, Acquire do not return a IDisposable so Interleave keep the responsibility of disposing the enumerators when the result sequence enumerator is disposed.
Achieve this with a using for each sequence is not feasible since the number of using is dynamic.
The only way I know to achieve this in a yield context is to use some try {} finally {} blocks. Because the finally block is called when the enumerator is disposed.

We indeed can put this behavior in a specialized IDisposable class. When disposed, this object dispose a bunch of elements.
DisposableGroup<T> (declared in SortedMerge and only used here) looks to achieve this, but DisposableGroup<T> internally use a list where I use a LinkedList.
(Rem: DisposableGroup<T> constructor doesn't dispose the already acquired object in case of error in the input sequence, like Acquire do. But anyway, the only usage of DisposableGroup<T> use Acquire).

I have chosen to use a LinkedList since Remove operation is O(1) there.
If we use a list or an array, where the the Remove operation is O(N), enumerating N sequences of 1 element will cost O(N²).
I know that it is an uncommon use case (it may be less uncommon with #697), anyway this implementation manage it.
Rem: Because of DisposableGroup<T> implementation SortedMerge is O(N²) in the case described above.

Copy link
Member

@atifaziz atifaziz Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIU, Acquire do not return a IDisposable

It would return an array of enumerators that are disposable. Question is whether there is something to be gained out of not calling GetEnumerator() on other sequences until you get to them, but that will only optimize the case of someone eventually taking fewer elements than sequences!

Copy link
Contributor Author

@Orace Orace Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire will not remove the necessity of a try {} finally {} (with disposal of the enumerators in the finally block), and then is just an alias of ToArray().

  • An array is not optimal here.
  • If Acquire is used and if an error occur while acquiring (the only purpose of Acquire is to manage this case), elements will be disposed twice, which is useless.

So Acquire here is not optimal and useless.

A ToDisposableLinkedList() method that have the functionality of Acquire (on error, dispose already acquired elements) and provide a DisposableLinkedList that have the functionality of DisposableGroup<T> (being disposable and dispose its content when disposed) is the way to go for Interleave, SortedMerge and Transpose.

It will be used like that:

using var enumerators = source.Select(e => e.GetEnumerator()).ToDisposableLinkedList();

The using here allow to remove the try {} finally {} from here:

finally
{
foreach (var e in enumerators)
e?.Dispose();
}

and here:
finally
{
Debug.Assert(iteratorList != null || iterators != null);
foreach (var iter in (iteratorList ?? (IList<IEnumerator<T>>) iterators))
iter.Dispose();
}

and improve the DisposableGroup/Acquire combo from here:

using var disposables = new DisposableGroup<TSource>(sequences.Select(e => e.GetEnumerator()).Acquire());

I opened #724 to refactor Interleave, SortedMerge and Transpose.

Copy link
Member

@atifaziz atifaziz Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire will not remove the necessity of a try {} finally {} (with disposal of the enumerators in the finally block), and then is just an alias of ToArray().

Acquire itself does not need to be inside a try-finally block and so it's not the same as ToArray(). It doesn't mean it prevents you from having one. It is, for example, perfectly safe to return the array from Acquire as the return value of a function:

static Stream[] OpenFiles(params string[] paths) =>
    paths.Select(p => File.OpenRead(p)).Acquire();

Of course, the disposal becomes the responsibility of the caller and will need a try-catch. It does prevent that.

  • If Acquire is used and if an error occur while acquiring (the only purpose of Acquire is to manage this case), elements will be disposed twice, which is useless.

It is useless but it's not a problem since it is a requirement for IDisposable.Dispose to be idempotent (per docs):

If an object's Dispose method is called more than once, the object must ignore all calls after the first one.

I have chosen to use a LinkedList since Remove operation is O(1) there.

That's a fine choice but I am not debating the most optimal solution first. My point with Acquire was that it's carefully designed and tested and so might be best to keep using it instead of putting everything into question. This PR was about fixing one problem but by trying to be cleverly optimal at the same time, it may introduce others bugs (and I believe it does)! This means it takes longer to review as the scope of change expands. Instead, I find that it helps to make it work, make it right and then make it fast.

Copy link
Contributor Author

@Orace Orace Nov 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just push a simple (I hope) implementation of Interleave based on the array returned by Acquire.
The LinkedList implementation is in #726

Anyway it's a rebuild of current implementation since it's over-complicated (manage not used cases) and flawed (original issue #694).

Copy link
Contributor Author

@Orace Orace Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atifaziz with the implementation using Acquire in this PR (currently at a7f2c6f), calls to GetEnumerator are made eagerly and the code below throw:

var sequenceA = Enumerable.Range(1, 1);
var sequenceB = new BreakingSequence<int>();
sequenceA.Interleave(sequenceB).Take(1).Consume();

I don't know it it's acceptable
It doesn't throw in #726.

MoreLinq/Interleave.cs Outdated Show resolved Hide resolved
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking much better. Thanks!


PS I've been busy on another project and needed to put this aside for a bit to have focus.

MoreLinq/Interleave.cs Outdated Show resolved Hide resolved
MoreLinq/Interleave.cs Outdated Show resolved Hide resolved
MoreLinq.Test/InterleaveTest.cs Outdated Show resolved Hide resolved
@Orace Orace requested a review from atifaziz December 12, 2019 08:38
MoreLinq.Test/InterleaveTest.cs Outdated Show resolved Hide resolved
@atifaziz atifaziz changed the title Interleave: Add tests, Fix implementation Fix Interleave to prevent eager MoveNext calls Dec 12, 2019
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome and thank you!

@atifaziz atifaziz merged commit 52f93f4 into morelinq:master Dec 12, 2019
@Orace Orace deleted the issue694 branch December 12, 2019 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants