Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Memoize() Concerns #889

Open
viceroypenguin opened this issue Nov 15, 2022 · 6 comments
Open

.Memoize() Concerns #889

viceroypenguin opened this issue Nov 15, 2022 · 6 comments
Milestone

Comments

@viceroypenguin
Copy link
Contributor

viceroypenguin commented Nov 15, 2022

I see two concerns with .Memoize(), that I would like to discuss before addressing:

  • Currently, the _error field is use for both .GetEnumerator() errors and iteration errors. This is incorrect under the following scenario:
    1. Obtain an iterator via .GetEnumerator()
    2. Encounter an error during iteration via .MoveNext()
    3. Attempt to obtain a new iterator.
    • The expectation is that I would be able to get an iterator, and that it would iterate to the exception
    • The current code shares the _error field, and so I would get an error in .GetEnumerator() and not be able to iterate to the error.
  • Currently, the MemoizedEnumerable incorrectly implements the IDispose pattern
    • When an object is disposed, it should be prepared to be GC'd. All future attempts to use any aspect of the object should throw an ObjectDisposedException
    • The current code only throws an ObjectDisposedException when the object holds an open iteration after being disposed. The .Dispose() method is treated as a reset rather than an actual disposal; clearing the current cache and starting a new iteration as if .Skip() had been called on the original sequence.
    • If clearing the cache is a desired behavior, it should be done under a .Clear() method instead of .Dispose().

@leandromoh I am happy to update the code to correct these issues, but I would like to understand existing thought patterns if either of the above patterns is intended.

CC: @leandromoh

@atifaziz
Copy link
Member

  • Currently, the MemoizedEnumerable incorrectly implements the IDispose pattern

For better or worse, this is by-design. We had a long thread about this that's worth reading up to understand the various arguments that were made for and against. If it ever graduates out of the experimental namespace, it should be really called Cache because it's how Seq.cache works:

The enumerator may be disposed and underlying cache storage released by converting the returned sequence object to type IDisposable, and calling the Dispose method on this object. The sequence object may then be re-enumerated and a fresh enumerator will be used.

  • When an object is disposed, it should be prepared to be GC'd.

The disposable pattern is there to make releasing of unmanaged resources deterministic instead of preparing for GC. Over time, it's also been used for just taking advantage of the using syntax sugar to scope state changes (instating and reverting). For example, log scopes do this. Whether it's an absolute abuse is debatable.

All future attempts to use any aspect of the object should throw an ObjectDisposedException

Even the iterators returned by standard LINQ operators don't do this:

using var e = Enumerable.Range(1, 10).GetEnumerator();
Console.WriteLine(e.MoveNext());    // prints: True
e.Dispose();
Console.WriteLine(e.Current);       // prints: 1
Console.WriteLine(e.MoveNext());    // prints: False

@atifaziz
Copy link
Member

atifaziz commented Nov 20, 2022

Can you please post a working example demonstrating your first concern? I tried to come up with something based on your description:

using System;
using System.Collections.Generic;
using MoreLinq.Experimental;

var foo = Foo().Memoize();
for (var i = 0; i < 2; i++)
{
    try
    {	        
        Console.WriteLine("Calling GetEnumerator()...");
        using var e = foo.GetEnumerator();
        Console.WriteLine("Calling MoveNext()...");
        e.MoveNext();
        Console.WriteLine($"Current = {e.Current}");
        Console.WriteLine("Calling MoveNext()...");
        e.MoveNext();

    }
    catch (Exception ex)
    {
        Console.WriteLine($"! {ex.Message}");
    }
}

static IEnumerable<int> Foo()
{
    yield return 42;
    throw new ApplicationException();
}

that I think would show an observable difference, but it doesn't:

Calling GetEnumerator()...
Calling MoveNext()...
Current = 42
Calling MoveNext()...
! Error in the application.
Calling GetEnumerator()...
Calling MoveNext()...
Current = 42
Calling MoveNext()...
! Error in the application.

In other words, the output is identical for both iterations.

@viceroypenguin
Copy link
Contributor Author

Can you please post a working sample demonstrating your first concern?

Nope - poor reading of the code combined with reading the unit test names. I see now that _cache blocks access to that branch of the code and no issues there. Initial comment updated accordingly.

@viceroypenguin
Copy link
Contributor Author

viceroypenguin commented Nov 23, 2022

The disposable pattern is there to make releasing of unmanaged resources deterministic instead of preparing for GC. Over time, it's also been used for just taking advantage of the using syntax sugar to scope state changes (instating and reverting). For example, log scopes do this. Whether it's an absolute abuse is debatable.

Agreed, but under logging scopes, a separate object is created for each scope. That is, .BeginScope() is called to initialize a scope, which returns an IDisposable which can be used as such. However, at no point is the same object intended to be disposed multiple times, or to be used after it has been disposed.

I understand some of the comments and concerns in the discussion, however, I would disagree with the conclusion, for a simple reason: If code is written to work for an IEnumerable<>, then it should work for all IEnumerable<>. Example:

// nb: poor code here, meant to prove a point
public (TSource loop1, TSource loop2) Process<TSource>(IEnumerable<TSource> source)
{
  var memoized = source.Memoized();
  return (DoLoop(), DoLoop());

  TSource DoLoop()
  {
    using ((IDisposable)memoized)
      return memoized.First();
  }
}

However, this code will not work if the enumerable passed in is an ICollection<> or IReadOnlyCollection<> (since .Memoize() will return the original object, which likely doesn't support IDisposable). Also, the ability to clear the memoization cache is not intuitively obvious to any consumer. Very few IEnumerable<> support IDisposable as well, and it is not clear what IDisposable would mean on a bare IEnumerable<> either.

It would be much better if:
a) .Memoize() (or .Cache()) returned a specific type referencing the fact that it is a buffered enumerable - I liked the IBufferedEnumerable<> mentioned in that thread.
b) .Memoize() operated on ICollection<> and IReadOnlyCollection<> objects as well. In this case, the MemoizedEnumerable<> would simply act as a proxy object instead of caching data directly.

Now, I understand your concerns about directly exposing IDisposable with static analysis tools, but I actually don't like the idea of .Dispose() in this context anyway.

When taking about IDisposable, there are two primary implementations: 1. An object has a defined lifetime and we want to define the lifetime and allow objects to be cleaned up at a defined point in time, or 2. An object is used to define a scope of activity (logging scopes, etc.).

Clearing the cache in this enumerator doesn't feel, to me, as currently implemented, to fit into either of those categories. Instead, IBufferedEnumerable.ClearCache() would be a more appropriate method, to be called as desired by the consumer.

I understand that F# has written Seq.cache with the language that they did, but it is not idiomatic C#, and it hides capabilities and nuances through type-casting, which is always a recipe for either incorrect code, or misunderstood code.

@viceroypenguin
Copy link
Contributor Author

NB: I consider this a v4.0.0 (or later) issue anyway, so no need to put priority on this.

@atifaziz atifaziz added this to the vNext milestone Nov 23, 2022
@viceroypenguin
Copy link
Contributor Author

viceroypenguin commented Feb 13, 2023

Another concerning situation:
3. The behavior of Memoize usage on a list:

var list = new List<int>{1,2,3};
var memo = list.Memoize();

var beforeCount = memo.Count();
Assert.That(beforeCount, Is.EqualTo(3));

list.Add(4);
var afterCount = memo.Count();

// fails, because Memoize retrieves the collection iterator
// which uses the value of the list at the time iteration is performed
// this means that data is not truly memoized correctly.
Assert.That(afterCount, Is.EqualTo(beforeCount));

Where this becomes relevant is:

var list = new List<int>{1,2,3};
var repeat = list.Repeat(2);

using (var iter = repeat.GetEnumerator())
{
  iter.MoveNext();
  Assert.That(iter.Current, Is.EqualTo(1));
  iter.MoveNext();
  Assert.That(iter.Current, Is.EqualTo(2));
  iter.MoveNext();
  Assert.That(iter.Current, Is.EqualTo(3));

  list.Add(4);

  // fails, because the list has been modified.
  // however, the first pass of the list has been
  // completed, so this change to the list should 
  // not be seen by the `Repeat` operator
  iter.MoveNext();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants