Added SeqBuffer<T> class as a replacement for some of Seq.toList() calls #305

tlinkowski · 2017-04-27T10:49:12Z

I propose a solution to the problems described in #195 for the following methods:

crossSelfJoin() (crossJoin() needs to be adapted by @lukaseder in jOOQ-tools)
inner(Self)Join()
leftOuter(Self)Join()

For nearly all the remaining methods, the solution will consist of applying Seq.lazy() to them. I'll create a separate PR for that in order to demonstrate its usefulness, as discussed in #302.

PS. SeqBuffer could be used to create a simpler implementation of Seq.duplicate.
PPS. SeqBuffer could also be used to implement a method Seq.duplicate(int count). Such a method would return a List<Seq<T>> containing count Seqs. However, the only application for such a method that I can think of is implementing #49, which (to be honest) I don't find very useful either.

tlinkowski · 2017-04-30T18:48:23Z

I added a new implementation of Seq.splitAt [#308] that uses the proposed SeqBuffer.

lukaseder · 2017-05-01T07:46:13Z

That looks very interesting, thank you very much for your suggestion. I'll review this later (hopefully in the next 2 days). We'll definitely need a reusable SeqBuffer.

lukaseder · 2017-05-01T17:05:48Z

src/main/java/org/jooq/lambda/SeqBuffer.java

+    }
+
+    private final Spliterator<T> source;
+    private final List<T> buffer = new ArrayList<>();


Hmm, this buffer is not thread safe. It may well be that one consumer of a buffered result consumes the buffer while another consumer produces new elements at the same time. E.g., this could happen in your splitAt() implementation suggestion here: b345bbb
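To make the concern concrete, here is a minimal sketch of a lazily filled buffer whose views can be consumed on different threads. It is not the SeqBuffer class from this PR: the class name `SharedBufferSketch`, the `tryGet` helper and the single-lock design are assumptions chosen for illustration, and only JDK types are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical sketch, NOT the SeqBuffer implementation from this PR.
final class SharedBufferSketch<T> {
    private final Spliterator<T> source;
    private final List<T> buffer = new ArrayList<>();

    SharedBufferSketch(Stream<T> stream) {
        this.source = stream.spliterator();
    }

    // Passes the element at `index` to `action`, pulling from the source if it has
    // not been buffered yet. Every access to `source` and `buffer` is made while
    // holding the same lock, so a reader never observes a half-finished append.
    private boolean tryGet(int index, Consumer<? super T> action) {
        T element;
        synchronized (this) {
            while (buffer.size() <= index) {
                if (!source.tryAdvance(buffer::add)) {
                    return false; // source exhausted before reaching `index`
                }
            }
            element = buffer.get(index);
        }
        // Run the downstream action outside the lock to keep the critical section short.
        action.accept(element);
        return true;
    }

    // Each call returns an independent view that starts at the first element.
    Stream<T> stream() {
        Spliterator<T> view =
            new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
                private int nextIndex; // touched only by the thread consuming this view

                @Override
                public boolean tryAdvance(Consumer<? super T> action) {
                    if (tryGet(nextIndex, action)) {
                        nextIndex++;
                        return true;
                    }
                    return false;
                }
            };
        return StreamSupport.stream(view, false);
    }
}
```

With an unsynchronized ArrayList, one view could call `get` while another view's `add` is resizing the backing array; the single lock rules that out at the cost of some contention.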

lukaseder · 2017-05-01T17:07:11Z

src/main/java/org/jooq/lambda/Seq.java

-                v2.map(t -> t.v1)
-            ));
+        SeqBuffer<T> buffer = SeqBuffer.of(stream);
+        return tuple(buffer.seq().limit(position), buffer.seq().skip(position));


Much nicer, indeed. Only caveat: Thread safety in the current SeqBuffer implementation

tlinkowski · 2017-05-02

As a side note: do you realize that the current implementation of Seq.splitAt (as well as Seq.partition) is not thread safe? :)

lukaseder · 2017-05-16

Hmm, splitAt() doesn't look wrong to me, but partition may well be.

billoneil · 2017-07-28

I was under the impression that Seq wasn't thread safe to begin with. Or was it just that it will not be a parallel stream?

lukaseder · 2017-07-28

There's currently no guarantee in the API, as there are still too many flaws. But in principle, the results from methods like splitAt() (which produce several Seqs) should be thread safe. Or at least, we should have an option for them to be thread safe.

Perhaps this is a concept that should be reviewed more globally.

Note that thread safety and parallelism aren't the same thing in this context. Stream's parallelism allows for parallel processing of operations like map() or filter(). By keeping operations independent of one another, parallelism can drastically speed up the processing of a stream. Thread safety in our context just means that when two things (e.g. the splitAt() results) are consumed on different threads, we don't get wrong results.

But again, perhaps we should implement this more thoroughly, with a specific thread safety flag...

tlinkowski · 2017-07-28

Well, Seq.splitAt was directly based on Seq.partition :) And Seq.partition definitely isn't thread-safe, because both Seqs write to buffer1 and buffer2 without any synchronization.

tlinkowski · 2017-07-28

@billoneil If I understand it right, I think @lukaseder distinguishes between two things:

a single Seq is sequential, so it's not thread safe, and two threads cannot operate on the same Seq

there seems to be no restriction on consuming one Seq on one thread and another Seq on another thread, even if they both "derive" from the same Seq (hence the need for thread safety in such cases)

billoneil · 2017-07-28

Ah I wasn't thinking, makes sense.
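The distinction drawn above (each derived sequence is consumed by one thread, but the two sequences share state) can be sketched with a partition whose two sides write to shared buffers under a common lock. This is an illustration only and not jOOL's Seq.partition: the class `SynchronizedPartitionSketch`, its `side` method and the use of plain iterators are assumptions made for this example.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.function.Predicate;

// Hypothetical sketch, NOT the Seq.partition implementation.
final class SynchronizedPartitionSketch<T> {
    private final Object lock = new Object();
    private final Iterator<T> source;
    private final Predicate<? super T> predicate;
    private final Queue<T> matching = new LinkedList<>(); // elements accepted by the predicate
    private final Queue<T> rest = new LinkedList<>();     // elements rejected by the predicate

    SynchronizedPartitionSketch(Iterator<T> source, Predicate<? super T> predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    // Returns the iterator over one side; each side is meant to be consumed by one thread.
    Iterator<T> side(boolean wanted) {
        Queue<T> mine = wanted ? matching : rest;
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                synchronized (lock) {
                    return fill(mine);
                }
            }

            @Override
            public T next() {
                synchronized (lock) {
                    if (!fill(mine)) {
                        throw new NoSuchElementException();
                    }
                    return mine.remove();
                }
            }
        };
    }

    // Pulls from the source until `mine` has an element or the source is exhausted.
    // Elements belonging to the other side are parked in its queue.
    // The caller must hold `lock`, because both sides write to both queues.
    private boolean fill(Queue<T> mine) {
        while (mine.isEmpty() && source.hasNext()) {
            T element = source.next();
            (predicate.test(element) ? matching : rest).add(element);
        }
        return !mine.isEmpty();
    }
}
```

Without the lock, the thread draining one side and the thread draining the other could both advance the source and append to the same queue at once, which is the hazard described for buffer1 and buffer2.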

tlinkowski · 2017-05-02

@lukaseder, here's my shot at thread-safety of SeqBuffer. Let me know what you think of it.

lukaseder · 2017-05-03T10:51:08Z

src/main/java/org/jooq/lambda/SeqBuffer.java

+        if (estimateSize > MAX_ARRAY_SIZE)
+            throw new IllegalArgumentException("Stream is too long to be buffered: " + estimateSize);
+
+        return new ArrayList<>((int) estimateSize);


Hm, no, I don't agree with this. If we create an array list with a large initial capacity, we're back to the original problem that you wanted to avoid: up-front large memory consumption for intermediate buffers. Imagine:

Seq.seq(Collections.nCopies(1000000, "value")).splitAt(999999).v1.limit(1).forEach(System.out::println);

The splitAt() operation would create a large array list only to consume its first element...

tlinkowski · 2017-05-03

Ha, you're right! I'll revert it.

PS. We wouldn't be entirely back to square one, because I mainly wanted to avoid up-front processor consumption (i.e. consuming the entire Seq), but you're absolutely right that it's best to avoid up-front memory consumption as well.
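The point about up-front cost can be illustrated with a small on-demand buffer, assuming a plain iterator as the source. The class `OnDemandBufferSketch` and its methods are hypothetical and deliberately not thread safe; the sketch only shows that a buffer created with `new ArrayList<>()` holds exactly what has been requested so far, whereas `new ArrayList<>(n)` would reserve room for n references before the first element is read.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch (not thread safe), NOT the SeqBuffer implementation.
final class OnDemandBufferSketch<T> {
    private final Iterator<T> source;
    // No initial capacity: the list grows only as elements are actually pulled.
    private final List<T> buffer = new ArrayList<>();

    OnDemandBufferSketch(Iterator<T> source) {
        this.source = source;
    }

    // Pulls just enough elements from the source to answer the request.
    T get(int index) {
        while (buffer.size() <= index) {
            buffer.add(source.next());
        }
        return buffer.get(index);
    }

    // Number of elements pulled from the source so far.
    int buffered() {
        return buffer.size();
    }
}
```

For a source such as `Collections.nCopies(1000000, "value").iterator()`, calling `get(0)` leaves `buffered()` at 1, so neither the processing nor the memory for the remaining elements is paid for in advance.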

lukaseder · 2017-05-10T13:53:46Z

Thanks for the fix. Now, I wonder again if it would be possible to squash all commits into one for this PR (GitHub has such a feature). That would make the final review much simpler for me.

tlinkowski · 2017-05-11T15:41:00Z

Done.

PS. I couldn't find such a feature (available to the PR author) anywhere on GitHub, so I squashed them manually.

lukaseder · 2017-05-16T11:43:35Z

> PS. I couldn't find such a feature (available to the PR author) anywhere on GitHub, so I squashed them manually.

Oh, interesting. That was a wrong assumption then, sorry. I can indeed squash and merge, but not squash, review again, then merge... Will send a feature request to GitHub.

Thanks very much!

lukaseder · 2017-05-16T11:46:27Z

src/main/java/org/jooq/lambda/Seq.java

-
-        return tuple(seq(new Duplicate()), seq(new Duplicate()));
+        SeqBuffer<T> buffer = SeqBuffer.of(stream);
+        return tuple(buffer.seq(), buffer.seq());


Hmm, I see. Much simpler, sure, but the flip side of this implementation is that if both duplicates are consumed at the same speed, we're wasting a lot of memory for a buffer that might no longer be needed. What are your thoughts on this?

tlinkowski · 2017-07-28

You're absolutely right. On the other hand, the current implementation is thread safe thanks to SeqBuffer, and the previous one wasn't.

All in all, I'd be more inclined towards this simple implementation, but I guess I'd leave an appropriate comment in this method in case someone reports any memory-related issues in the future.
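For comparison, the memory trade-off can be sketched with a two-way duplicate that retains only the elements one consumer has seen and the other has not. This is neither the removed Duplicate class nor SeqBuffer: the class `TeeSketch`, its `first()`/`second()` views and the `peakPending()` counter are assumptions introduced to make the retained amount observable.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical sketch, NOT the Seq.duplicate implementation before or after this PR.
final class TeeSketch<T> {
    private final Object lock = new Object();
    private final Iterator<T> source;
    private final Queue<T> pendingFirst = new LinkedList<>();  // seen by second(), not yet by first()
    private final Queue<T> pendingSecond = new LinkedList<>(); // seen by first(), not yet by second()
    private int peak; // largest number of elements retained at any moment

    TeeSketch(Iterator<T> source) {
        this.source = source;
    }

    Iterator<T> first() {
        return view(pendingFirst, pendingSecond);
    }

    Iterator<T> second() {
        return view(pendingSecond, pendingFirst);
    }

    private Iterator<T> view(Queue<T> mine, Queue<T> other) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                synchronized (lock) {
                    return !mine.isEmpty() || source.hasNext();
                }
            }

            @Override
            public T next() {
                synchronized (lock) {
                    if (!mine.isEmpty()) {
                        return mine.remove(); // the element is released once both views have seen it
                    }
                    T element = source.next(); // throws NoSuchElementException when exhausted
                    other.add(element);        // keep it only for the view that is behind
                    peak = Math.max(peak, other.size());
                    return element;
                }
            }
        };
    }

    int peakPending() {
        synchronized (lock) {
            return peak;
        }
    }
}
```

When both views advance in lock-step, at most one element is retained; when one view is drained completely before the other starts, the whole input is retained, just as with a shared buffer. The price of the smaller footprint is more bookkeeping than the two-line `buffer.seq()` version shown in the diff.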

lukaseder · 2017-05-16T11:48:57Z

src/main/java/org/jooq/lambda/Seq.java

-                v2.map(t -> t.v1)
-            ));
+        SeqBuffer<T> buffer = SeqBuffer.of(stream);
+        return tuple(buffer.seq().limit(position), buffer.seq().skip(position));


Hmm, splitAt() doesn't look wrong to me, but partition may well be.

lukaseder · 2017-07-28T15:47:14Z

Thanks again for this change. I'll review this later in detail, and will let you know.

This was referenced Apr 27, 2017

Add Seq.lazy() #306

Closed

Seq.splitAt implementation is inefficient #308

Closed

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request Apr 30, 2017

[jOOQ#305] fixed SeqBuffer to properly handle null elements

d786c10

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request Apr 30, 2017

[jOOQ#305] support for SeqBuffer.of(Stream<? extends T>)

ed17092

lukaseder added P: Medium T: Enhancement labels May 1, 2017

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 1, 2017

[jOOQ#305] SeqBuffer.BufferSpliterator: changed "currentIndex" to "ne…

a9a593e

…xtIndex"

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 1, 2017

[jOOQ#305] SeqBuffer.BufferSpliterator: fixed estimateSize() and char…

0812650

…acteristics()

lukaseder requested changes May 1, 2017

lukaseder added this to the Version 0.9.13 milestone May 1, 2017

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 2, 2017

[jOOQ#305] made SeqBuffer thread-safe

e0ccabe

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 2, 2017

[jOOQ#305] added ArrayList initialization based on source.estimateSize()

f54482e

tlinkowski commented May 2, 2017

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 2, 2017

[jOOQ#305] added SeqBufferTest.testThreadSafetyDuringParallelConsumpt…

1b940ad

…ion()

lukaseder requested changes May 3, 2017

tlinkowski pushed a commit to tlinkowski/jOOL that referenced this pull request May 3, 2017

[jOOQ#305] reverted "added ArrayList initialization based on source.e…

19f782d

…stimateSize()" because it introduced unnecessary up-front memory consumption Reverted from commit f54482e

[jOOQ#305] Added thread-safe SeqBuffer

61e7537

[jOOQ#195] Applied SeqBuffer to certain Seq method implementations [jOOQ#122] Thread-safe Seq.duplicate()

tlinkowski force-pushed the SeqBuffer branch from bf9a107 to 61e7537 Compare May 11, 2017 07:00

lukaseder mentioned this pull request Jul 28, 2017

[#296] Added chunked(long), chunked(Predicate<T>) #320

Closed

4 tasks

lukaseder reviewed Jul 28, 2017

lukaseder merged commit 10741dd into jOOQ:master Jul 28, 2017

lukaseder added the R: Fixed label Jul 28, 2017

tlinkowski deleted the SeqBuffer branch July 31, 2017 09:51

tlinkowski mentioned this pull request Mar 2, 2018

Adapt all Seq.crossJoin() methods to make use of SeqBuffer #333

Open
