Implemented first version of the asReader interface on IString #273

DavyLandman · 2024-09-19T13:43:30Z

This implements the addition discussed in #71

I've added some tests, hope they are enough.

github-actions · 2024-09-19T13:48:53Z

Test Results

99 files ±0, 99 suites ±0, 5m 29s ⏱️ -4s
242 304 tests (+2): 242 303 passed ✅ (+2), 1 skipped 💤 (±0), 0 failed ❌ (±0)
726 999 runs (+6): 726 996 passed ✅ (+6), 3 skipped 💤 (±0), 0 failed ❌ (±0)

Results for commit f7d591e. ± Comparison against base commit 38556b4.

♻️ This comment has been updated with latest results.


jurgenvinju · 2024-09-19T14:48:47Z

The tests are very good because many random values will be used.

jurgenvinju

Nice to have progress on this! Thanks. I think this demonstrates exactly what is needed.

The reader for indented values needs rethinking without the IntStream. At a minimum it should stream entire lines, up to and including the newline character, but larger buffers are preferable. A typical use case is a large JSON object serialized as a string with indentation, so if we can fit several lines into buffers of up to 8k, that should noticeably help server response times.
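A minimal sketch of the bulk, line-oriented behaviour asked for here, assuming the indented value can be read as a plain source `Reader` plus an indent string. `IndentingReader` is a hypothetical name, not the PR's code. It copies whole runs of characters, up to and including each `'\n'`, into the caller's buffer with `System.arraycopy`, and emits the indent lazily before the first character of each line, so a trailing newline does not leave a dangling indent. Whether empty lines and the first line should be indented is an assumption here; the 8192-char staging buffer mirrors the 8k figure above.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

/**
 * Hypothetical sketch (not the vallang implementation): prefixes every line of
 * a source with an indent, copying whole runs of chars per call instead of one
 * code point at a time. Nothing is emitted after a final trailing newline.
 */
final class IndentingReader extends Reader {
    private final Reader source;
    private final String indent;
    private final char[] chunk = new char[8192]; // assumed 8k staging buffer
    private int chunkPos = 0;
    private int chunkEnd = 0;
    private int indentPos = -1;         // >= 0 while part of the indent is still pending
    private boolean atLineStart = true;

    IndentingReader(Reader source, String indent) {
        this.source = Objects.requireNonNull(source);
        this.indent = Objects.requireNonNull(indent);
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, cbuf.length);
        if (len == 0) {
            return 0;
        }
        int written = 0;
        while (written < len) {
            if (indentPos >= 0) {
                // finish (part of) a pending indent
                int n = Math.min(indent.length() - indentPos, len - written);
                indent.getChars(indentPos, indentPos + n, cbuf, off + written);
                indentPos += n;
                written += n;
                if (indentPos == indent.length()) {
                    indentPos = -1;
                }
                continue;
            }
            if (chunkPos == chunkEnd) {
                // refill the staging buffer from the source
                int n = source.read(chunk, 0, chunk.length);
                if (n == -1) {
                    break;
                }
                chunkPos = 0;
                chunkEnd = n;
                continue;
            }
            if (atLineStart) {
                // at least one more char follows, so this line gets an indent
                atLineStart = false;
                if (!indent.isEmpty()) {
                    indentPos = 0;
                }
                continue;
            }
            // copy up to and including the next '\n', bounded by the space left
            int limit = Math.min(chunkEnd, chunkPos + (len - written));
            int i = chunkPos;
            while (i < limit && chunk[i] != '\n') {
                i++;
            }
            if (i < limit) {
                i++;                // include the newline itself
                atLineStart = true;
            }
            System.arraycopy(chunk, chunkPos, cbuf, off + written, i - chunkPos);
            written += i - chunkPos;
            chunkPos = i;
        }
        return written == 0 ? -1 : written;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }
}
```

With a large caller buffer, a single `read` call can return many indented lines at once, which is the point of the request above.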

With respect to tests, I'm pretty sure the random string generator does not exercise indentation yet, nor deep concatenation of long strings. To test this better we could either generate some examples ourselves or extend the default random generator for strings.
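One way to close part of that gap without touching vallang's random generator is a pair of stand-alone helpers (hypothetical names, plain JDK only). The first produces text dense in newlines and supplementary code points, which need a surrogate pair. The second drains any `Reader` with random offsets and random `len` values, including 1, so the "fewer chars than len" and split-surrogate paths get exercised. In a real test the reader would come from `asReader()` on a generated string and the drained result would be compared with the expected string; that wiring is assumed and left out here.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Random;

/** Hypothetical test helpers; not part of vallang. */
final class ReaderTestSupport {

    /** Random text with many newlines and some supplementary (surrogate-pair) code points. */
    static String randomText(Random rnd, int codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints; i++) {
            int kind = rnd.nextInt(4);
            if (kind == 0) {
                sb.append('\n');
            } else if (kind == 1) {
                // U+1F600..U+1F63F lie outside the BMP, so each takes 2 chars
                sb.appendCodePoint(0x1F600 + rnd.nextInt(0x40));
            } else {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
        }
        return sb.toString();
    }

    /** Drains a reader using random offsets and random len values (at least 1). */
    static String drainRandomly(Reader r, Random rnd) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        while (true) {
            int off = rnd.nextInt(8);
            int len = 1 + rnd.nextInt(buf.length - off); // off + len <= buf.length
            int n = r.read(buf, off, len);
            if (n == -1) {
                return sb.toString();
            }
            // the Reader contract: at least 1 and at most len chars per call
            if (n == 0 || n > len) {
                throw new IllegalStateException("read returned " + n + " for len " + len);
            }
            sb.append(buf, off, n);
        }
    }
}
```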

jurgenvinju · 2024-09-19T14:54:19Z

src/main/java/io/usethesource/vallang/impl/primitive/StringValue.java

+                @Override
+                public int read(char[] cbuf, int off, int len) throws IOException {
+                    int result = currentReader.read(cbuf, off, len);
+                    if (result == -1 && !readingRight) {


Could use some comments to explain that read is allowed to return fewer chars than len.

Ah okay, I didn't want to repeat the documentation of read, but indeed, returning at least 1 char is enough.
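To make that contract concrete: `Reader.read(char[], int, int)` blocks until some input is available and then returns at least one char, or -1 at end of stream; it does not have to fill `len`. A minimal two-way concatenating reader in the style of the snippet above (the names `currentReader` and `readingRight` are borrowed from it, everything else is assumed) can therefore forward whatever the current side returns, even when that is short of `len` at the left/right boundary.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

/** Sketch: reads all of left, then all of right. Not the merged vallang code. */
final class ConcatReader extends Reader {
    private final Reader left;
    private final Reader right;
    private Reader currentReader;
    private boolean readingRight = false;

    ConcatReader(Reader left, Reader right) {
        this.left = Objects.requireNonNull(left);
        this.right = Objects.requireNonNull(right);
        this.currentReader = left;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, cbuf.length);
        if (len == 0) {
            return 0;
        }
        int result = currentReader.read(cbuf, off, len);
        if (result == -1 && !readingRight) {
            // left side exhausted: switch to the right side once and retry
            readingRight = true;
            currentReader = right;
            result = currentReader.read(cbuf, off, len);
        }
        // result may be < len (e.g. the tail of left); the contract allows that
        return result;
    }

    @Override
    public void close() throws IOException {
        try {
            left.close();
        } finally {
            right.close();
        }
    }
}
```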

jurgenvinju · 2024-09-19T15:01:16Z

src/main/java/io/usethesource/vallang/impl/primitive/StringValue.java

@@ -1380,6 +1438,72 @@ public void write(Writer w) throws IOException {
            assert indents.isEmpty();
        }

+        @Override


I think this is correct, although I see we need to work hard on the surrogate pairs. Maybe if we always read until len - 1 (unless the last character is a newline), and the int iterator always provides code points, there are no further special cases.

But before we merge we should find an implementation that can at least stream line by line. I suspect that the current experiment with the int iterator will slow down the stream to the point where streaming no longer makes sense.

Well, we do need to handle surrogate pairs, since every int the iterator returns can become either 1 or 2 chars. I initially had some code that would write both chars if there was room, but that just led to duplicated logic with the queedLowSurrogate field. So I rewrote it to be a bit less wordy.
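A sketch of the queued-surrogate approach described here, assuming a `PrimitiveIterator.OfInt` of code points as input. A supplementary code point is written as a high and a low surrogate; when only one slot is left, the low surrogate is kept for the next call. Some form of queuing is needed even with the "read until len - 1" idea, because a caller may pass `len == 1`. The class name and the `queuedLowSurrogate` field are illustrative only, not the merged code (which later moved to a CharBuffer-based style).

```java
import java.util.Objects;
import java.util.PrimitiveIterator;
import java.io.Reader;

/** Hypothetical sketch: a Reader over a stream of Unicode code points. */
final class CodePointReader extends Reader {
    private final PrimitiveIterator.OfInt codePoints;
    private int queuedLowSurrogate = -1; // -1 means nothing is queued

    CodePointReader(PrimitiveIterator.OfInt codePoints) {
        this.codePoints = Objects.requireNonNull(codePoints);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        Objects.checkFromIndexSize(off, len, cbuf.length);
        if (len == 0) {
            return 0;
        }
        int pos = off;
        int end = off + len;
        if (queuedLowSurrogate != -1) {
            // second half of a pair that did not fit last time
            cbuf[pos++] = (char) queuedLowSurrogate;
            queuedLowSurrogate = -1;
        }
        while (pos < end && codePoints.hasNext()) {
            int cp = codePoints.nextInt();
            if (Character.isBmpCodePoint(cp)) {
                cbuf[pos++] = (char) cp;
            } else {
                cbuf[pos++] = Character.highSurrogate(cp);
                if (pos < end) {
                    cbuf[pos++] = Character.lowSurrogate(cp);
                } else {
                    queuedLowSurrogate = Character.lowSurrogate(cp);
                }
            }
        }
        return pos == off ? -1 : pos - off;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```

Usage: `new CodePointReader("a\uD83D\uDE00b".codePoints().iterator())`, since `IntStream.iterator()` already returns a `PrimitiveIterator.OfInt`.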

jurgenvinju · 2024-09-19T15:16:52Z

To make indented writing easier, I distributed indentedWrite methods over all implementation classes. Possibly an indentedRead method would help here as well, to fill buffers larger than 1 char.

jurgenvinju · 2024-09-19T15:52:22Z

src/main/java/io/usethesource/vallang/impl/primitive/StringValue.java

@@ -1146,6 +1163,47 @@ public void write(Writer w) throws IOException {
            right.write(w);
        }

+        @Override
+        public Reader asReader() {
+            return new Reader() {


Here we allocate a reader for every node at every level of the tree. For the writer this design didn't hold up because of the enormous number of concat nodes in some applications: typically we have at least as many concat nodes as there are lines in the output.

So let's have a look at this, and also use the benchmarks in the main method so we can fine-tune it. Typically, recursion is OK for the concat nodes, with at most log(n) allocations (a spine of reader objects that moves over the tree).

The problem is that we have to keep state: the writer was easier because it's push-based, while the reader is pull-based. I tried to do it in a similar way to the writer, but could not figure it out in a reasonable timeframe. We need coroutines ;)
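Coroutines are not strictly needed if the pull-side state is made explicit as a stack of the right branches still to visit. A minimal sketch over a hypothetical binary tree of `Leaf`/`Concat` nodes (not vallang's `AbstractString` classes) keeps at most one stack entry per level of the current path, so for a balanced tree the live state is O(log n) rather than one reader object per node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch of stack-based, left-to-right leaf iteration. */
final class LeafIteration {
    interface Node {}
    record Leaf(String text) implements Node {}
    record Concat(Node left, Node right) implements Node {}

    static Iterator<String> leaves(Node root) {
        return new Iterator<>() {
            // right branches we still have to visit, nearest one on top
            private final Deque<Node> todo = new ArrayDeque<>();
            private Leaf current = leftmostLeaf(root);

            // descend to the leftmost leaf, remembering each right branch we skip
            private Leaf leftmostLeaf(Node n) {
                while (n instanceof Concat c) {
                    todo.push(c.right());
                    n = c.left();
                }
                return (Leaf) n;
            }

            @Override
            public boolean hasNext() {
                return current != null;
            }

            @Override
            public String next() {
                if (current == null) {
                    throw new NoSuchElementException();
                }
                String result = current.text();
                // backtrack: continue with the leftmost leaf of the nearest pending right branch
                current = todo.isEmpty() ? null : leftmostLeaf(todo.pop());
                return result;
            }
        };
    }
}
```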

jurgenvinju · 2024-09-19T22:14:24Z

This algorithm is what we ended up with to avoid allocation of iterator or stream objects for every node in the balanced binary tree:

@Override
public OfInt iterator() {
    return new OfInt() {
        final Deque<AbstractString> todo = new ArrayDeque<>(depth);
        OfInt currentLeaf = leftmostLeafIterator(todo, LazyConcatString.this);

        @Override
        public boolean hasNext() {
            return currentLeaf.hasNext(); /* || !todo.isEmpty() is unnecessary due to post-condition of nextInt() */
        }

        @Override
        public int nextInt() {
            int next = currentLeaf.nextInt();

            if (!currentLeaf.hasNext() && !todo.isEmpty()) {
                // now we back track to the previous node we went left from,
                // take the right branch and continue with its first leaf:
                currentLeaf = leftmostLeafIterator(todo, todo.pop());
            }

            assert currentLeaf.hasNext() || todo.isEmpty();
            return next;
        }
    };
}

If we turn this into an Iterator<AbstractString> where each AbstractString is guaranteed to be a leaf node, that would work. We only have to check what the indented nodes do in this context.
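Given such an iterator of leaves, the reader itself can stay flat. A sketch assuming the leaves are exposed as `String`s (a simplification of the `Iterator<AbstractString>` idea; `LeafReader` is a hypothetical name): each call keeps copying with `String.getChars` until the caller's buffer is full or the leaves run out, so buffers are filled as much as possible and empty leaves are skipped.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.Objects;

/** Hypothetical sketch: a Reader over an iterator of leaf strings. */
final class LeafReader extends Reader {
    private final Iterator<String> leaves;
    private String current = "";
    private int pos = 0;

    LeafReader(Iterator<String> leaves) {
        this.leaves = Objects.requireNonNull(leaves);
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        Objects.checkFromIndexSize(off, len, cbuf.length);
        if (len == 0) {
            return 0;
        }
        int written = 0;
        while (written < len) {
            if (pos == current.length()) {
                // current leaf used up (or empty): move to the next one
                if (!leaves.hasNext()) {
                    break;
                }
                current = leaves.next();
                pos = 0;
                continue;
            }
            // bulk-copy as much of this leaf as fits
            int n = Math.min(current.length() - pos, len - written);
            current.getChars(pos, pos + n, cbuf, off + written);
            pos += n;
            written += n;
        }
        return written == 0 ? -1 : written;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```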

DavyLandman · 2024-09-20T11:14:18Z

@jurgenvinju the new implementation uses the exact same style as the int iterators.

jurgenvinju

This is fast and useful. I hope we can apply it in the web servers to improve visuals and other features like Salix diffs.

Implemented first version of the asReader interface on IString

16073c5

DavyLandman requested a review from jurgenvinju September 19, 2024 13:43

DavyLandman added 3 commits September 19, 2024 16:18

Added tests for reader and writers

21c2bf4

More tests for the reader and write equality

e289490

Improved test coverage of the asReader function and fixed a bug the test uncovered

a7cf69a

Simplified handling of surrogate pairs in the loop

206676f

jurgenvinju requested changes Sep 19, 2024


jurgenvinju reviewed Sep 19, 2024


Rewrote the readers to use the CharBuffer iterator style

589aa9f

DavyLandman added 4 commits September 20, 2024 14:01

Make sure to fill the buffers as much as we can

ea2101a

Simplified buffer code

1bc04ad

Refactored away clone

f107012

Further refactoring of common iterator pattern

9909809

jurgenvinju approved these changes Sep 21, 2024


DavyLandman added 4 commits September 22, 2024 11:28

Improved some tests

82c9844

Refactored iterator of iterator into a single class that does both

b47627c

Helping CF realize things are initialized

f70c200

Fixed indentation error

f7d591e

DavyLandman merged commit 8621d34 into main Sep 22, 2024
10 checks passed
