Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Fail Overflow Test less flaky #164

Merged
merged 5 commits into from
Mar 22, 2024

Conversation

abutch3r
Copy link
Contributor

@abutch3r abutch3r commented Feb 22, 2024

Fail overflow test can fail to pass if exception is thrown after more then 256 messages have been processed.

increase window within which a failure can occur by applying a similar change as was done in #158

Fail overflow test can fail to pass if exception is thrown after more then 256 messages have been processed

eclipse#162
@eclipse-microprofile-bot
Copy link
Contributor

Can one of the admins verify this patch?

@Azquelt
Copy link
Member

Azquelt commented Feb 22, 2024

@eclipse-microprofile-bot please test this

@Azquelt
Copy link
Member

Azquelt commented Feb 22, 2024

To be honest, this test looks pretty sketchy. We're submitting to the executor in a tight loop, but the methods downstream of the processor seem to also run as quickly as possible.

I don't think there's any guarantee in this test that the emitter will overflow?

@cescoffier
Copy link
Contributor

Yes, it may not overflow depending on the timing of the test.

We may want to disable it completely.

@Azquelt
Copy link
Member

Azquelt commented Feb 22, 2024

@eclipse-microprofile-bot test this please

@abutch3r
Copy link
Contributor Author

abutch3r commented Feb 22, 2024

@cescoffier
I have updated the test bean to be in line with the other beans which all contain a 1 millisecond sleep, which should for all practical purposes mean that the test will now throw the failure as expected.

Copy link
Member

@Azquelt Azquelt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good.

@Azquelt
Copy link
Member

Azquelt commented Feb 26, 2024

1 second sleep

That'll be a 1 millisecond sleep - which should still be enough to ensure that we send messages to the emitter faster than they can be processed.

@kabir
Copy link
Contributor

kabir commented Feb 27, 2024

We're also seeing occasional failures in DropOverflowStrategyOverflowTest/

https://issues.redhat.com/browse/WFLY-17635

Would a similar fix be appropriate there? I've not looked beyond verifying that the classes are different :-)

@Emily-Jiang
Copy link
Member

We're also seeing occasional failures in DropOverflowStrategyOverflowTest/

https://issues.redhat.com/browse/WFLY-17635

Would a similar fix be appropriate there? I've not looked beyond verifying that the classes are different :-)

Please open a different issue and propose a PR @kabir

@Emily-Jiang
Copy link
Member

@cescoffier are you okay with this PR?

@cescoffier
Copy link
Contributor

@ozangunalp can you check? It should be fine (as discussed).

@ozangunalp
Copy link
Contributor

It seems to me like it'll almost always fail.

The second emit will be called before the very first message arrives to the out because of the sleep(1), and cause a lack of requests exception without any messages registered at the output.

I think we should just get rid of the assertion assertThat(bean.output()).isNotEmpty().... The emission and processing happens on different threads, We can't be sure if we process any messages.

@abutch3r
Copy link
Contributor Author

@kabir your issue matches the fix in #158 which is not currently in a published version of the TCK. When a new version is published then the fix for that test would also be included

@Azquelt
Copy link
Member

Azquelt commented Feb 28, 2024

It seems to me like it'll almost always fail.

The second emit will be called before the very first message arrives to the out because of the sleep(1), and cause a lack of requests exception without any messages registered at the output.

I think we should just get rid of the assertion assertThat(bean.output()).isNotEmpty().... The emission and processing happens on different threads, We can't be sure if we process any messages.

Good point.

In practise, the await() line may often wait long enough for the first message to arrive since it only checks the condition every 100ms by default, but it is still a nasty race condition.

@Azquelt
Copy link
Member

Azquelt commented Feb 28, 2024

Ok, @abutch3r and I just had a chat about this test.

As soon as the emitter overflows, it emits a failure and terminates the subscription (1.6). At this point all further messages should result in the Emitter throwing IllegalStateException because the stream is terminated.

So when the emitter first throws an exception, neither the first message, nor the failure signal may have reached the end of the stream yet.

Currently the test awaits on an exception being thrown from the emitter. We think if we change it to wait for the onError signal to arrive, that would guarantee that the assertThat(bean.output()).isNotEmpty().hasSizeLessThan(999); check would pass, since the first item should be processed before the error signal.

At that point, it would be theoretically possible for the test thread not to yet see an exception thrown from the emitter since that's done asynchronously, so we should await() on that arriving too.

Wait for the failure on the stream to occur instead of the exception on the emitter - this ensures that at least the first message will be sucessfully processed and that a failure did occur before assertions are checked.

In the case where the failure may occur sufficiently late in the test execution such that there is a failure, but not yet an exception. In this case emit one more message and wait for the exception before checking

emitThree was unused and would close close the stream via `.complete()` - repurpose for being able to send one message and not close the stream if successful.
//If an exception has not yet been thrown after the failure occurred, try one more message
if (bean.exception() == null) {
bean.emitOne();
await().until(() -> bean.exception() != null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't needed because emitOne runs synchronously.

@ozangunalp
Copy link
Contributor

I am trying to find a way to make this test work but even by waiting on the failure() and not the exception(), the error signal will arrive first and complete the stream before the first message arrives to the downstream out channel.

@Azquelt
Copy link
Member

Azquelt commented Mar 1, 2024

the error signal will arrive first and complete the stream before the first message arrives to the downstream out channel

Hmm, I was under the impression that the failure signal should follow the items down the stream, not reaching the final subscriber until the previous two items had been processed, but I guess that's not the case?

@Emily-Jiang
Copy link
Member

@cescoffier @ozangunalp do you have any objections to merge this PR?

@ozangunalp
Copy link
Contributor

I think the previous changes are making this test even more flaky.
I suggest we keep the old assertion about the size of the output and make the other changes :

    @Test
    public void testOverflow() {
        bean.emitALotOfItems();

        await().until(() -> bean.failure() != null);
        assertThat(bean.failure()).isInstanceOf(Exception.class);
        assertThat(bean.output()).doesNotContain("999");
        assertThat(bean.output()).hasSizeBetween(0, 256);
        // If an exception has not yet been thrown after the failure occurred, try one more message
        if (bean.exception() == null) {
            bean.emitOne();
        }
        assertThat(bean.exception()).isInstanceOf(IllegalStateException.class);
    }

I think we can also revert the added Thread.sleep(1).

@Azquelt
Copy link
Member

Azquelt commented Mar 8, 2024

I think you need the Thread.sleep(1) to ensure that each message isn't just processed as it comes in.

Each call to emitter.send() results in an async task starting, and the test relies on the next call to emitter.send() running before the async task has completed. Though this is likely to happen for at least one of the 1000 calls, the Thread.sleep(1) almost guarantees it.

Having a lock in there so that the test can block tasks from completing until the emitter fails would be an alternative.

@abutch3r abutch3r force-pushed the flaky_emitter_overflow_fail_tck branch from 44b9861 to 3a5eeb5 Compare March 21, 2024 16:56
@abutch3r
Copy link
Contributor Author

@ozangunalp

I have removed the .isNotEmpty() check from the assertions and reintroduced the .doesNotContain("999") as that validates that no messages are later accepted.

the main issue we saw with the test is that the failure could occur after more then 500 messages were processed, so the 256 check does need to be increased to allow for the situation where we do process more then expected, but the sleep which is used in every other emitter test should provide a better guarantee that we see it fail.

Are these changes now satisfactory

@ozangunalp
Copy link
Contributor

Yes, this looks good to me, thanks! @abutch3r @Azquelt

@Azquelt Azquelt merged commit e025074 into eclipse:main Mar 22, 2024
2 checks passed
@Emily-Jiang Emily-Jiang added this to the 3.0.1 milestone Apr 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants