Clarified documentation about Iterable data providers and size() calls #2027

AndreasTu · 2024-10-18T16:57:06Z

Added note to the documentation to clarify the usage of Iterables
as data pipes and the used size() method.

Fixes #2022

codecov · 2024-10-18T17:02:21Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.86%. Comparing base (2c7db77) to head (e026689).
Report is 157 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master    #2027      +/-   ##
============================================
+ Coverage     80.44%   81.86%   +1.41%     
- Complexity     4337     4609     +272     
============================================
  Files           441      448       +7     
  Lines         13534    14445     +911     
  Branches       1707     1829     +122     
============================================
+ Hits          10888    11826     +938     
+ Misses         2008     1942      -66     
- Partials        638      677      +39

Vampire

I don't think we should do this or anything similar.
If it is a Collection, its size could also be expensive to calculate.
For an Iterator, size() simply is not called, as Iterators are one-time consumable only.
All other data providers can be consumed multiple times, and users should adjust to that imho.
They can always pass in an iterator instead, or make sure the iterable has some built-in cache so that the expensive operation is not done multiple times.
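
A minimal Java sketch of the "built-in cache" idea, assuming the whole data set fits in memory. `CachingIterable` and `expensiveLoad` are hypothetical names invented for illustration; they are not part of Spock or Groovy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical wrapper: runs the expensive load once and replays the cached elements.
class CachingIterable<T> implements Iterable<T> {
    private final Supplier<Iterator<T>> loader; // the expensive operation
    private List<T> cache;                      // null until the first traversal

    CachingIterable(Supplier<Iterator<T>> loader) {
        this.loader = loader;
    }

    @Override
    public Iterator<T> iterator() {
        if (cache == null) {
            List<T> loaded = new ArrayList<>();
            loader.get().forEachRemaining(loaded::add);
            cache = loaded;
        }
        return cache.iterator();
    }
}

class CachingIterableDemo {
    static int loads = 0; // counts how often the expensive load actually ran

    // Stand-in for an expensive source such as a file or database read (assumed).
    static Iterator<Integer> expensiveLoad() {
        loads++;
        return List.of(1, 2, 3).iterator();
    }

    public static void main(String[] args) {
        Iterable<Integer> data = new CachingIterable<>(CachingIterableDemo::expensiveLoad);
        for (int pass = 0; pass < 3; pass++) {
            for (Integer value : data) {
                // consume the value
            }
        }
        System.out.println("loads=" + loads);
    }
}
```

With this wrapper, repeated traversals trigger a single load. The trade-off is that the elements are no longer fetched lazily and all of them are held in memory.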

AndreasTu · 2024-10-18T17:13:32Z

@Vampire But why should we call a size() method on an Iterable when it is not part of the contract for Iterable?
Collections have size() in their contract, so there we can argue "hey, then apply a cache", but the call to DefaultGroovyMethods.size(Iterable) is IMHO very surprising; it leads to 3 iterations of the iterable that the user did not ask for.

I think the size() method on an iterable is a Groovy idiosyncrasy, which we should not force onto users.
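
To make the cost concrete, here is a minimal Java sketch of what happens when a size is derived from a plain `Iterable` by traversing it. `countByIterating` is a hypothetical stand-in for an iterate-and-count size computation, not the actual source of `DefaultGroovyMethods.size(Iterable)`, and `ExpensiveSource` is an invented example class.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical data source that records how often it is traversed.
class ExpensiveSource implements Iterable<String> {
    private final List<String> rows = List.of("a", "b", "c");
    int iteratorCalls = 0; // each call stands for one expensive traversal

    @Override
    public Iterator<String> iterator() {
        iteratorCalls++;
        return rows.iterator();
    }
}

class SizeByIterationDemo {
    // Stand-in for an iterate-and-count size(): walks a fresh iterator to the end.
    static int countByIterating(Iterable<?> iterable) {
        int count = 0;
        for (Object ignored : iterable) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ExpensiveSource source = new ExpensiveSource();
        int size = countByIterating(source); // one traversal, only to count
        for (String row : source) {          // another traversal, to read the data
            System.out.println(row);
        }
        System.out.println("size=" + size + ", traversals=" + source.iteratorCalls);
    }
}
```

Every size computation of this kind adds one full traversal on top of the traversal that actually reads the data, which is the extra work the comment above objects to.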

Vampire · 2024-10-18T17:19:17Z

> But why should we call a size() method on an Iterable when it is not part of the contract for Iterable?

Spock specs are Groovy code, not Java code.
In Groovy, size() is part of the contract of Iterable: https://docs.groovy-lang.org/latest/html/groovy-jdk/java/lang/Iterable.html#size()

> I think the size() method on an iterable is a Groovy idiosyncrasy, which we should not force onto users.

I'm actually of the opposite opinion.
We are in a Groovy environment, so we should follow Groovy semantics.
And in Groovy the method is defined and thus executed.

@leonard84 what is your PoV?

aditbhartia · 2024-10-27T09:06:34Z

docs/data_driven_testing.adoc

@@ -252,6 +252,11 @@ used as a data provider. This includes objects of type `Collection`, `String`, `
 they can fetch data from external sources like text files, databases and spreadsheets, or generate data randomly.
 Data providers are queried for their next value only when needed (before the next iteration).

+NOTE: For performance reasons it is helpful to use `Iterable.iterator()` instead of the `Iterable` as data pipe,
+      if a custom `size()` method is not available or the calculation of the `Iterable` is expensive.
+      The Groovy `Iterable` contract contains a `size()` method, which iterates the `Iterable` to calculate the size.


Would specifying that iteration over the Iterable may be done multiple times before execution be a more accurate way of expressing the internal behavior? In my opinion, a reader could assume from this that the iteration is only done once. The implication that the next value is fetched unnecessarily also seems to contradict the statement on line 253.

Vampire · 2024-10-29T00:50:42Z

> Would specifying that iteration over the Iterable may be done multiple times before execution be a more accurate way of expressing the internal behavior?

Maybe in some cases, but not generally.
Also, the user documentation should not document internal behavior.
And actually, it is not even Spock behavior.
Spock just calls the size() method, whatever that may be.
It is Groovy that, if no other size() method is present, calls iterator() on an Iterable and iterates over it to determine the size.
And this could theoretically change at any time in any Groovy version.
So this should imho not be documented in the Spock docs.

> In my opinion, a reader could assume from this that the iteration is only done once. The implication that the next value is fetched unnecessarily also seems to contradict the statement on line 253.

Yes and no.
We indeed only fetch the next value when it is needed, as documented.
As said above, it is the Groovy implementation of size() that iterates over an Iterable to determine its size.
All we can say imho is that we call size() if the data provider is not an Iterator, and that size() should be efficient.
Then the user can take care to make it efficient or, if that is not possible, provide an Iterator.

In the case of an external DB, you could e.g. also provide an Iterable with a size() method that runs a select count query on the first call and persists the result, or similar.
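
The shape of such a provider could look roughly like the following Java sketch. `SizedRows`, `countQuery` and `rowQuery` are hypothetical names, and the two suppliers merely stand in for real database calls. In a Groovy spec, a `size()` method declared on the class would, per the explanation above, be used in place of the iterate-and-count fallback.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Hypothetical provider: size() runs a cheap count query once and remembers the answer.
class SizedRows implements Iterable<String> {
    private final IntSupplier countQuery;               // e.g. a "select count" call (assumed)
    private final Supplier<Iterator<String>> rowQuery;  // fetches the actual rows (assumed)
    private Integer cachedSize;                          // null until size() is first called

    SizedRows(IntSupplier countQuery, Supplier<Iterator<String>> rowQuery) {
        this.countQuery = countQuery;
        this.rowQuery = rowQuery;
    }

    // Memoized size: the count query runs at most once.
    public int size() {
        if (cachedSize == null) {
            cachedSize = countQuery.getAsInt();
        }
        return cachedSize;
    }

    @Override
    public Iterator<String> iterator() {
        return rowQuery.get();
    }
}

class SizedRowsDemo {
    public static void main(String[] args) {
        int[] countQueries = {0}; // counts how often the count query ran
        SizedRows rows = new SizedRows(
            () -> { countQueries[0]++; return 2; },
            () -> List.of("x", "y").iterator());
        rows.size();
        rows.size();
        System.out.println("count queries run: " + countQueries[0]);
    }
}
```

Asking for the size never touches the row query here, so the rows themselves are only fetched when they are iterated.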

Vampire

lgtm, thx

AndreasTu added this to the 2.4-M5 milestone Oct 18, 2024

AndreasTu self-assigned this Oct 18, 2024

AndreasTu mentioned this pull request Oct 18, 2024

Passing in a custom Iterable as a data provider causes Spock to iterate over each item multiple times #2022

Closed

AndreasTu requested a review from leonard84 October 18, 2024 16:59

Vampire requested changes Oct 18, 2024

AndreasTu force-pushed the fix_2022_Iterable branch from d335c49 to 0fec2c7 Compare October 25, 2024 18:36

AndreasTu changed the title ~~Iterable data providers are not called multiple times for size() anymore~~ Clarified documentation about Iterable data providers and size() calls Oct 25, 2024

AndreasTu requested a review from Vampire October 25, 2024 18:37

aditbhartia reviewed Oct 27, 2024

Vampire requested changes Oct 29, 2024

AndreasTu force-pushed the fix_2022_Iterable branch from 0fec2c7 to 01c56ba Compare October 30, 2024 08:30

leonard84 reviewed Nov 1, 2024

AndreasTu force-pushed the fix_2022_Iterable branch 2 times, most recently from 76d53f9 to 5f496a3 Compare November 3, 2024 17:14

Clarified documentation about Iterable data providers and size call

e026689

Added note to the documentation to clarify the usage of Iterables as data pipes and the used size() method. Fixes spockframework#2022

AndreasTu force-pushed the fix_2022_Iterable branch from 5f496a3 to e026689 Compare November 5, 2024 17:08

AndreasTu requested a review from Vampire November 5, 2024 17:08

Vampire approved these changes Nov 5, 2024

AndreasTu merged commit bf771f1 into spockframework:master Nov 5, 2024
25 checks passed

AndreasTu deleted the fix_2022_Iterable branch November 5, 2024 17:39
