Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarified documentation about Iterable data providers and size() calls #2027

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

AndreasTu
Copy link
Member

@AndreasTu AndreasTu commented Oct 18, 2024

Added note to the documentation to clarify the usage of Iterables
as data pipes and the used size() method.

Fixes #2022

Copy link

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.86%. Comparing base (2c7db77) to head (e026689).
Report is 157 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2027      +/-   ##
============================================
+ Coverage     80.44%   81.86%   +1.41%     
- Complexity     4337     4609     +272     
============================================
  Files           441      448       +7     
  Lines         13534    14445     +911     
  Branches       1707     1829     +122     
============================================
+ Hits          10888    11826     +938     
+ Misses         2008     1942      -66     
- Partials        638      677      +39     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@Vampire Vampire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should do this or anything similar.
If it is a Collection it could also be expensive to calculate.
For Iterator it just is not done as Iterators are one-time consumable only.
For all others you can consume it multiple times and users should adjust to that imho.
They can anytime give an interator in instead, or make sure the iterable has some built-in cache to not do the expensive operation multiple times.

@AndreasTu
Copy link
Member Author

@Vampire But why should we call a size() method on an Iterable, which is not part of the contract for Iterable.
On Collections there is the size() method contract, which we can argument "hey then apply a cache", but the call to DefaultGroovyMethods.size(Iterable) is IMHO very surprising, which leads to 3 iterations of the iterable, where the user did not define it.

I think the size() method on an iterable is a Groovy idiosyncrasy, which we should not force onto users.

@Vampire
Copy link
Member

Vampire commented Oct 18, 2024

But why should we call a size() method on an Iterable, which is not part of the contract for Iterable.

Spock specs are Groovy code, not Java code.
In Groovy size() is part of the contract of Iterable: https://docs.groovy-lang.org/latest/html/groovy-jdk/java/lang/Iterable.html#size()

I think the size() method on an iterable is a Groovy idiosyncrasy, which we should not force onto users.

I'm totally on the opposite actually.
We are in Groovy environment, so we should follow Groovy semantics.
And for Groovy the method is defined and thus executed.

@leonard84 what is your PoV?

@AndreasTu AndreasTu changed the title Iterable data providers are not called multiple times for size() anymore Clarified documentation about Iterable data providers and size() calls Oct 25, 2024
@@ -252,6 +252,11 @@ used as a data provider. This includes objects of type `Collection`, `String`, `
they can fetch data from external sources like text files, databases and spreadsheets, or generate data randomly.
Data providers are queried for their next value only when needed (before the next iteration).

NOTE: For performance reasons it is helpful to use `Iterable.iterator()` instead of the `Iterable` as data pipe,
if a custom `size()` method is not available or the calculation of the `Iterable` is expensive.
The Groovy `Iterable` contract contains a `size()` method, which iterates the `Iterable` to calculate the size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would specifying that iteration over the Iterable may be done multiple times before execution be a more accurate way of expressing the internal behavior? In my opinion, a reader could assume that the iteration is only done once from reading this. The implication that the next value is fetched unnecessarily also seems to contradict the statement in 253.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would specifying that iteration over the Iterable may be done multiple times before execution be a more accurate way of expressing the internal behavior?

Maybe in some cases, but not generally.
Also the user documentation should not document the internal behavior.
And actually, it is not even Spock behavior.
Spock just calls the size() method whatever that may be.
It is Groovy that - if no other size() method is present - for an Iterable calls iterator() and iterates over it to determine the size().
And this could also any time change in any Groovy version theoretically.
So this should imho not be documented in the Spock doc.

In my opinion, a reader could assume that the iteration is only done once from reading this. The implication that the next value is fetched unnecessarily also seems to contradict the statement in 253.

Yes and no.
We indeed only fetch the next value when it is needed as documented.
As said above, it is the Groovy implementation of size() that iterates over an Iterable to determine its size.
All we can say imho is, that we call size() if it is not an Iterator and that it should be efficient.
Then the user can take care to make it efficient or if not possible provide an Iterator.

In the case of an external DB, you could e. g. also provide an Iterable that has a size() method that on first call does a select count query and persist that result or similar.

docs/data_driven_testing.adoc Outdated Show resolved Hide resolved
@@ -252,6 +252,11 @@ used as a data provider. This includes objects of type `Collection`, `String`, `
they can fetch data from external sources like text files, databases and spreadsheets, or generate data randomly.
Data providers are queried for their next value only when needed (before the next iteration).

NOTE: For performance reasons it is helpful to use `Iterable.iterator()` instead of the `Iterable` as data pipe,
if a custom `size()` method is not available or the calculation of the `Iterable` is expensive.
The Groovy `Iterable` contract contains a `size()` method, which iterates the `Iterable` to calculate the size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would specifying that iteration over the Iterable may be done multiple times before execution be a more accurate way of expressing the internal behavior?

Maybe in some cases, but not generally.
Also the user documentation should not document the internal behavior.
And actually, it is not even Spock behavior.
Spock just calls the size() method whatever that may be.
It is Groovy that - if no other size() method is present - for an Iterable calls iterator() and iterates over it to determine the size().
And this could also any time change in any Groovy version theoretically.
So this should imho not be documented in the Spock doc.

In my opinion, a reader could assume that the iteration is only done once from reading this. The implication that the next value is fetched unnecessarily also seems to contradict the statement in 253.

Yes and no.
We indeed only fetch the next value when it is needed as documented.
As said above, it is the Groovy implementation of size() that iterates over an Iterable to determine its size.
All we can say imho is, that we call size() if it is not an Iterator and that it should be efficient.
Then the user can take care to make it efficient or if not possible provide an Iterator.

In the case of an external DB, you could e. g. also provide an Iterable that has a size() method that on first call does a select count query and persist that result or similar.

@AndreasTu AndreasTu force-pushed the fix_2022_Iterable branch 2 times, most recently from 76d53f9 to 5f496a3 Compare November 3, 2024 17:14
Added note to the documentation to clarify the usage of Iterables
as data pipes and the used size() method.

Fixes spockframework#2022
Copy link
Member

@Vampire Vampire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thx

@AndreasTu AndreasTu merged commit bf771f1 into spockframework:master Nov 5, 2024
25 checks passed
@AndreasTu AndreasTu deleted the fix_2022_Iterable branch November 5, 2024 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Passing in a custom Iterable as a data provider causes Spock to iterate over each item multiple times
4 participants