Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: build backend and frontend check #47

Merged
merged 6 commits into from
Jan 10, 2025
Merged

feat: build backend and frontend check #47

merged 6 commits into from
Jan 10, 2025

Conversation

12rambau
Copy link
Member

@12rambau 12rambau commented Jan 6, 2025

Fix #12

This PR is implementing a more sophisticated testing mechanism to avoid executing the GEE API all the time.
As mentioned by @schwehr and @naschmitz, serialize() is the best way to get a robust image of the call to the API.

The workflow implemented in list_regression.py is the following:

  • check if the API is the same as last time (prefixed with backend) without raising anything. If no failure is raised then it stops there (no real API request)
  • If something fails, we send the query and compare the actual result. whatever the result the serialized version of the call is regenerated.

The advantage of this method is that it bypass the actual result if the query is always the same (trusting the Google team) but if it differs we actually verify that the result is maintained. The idea being that the user of the lib should not care about the serialized version of the query, reason why it's always run in a supress context manager.

Happy to read your comments.

TODO:

  • do the same for dict
  • do the same fore fc
  • do the same for images
  • update documentation

@12rambau 12rambau requested a review from fitoprincipe January 8, 2025 06:55
Copy link
Member

@fitoprincipe fitoprincipe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I like the approach, it's basically what I was envisioning. However, my main concern is that you'll need to repeat this operation in every (ee) class. Why not creating an intermediate class that will handle this repetitive task??

A tentative mock could be

class EEDataRegressionFixture(DataRegressionFixture):
        def check(
            self,
            data_object: ee.ComputedObject,
            basename: Optional[str] = None,
            fullpath: Optional[os.PathLike] = None,
            **kwargs
        ):
        """Check the given ee object against a previously recorded version, or generate a new file.

        Parameters:
            data_list: The list to check.
            basename: The basename of the file to test/record. If not given the name of the test is used.
            fullpath: complete path to use as a reference file. This option will ignore ``datadir`` fixture when reading *expected* files but will still use it to write *obtained* files. Useful if a reference file is located in the session data dir for example.
            kwargs: params to pass to subclass check.
        """
        # build the different filename to be consistent between our 3 checks
        name = build_fullpath(
            self.original_datadir, self.request, "", basename, fullpath, self.with_test_class_names
        )
        serialized_name = name.with_stem(f"backend_{name.name}").with_suffix(".yml")
        data_name = name.with_stem(f"frontend_{name.name}").with_suffix(".yml")

        # check the previously registered serialized call from GEE. If it matches the current call,
        # we don't need to check the data
        serialized = data_object.serialize()
        with suppress(BaseException):
            super().check(serialized, fullpath=serialized_name)
            return

        # if it needs to be checked, we need to round the float values to the same precision as the
        # reference file
        data = self.check_backend(data_object, **kwargs)
        try:
            super().check(data, fullpath=data_name)

            # IF we are here it means the data has been modified so we edit the API call accordingly
            # to make sure next run will not be forced to call the API for a response.
            serialized_name.unlink(missing_ok=True)
            with suppress(BaseException):
                super().check(serialized, fullpath=serialized_name)

        except BaseException as e:
            raise e

class ListFixture(EEDataRegressionFixture):
    """Fixture for regression testing of :py:class:`ee.List`."""
    def check_backend(
        self,
        data_list: ee.List,
        prescision: int = 6,
    ):
        return round_data(data_list.getInfo(), prescision) 

@12rambau
Copy link
Member Author

12rambau commented Jan 8, 2025

Because every ee class is performing a different type of check and thus requires a slightly diffrent handle in between the 2 serialize checks.
ee.Dictionary ee.List need to be rounded, ee.image is a ImageFixture (so not the same) and need to be exported with computePixel instead of getInfo and as ee.featureCollection is including a geometry, it needs to pass trough some geopandas magic to maintain topological consistency when reducing the number of digits.

@fitoprincipe
Copy link
Member

Yes, I got that, but that's why I am proposing the check_backend method, that will handle exactly that and will be unique for each ee class

@12rambau
Copy link
Member Author

12rambau commented Jan 8, 2025

This PR is blocked by ESSS/pytest-regressions#184

@12rambau 12rambau marked this pull request as ready for review January 10, 2025 16:10
@12rambau 12rambau merged commit aa03ffa into main Jan 10, 2025
12 checks passed
@12rambau 12rambau deleted the serialize branch January 10, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check query before fetching data from server
2 participants