- Improve batch download retries in case of partial downloads.
- Improve batch download retrying in case of temporary 404 errors.
- Remove `aws_profile` from storage manager schema.
- Remove `numpy<2` restriction.
- `BatchDownloadPipeline` retries if the first connection to batch-id specific endpoints fails with a 404.
- Minor changes to documentation.
- Improved test utilities for vector files.
- `BatchDownloadPipeline` now has an option to automatically retry the download if the status is `PARTIAL`.
- Added a new example that showcases how to train a LightGBM model with `eo-grow`.
- When a pipeline raises an error, it now saves the stack trace to the `failure.log` file in the logs folder.
- Pipelines that are run as part of a pipeline-chain execution are no longer retried by Ray when an exception occurs.
- Parsing time ranges now has support for more formats.
- Parameter `raise_if_failed` renamed to `raise_on_failure` and is now enabled by default.
- Numpy version restricted in anticipation of numpy 2.0 release.
- Pipelines now have an additional parameter `raise_if_failed` to raise an error if the pipeline failed.
- Fix bug with versions of `sentinelhub-py >= 3.10.0` due to bad version string comparison.
- Adjust rounding of statistics for vector data.
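
The entry above does not show the comparison that was actually used, so the following is only an illustrative sketch of how comparing version strings directly can fail once a component reaches two digits. `parse_version` is a hypothetical helper (not part of `eo-grow` or `sentinelhub-py`) and assumes purely numeric, dot-separated versions.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted version such as "3.10.0" into (3, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


# Strings are compared character by character, so "1" < "9" decides the result.
naive_result = "3.10.0" >= "3.9.0"

# Integer tuples are compared component by component, so 10 > 9 decides the result.
numeric_result = parse_version("3.10.0") >= parse_version("3.9.0")

print(naive_result, numeric_result)  # False True
```

For real projects a dedicated version parser is usually a better choice than a hand-written helper, since versions may carry pre-release or post-release suffixes that this sketch does not handle.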
- Fix pipeline-chain execution when using CLI.
- Fixed `eogrow-validate` command when validating pipeline chains that use variables.
- Restricted version of `typing_extensions`.

With this release we push `eo-grow` towards a more `ray`-centered execution model.
- The local EOExecutor models with multiprocessing/multithreading have been removed. (Most) pipelines no longer have the `use_ray` and `workers` parameters. In order to run instances locally one has to set up a local cluster (via `ray start --head`). We included a `debug` parameter that uses `EOExecutor` instead of `RayExecutor` so that IDE breakpoints work in most pipelines.
- Pipeline chain configs have been adjusted. The user can now specify what kind of resources the main pipeline process would require. This also allows one to run pipelines entirely on worker instances.
- The `ray_worker_type` field was replaced with `worker_resources` that allows for precise resource request specifications.
- Fixed a bug where CLI variables were not applied for config chains.
- Removed `TestPipeline` and the `eogrow-test` command.
- Some `ValueError` exceptions were changed to `TypeError`.
- Pipelines can request a specific type of worker when run on a ray cluster with the `ray_worker_type` field.
- Area managers now generate the patch lists more efficiently.
- Pipelines have an option to fail when temporally ill-defined EOPatches are encountered with the `raise_on_temporal_mismatch` flag.
- Fixed a bug in `BatchDownloadPipeline` where the evalscript was not read correctly.
- Pipelines can now save EOPatches in Zarr format
- Testing utilities now also compare vector-based files. Numerical precision of comparison was adjusted.
- Evalscripts are now read from storage. Removed import-path capabilities of config language.
- Adjusted to use `eo-learn 1.5.0`
  - `compression` parameters were removed since they are redundant
  - Removed interpolation from `eogrow.pipelines.features`.
  - `LinearFunctionTask` moved to `eogrow.tasks.common` from `eo-learn`
  - many adjustments due to parser changes
- In pipeline configs dictionary keys can now also contain variables.
- Default resizing backend changed to cv2 (to comply with changes in eo-learn).
- Merging timestamps of samples is no longer an option in the sample-merging pipeline.
- Pipelines using a Ray cluster now add the cluster configuration file to the logs folder.
- The CLI command `eogrow-ray` no longer supports `--screen` and `--stop` commands.
- Changelog now also stored in the `CHANGELOG.md` file.
- Improved test-data generating pipeline.
- Switched from `flake8` and `isort` to `ruff`.
- Various minor improvements.
- Fix bug in `LoggingManager.Schema` where `Tuple[str]` was used instead of `Tuple[str, ...]` for certain fields, preventing parsing of correct configurations.
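
To illustrate the difference behind this fix: `Tuple[str]` describes a tuple with exactly one string, while `Tuple[str, ...]` describes a tuple of strings of any length. The sketch below uses only the standard `typing` module; `matches` is a hypothetical, simplified checker written for this example and is not how `pydantic` or `eo-grow` validates fields.

```python
from typing import Any, Tuple, get_args

# Tuple[str] carries a single element type; Tuple[str, ...] carries the type plus Ellipsis.
fixed_args = get_args(Tuple[str])
variadic_args = get_args(Tuple[str, ...])


def matches(value: tuple, annotation: Any) -> bool:
    """Simplified check of a tuple value against a Tuple[...] annotation."""
    args = get_args(annotation)
    if len(args) == 2 and args[1] is Ellipsis:
        # Variable-length form: every element must have the single declared type.
        return all(isinstance(item, args[0]) for item in value)
    # Fixed-length form: the lengths must agree and each position must match its type.
    return len(value) == len(args) and all(
        isinstance(item, expected) for item, expected in zip(value, args)
    )


packages = ("package_a", "package_b")  # hypothetical config value with two entries
print(matches(packages, Tuple[str]))       # rejected: exactly one element expected
print(matches(packages, Tuple[str, ...]))  # accepted: any number of strings
```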
- (code-breaking) Simplified `RasterizePipeline` and improved rasterization of temporal vectors.
- (code-breaking) Area managers no longer offer AOI modification in the `area` parameter. It has been replaced with a simpler `filename` field. We added a rerouting parser, so old configs should work for a while longer.
- (code-breaking) Separated machine learning requirements to `ML` extra that you can install via `pip install eogrow[ML]`. These packages are only necessary for sampling, training, and prediction pipelines.
- Added `VectorImportPipeline` for adding vector features to EOPatches.
- Improved `ExportMapsPipeline` when working with large amounts of files, contributed by @aashishd.
- Config files are now uploaded to the cluster before being executed. This prevents issues with commands failing on very large configs.
- Added `restrict_types` validator that detects incompatible `FeatureType` inputs for fields of type `Feature`.
- Added `ensure_storage_key_presence` validator, which checks that the specified storage key is defined in the storage manager. Typos in storage keys will now be detected at validation.
- Storage managers now support a `filesystem_kwargs` parameter.
- Fixed bug where area managers would not filter the grid correctly if the grid was read from the cache.
- Logs to stdout are now colored and contain timestamps.
- Logging configs can now use `"..."` value to reference default packages for fields such as `pipeline_ignore_packages`.
- Pipelines can now be given custom names, which makes it easier to identify certain pipelines when searching for logs or when running them in config chains.
- Switched to a `pyproject.toml` based installation.
- Added new documentation sections: a high-level overview and a collection of commonly used patterns.
- Improved testing tools.
- Various minor improvements.
- (code-breaking) Large changes to area managers. See PR #168
- EOPatch manager functionality was merged to area managers. EOPatch managers were removed.
- Changes to area manager Schemas.
- Changes to area manager interface. Check the documentation for all the changes.
- Adjustments to Pipeline interface. See PR #168 for how most pipelines need to be adjusted.
- Improved filtration via list of EOPatch names.
- (code-breaking) Added `ZipMapPipeline` which replaces `MappingPipeline`.
- (code-breaking) Added `SplitGridPipeline` which replaces `SwitchGridsPipeline`.
- (code-breaking) Adjusted resize parameters in `ImportTiffPipeline` according to changes in `SpatialResizeTask` in new `eo-learn` version.
- Fixed issue with label encoder in prediction pipeline. Contributed by @ashishdhiman-tomtom
- Moved types to `eogrow.types` and deprecate `eogrow.utils.types`. Remove `Path` type alias.
- Added support for EOPatch names when using the `-t` flag.
- Added `ImportTiffPipeline` for importing a tiff file into EOPatches.
- `ExportMapsPipeline` now runs in parallel (single-machine only).
- Fixed issue where `ExportMapsPipeline` consumed increasing amounts of storage space.
- Area and eopatch managers for batch grids now warn the user if not linked correctly.
- Added `pyogrio` as a possible `geopandas` backend for IO (experimental).
- Add support for `geopandas` version 0.12.
- Improve types after `mypy` version 0.990.
- Removed `utils.enum` and old style of templating due to non-use.
- Other various improvements and clean-ups.
- Greatly improved `ExportMapsPipeline` and `IngestByocTilesPipeline`, which are now also able to export and ingest temporal BYOC collections
- Improved test suite for exporting maps and ingesting BYOC collections
- Fixed code according to newly exposed `eolearn.core` types
- Fixed broken github links in documentation
- Improvements to CI, added pre-commit hooks to the repository
- BYOC ingestion pipeline is better at handling CRS objects
- Because `pydantic` now type-checks default factories, two custom factories `list_factory` and `dict_factory` have been added, since using just `list` currently clashes with fields of kind `List[int]`.
- Added `IngestByocTiles` pipeline, which creates or updates a BYOC collection from maps exported via `ExportMapsPipeline`.
- Greatly improved `DataCollection` parser, which can now parse `DataCollectionSchema` objects instead of just names.
- Added tests for validator utility functions.
- New general validators `ensure_defined_together` and `ensure_exactly_one_defined` for verifying optional parameters.
- Documentation of `Schema` objects is now much more verbose.
- `ExportMapsPipeline` now saves maps into subfolders (per UTM zone).
- Fixed issue where `ExportMapsPipeline` ignored `dtype` and `nodata` when merging.
- Improved handling of `aws_profile` parameter in storage managers.
- `RasterizePipeline` now has an additional `raster_shape` parameter.
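
The two general validators named above are only described by name here. Assuming the names reflect their behavior, the checks they perform on optional parameters could look roughly like the sketch below. The functions `defined_together` and `exactly_one_defined`, and the parameter names in the sample config, are hypothetical stand-ins written in plain Python; they are not the `eo-grow` validator API.

```python
from typing import Any, Dict, Sequence


def count_defined(values: Dict[str, Any], names: Sequence[str]) -> int:
    """Count how many of the named parameters are present and not None."""
    return sum(values.get(name) is not None for name in names)


def defined_together(values: Dict[str, Any], names: Sequence[str]) -> bool:
    """True if the named parameters are either all set or all left unset."""
    return count_defined(values, names) in (0, len(names))


def exactly_one_defined(values: Dict[str, Any], names: Sequence[str]) -> bool:
    """True if exactly one of the named parameters is set."""
    return count_defined(values, names) == 1


# Hypothetical config: 'width' and 'height' must come as a pair,
# while exactly one of 'local_path' and 'remote_url' must be given.
config = {"width": 256, "height": None, "local_path": "data.tif", "remote_url": None}

print(defined_together(config, ["width", "height"]))               # 'height' is missing
print(exactly_one_defined(config, ["local_path", "remote_url"]))   # only 'local_path' is set
```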
- Fixed a bug in `BatchToEOPatchPipeline` where temporal dimension of some imported features could be reversed. Memory-optimization functionalities have been reverted.
- Improved the way `filesystem` object is passed to EOTasks in EOWorkflows. These changes are a consequence of changes in `eo-learn==1.2.0`.
- Added support for `aws_acl` parameter into `Storage` schema.
- Download pipelines now support an optional `size` parameter.
- Official support for Python `3.10`.
- Large changes in testing utilities. Statistics produced by `ContentTester` have been changed and are now more descriptive.
- Improvements in code-style checkers and CI.
- Support session sharing in download pipelines.
- Improved `BatchAreaManager` bounding boxes.
- Improve memory footprint of various pipelines.
- Disabled `skip_existing` and `eopatch_list` at validation time for pipelines that do not support filtration.
- Support for rasterization of temporal vector features from files.
- Docs are now built automatically and the type annotations are included in parameter descriptions, resulting in better readability.
- Many minor improvements and fixes in code, tests, and documentation.
- Large changes in config objects and schemas:
  - replaced `Config` object with config utility functions `collect_configs_from_path`, `interpret_config_from_dict`, and `interpret_config_from_path`,
  - pipeline and manager config objects are now `pydantic` schema classes, which are fully typed objects,
  - removed `${env:variable}` from the config language.
- Changes in area managers:
  - added `AreaManager.cache_grid` method,
  - (code-breaking) improved functionalities of `BatchAreaManager`, instead of `tile_buffer` it now uses `tile_buffer_x` and `tile_buffer_y` config parameters,
  - (code-breaking) improved `UtmZoneAreaManager`, replaced `patch_buffer` config parameter with `patch_buffer_x` and `patch_buffer_y` which now work with absolute instead of relative buffers,
  - implemented grid transformation methods for `UtmZoneAreaManager` and `BatchAreaManager`.
- Other core improvements:
  - added `EOGrowObject.from_raw_config` and `EOGrowObject.from_path` methods,
  - fixed an issue in `EOPatchManager`,
  - improvements of pipeline logging, logging handlers, and filters.
- Pipeline improvements:
  - Implemented `SwitchGridPipeline` for converting data between tiling grids.
  - Large updates of `BatchDownloadPipeline` with restructured config schema and additional functionalities.
  - `BatchToEOPatchPipeline` now works with `input_folder_key` and `output_folder_key` instead of `folder_key` and has an option not to delete input data. A few issues in the pipeline were fixed and unit tests were added.
  - Minor improvements of config parameters in `MergeSamplesPipeline` and prediction pipelines.
  - Implemented `DummyDataPipeline` for generating data for unit tests.
- New tasks:
  - `SpatialJoinTask` and `SpatialSliceTask` for spatial operations on EOPatches,
  - `DummyRasterFeatureTask` and `DummyTimestampFeatureTask` for creating EOPatches with dummy data.
- Updates in utilities:
  - added utilities for spatial operations and grid transformations,
  - implemented `eogrow.utils.fs.LocalFolder` abstraction,
  - renamed `get_patches_without_all_features` into `get_patches_with_missing_features` from `eogrow.utils.filter`,
  - (code-breaking) updated `eogrow.utils.testing.run_and_test_pipeline` to work with a list of pipeline configs.
- Created the `eo-grow` package documentation page.
- `eo-grow` is now a fully typed package. Added mypy and isort code checking to CI.
- Updated tutorial notebooks to work with the latest code.
- Many minor improvements and fixes in code, tests, and documentation.

First release of the `eo-grow` package.