
fix: update URLs to download Prithvi Segmentation Model weights and config #11

Merged
2 commits merged into instadeepai:main on Jan 9, 2025

Conversation

@BioGeek (Collaborator) commented Dec 17, 2024

The names of the config file and the weights of the Prithvi model on Hugging Face have changed.

This PR updates the URLs. The commit history also shows that the weights have been converted to a new format and that the inference code has been updated; I haven't tested yet how this impacts the InstaGeo code.
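For context, a minimal sketch of how the renamed artifacts can be fetched from the Hugging Face Hub. The repo_id and filenames below are placeholders for illustration only; the actual values are the ones updated in this PR's diff.

# Sketch: download the Prithvi checkpoint and config from the Hugging Face Hub.
# NOTE: REPO_ID and the filenames are placeholders, not the values from this PR.
from huggingface_hub import hf_hub_download

REPO_ID = "ibm-nasa-geospatial/Prithvi-100M"  # placeholder repo id
weights_path = hf_hub_download(repo_id=REPO_ID, filename="Prithvi_100M.pt")  # placeholder filename
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.yaml")  # placeholder filename
print(weights_path, config_path)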

@BioGeek requested a review from @Alikerin on December 17, 2024 at 07:02
@Alikerin (Collaborator)

Thank you @BioGeek for the contribution.
@ZeLynxy, can you please review these changes?

@Alikerin requested a review from @ZeLynxy on December 18, 2024 at 06:53
@BioGeek (Collaborator, Author) commented Dec 18, 2024

I now get the following output when I run pytest .:

============================================================================== test session starts ==============================================================================
platform linux -- Python 3.12.7, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/j-vangoey/code/InstaGeo-E2E-Geospatial-ML
configfile: pyproject.toml
plugins: cov-6.0.0, anyio-4.7.0, hydra-core-1.3.2
collected 48 items                                                                                                                                                              

tests/apps_tests/test_viz.py ...                                                                                                                                          [  6%]
tests/data_tests/test_chip_creator.py .F.F.                                                                                                                               [ 16%]
tests/data_tests/test_create_chips.py .                                                                                                                                   [ 18%]
tests/data_tests/test_geo_utils.py ..........                                                                                                                             [ 39%]
tests/data_tests/test_hls_utils.py ......FF..                                                                                                                             [ 60%]
tests/model_tests/test_dataloader.py ................                                                                                                                     [ 93%]
tests/model_tests/test_model.py ..                                                                                                                                        [ 97%]
tests/model_tests/test_sliding_window_inference.py .                                                                                                                      [100%]

=================================================================================== FAILURES ====================================================================================
_______________________________________________________________________________ test_chip_creator _______________________________________________________________________________

setup_and_teardown_output_dir = None

    @pytest.mark.auth
    def test_chip_creator(setup_and_teardown_output_dir):
        output_directory = "/tmp/csv_chip_creator"
        argv = [
            "chip_creator",
            "--dataframe_path",
            os.path.join(os.path.dirname(test_root), "data/test_breeding_data.csv"),
            "--output_directory",
            output_directory,
            "--min_count",
            "4",
            "--chip_size",
            "512",
            "--no_data_value",
            "-1",
            "--temporal_tolerance",
            "1",
            "--temporal_step",
            "30",
            "--num_steps",
            "1",
        ]
        FLAGS(argv)
        chip_creator.main("None")
        chips = os.listdir(os.path.join(output_directory, "chips"))
        seg_maps = os.listdir(os.path.join(output_directory, "seg_maps"))
        assert len(chips) == len(seg_maps)
>       assert len(chips) == 3
E       assert 0 == 3
E        +  where 0 = len([])

tests/data_tests/test_chip_creator.py:85: AssertionError
----------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------
Granules found: 98
Enter your Earthdata Login username: 'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
----------------------------------------------------------------------------- Captured stderr call ------------------------------------------------------------------------------
Processing HLS Dataset: 100%|██████████| 3/3 [00:00<00:00, 833.31it/s]
------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------
INFO     absl:chip_creator.py:313 Creating HLS dataset JSON.
INFO     absl:chip_creator.py:314 Retrieving HLS tile ID for each observation.
INFO     absl:chip_creator.py:321 Retrieving HLS tiles that will be downloaded.
INFO     absl:chip_creator.py:342 Downloading HLS Tiles
WARNING  absl:hls_utils.py:292 Couldn't download the following granules after 3 retries:
{'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B11.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B12.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B11.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B8A.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B12.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B04.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.Fmask.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B12.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022305T073009.v2.0/HLS.S30.T38PLB.2022305T073009.v2.0.B11.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B02.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022245T072619.v2.0/HLS.S30.T38PLB.2022245T072619.v2.0.B8A.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSS30.020/HLS.S30.T38PLB.2022335T073159.v2.0/HLS.S30.T38PLB.2022335T073159.v2.0.B8A.tif'}
INFO     absl:chip_creator.py:349 Creating Chips and Segmentation Maps
ERROR    absl:chip_creator.py:373 Error /tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B02.tif: No such file or directory when reading dataset containing: {'tiles': {'B02_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B02.tif', 'B03_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B03.tif', 'B04_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B04.tif', 'B8A_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B8A.tif', 'B11_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B11.tif', 'B12_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.B12.tif'}, 'fmasks': {'Fmask_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022245T072619.v2.0.Fmask.tif'}}
ERROR    absl:chip_creator.py:373 Error /tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B02.tif: No such file or directory when reading dataset containing: {'tiles': {'B02_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B02.tif', 'B03_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B03.tif', 'B04_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B04.tif', 'B8A_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B8A.tif', 'B11_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B11.tif', 'B12_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.B12.tif'}, 'fmasks': {'Fmask_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022335T073159.v2.0.Fmask.tif'}}
ERROR    absl:chip_creator.py:373 Error /tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B02.tif: No such file or directory when reading dataset containing: {'tiles': {'B02_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B02.tif', 'B03_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B03.tif', 'B04_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B04.tif', 'B8A_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B8A.tif', 'B11_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B11.tif', 'B12_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.B12.tif'}, 'fmasks': {'Fmask_0': '/tmp/csv_chip_creator/hls_tiles/HLS.S30.T38PLB.2022305T073009.v2.0.Fmask.tif'}}
INFO     absl:chip_creator.py:376 Saving dataframe of chips and segmentation maps.
________________________________________________________________________ test_missing_flags_raises_error ________________________________________________________________________

    def test_missing_flags_raises_error():
        """Test missing flags."""
        FLAGS([__file__])
>       with pytest.raises(app.UsageError) as excinfo:
E       Failed: DID NOT RAISE <class 'absl.app.UsageError'>

tests/data_tests/test_chip_creator.py:138: Failed
____________________________________________________________________________ test_download_hls_tile _____________________________________________________________________________

    @pytest.mark.auth
    def test_download_hls_tile():
        urls = [
            "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif"  # noqa
        ]
>       parallel_download(urls, outdir="/tmp")

tests/data_tests/test_hls_utils.py:366: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

urls = ['https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif']
outdir = '/tmp', max_retries = 3

    def parallel_download(urls: set[str], outdir: str, max_retries: int = 3) -> None:
        """Parallel Download.
    
        Wraps `download_tile` with multiprocessing.Pool for downloading multiple tiles in
        parallel.
    
        Args:
            urls: Tile urls to download.
            outdir: Directory to save downloaded tiles.
            max_retries: Number of times to retry downloading all tiles.
    
        Returns:
            None
        """
        num_cpus = cpu_count()
        earthaccess.login(persist=True)
        retries = 0
        complete = False
        while retries <= max_retries:
            temp_urls = [
                url
                for url in urls
                if not os.path.exists(os.path.join(outdir, url.split("/")[-1]))
            ]
            if not temp_urls:
                complete = True
                break
            earthaccess.download(temp_urls, local_path=outdir, threads=num_cpus)
            for filename in os.listdir(outdir):
                file_path = os.path.join(outdir, filename)
                if os.path.isfile(file_path):
                    file_size = os.path.getsize(file_path)
                    if file_size < 1024:
>                       os.remove(file_path)
E                       PermissionError: [Errno 1] Operation not permitted: '/tmp/jumpcloud-agent-updater.lock'

.venv/lib/python3.12/site-packages/instageo/data/hls_utils.py:287: PermissionError
----------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------
Enter your Earthdata Login username: 'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
_______________________________________________________________________ test_download_hls_tile_with_retry _______________________________________________________________________

setup_and_teardown_output_dir = None

    @pytest.mark.auth
    def test_download_hls_tile_with_retry(setup_and_teardown_output_dir):
        outdir = "/tmp/test_hls"
        open(
            os.path.join(outdir, "HLS.L30.T38PMB.2022139T071922.v2.0.B02.tif"), "w"
        ).close()
        urls = {
            "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B03.tif",  # noqa
            "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B02.tif",  # noqa
        }
        parallel_download(urls, outdir=outdir)
        out_filename = os.path.join(outdir, "HLS.L30.T38PMB.2022139T071922.v2.0.B02.tif")
>       assert os.path.exists(out_filename)
E       AssertionError: assert False
E        +  where False = <function exists at 0x7614f2bef9c0>('/tmp/test_hls/HLS.L30.T38PMB.2022139T071922.v2.0.B02.tif')
E        +    where <function exists at 0x7614f2bef9c0> = <module 'posixpath' (frozen)>.exists
E        +      where <module 'posixpath' (frozen)> = os.path

tests/data_tests/test_hls_utils.py:385: AssertionError
----------------------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------------------
Enter your Earthdata Login username: 'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
'NoneType' object has no attribute 'get'
You must call earthaccess.login() before you can download data
------------------------------------------------------------------------------- Captured log call -------------------------------------------------------------------------------
WARNING  absl:hls_utils.py:292 Couldn't download the following granules after 3 retries:
{'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B03.tif', 'https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B02.tif'}
=============================================================================== warnings summary ================================================================================
.venv/lib/python3.12/site-packages/earthaccess/formatters.py:4
  /home/j-vangoey/code/InstaGeo-E2E-Geospatial-ML/.venv/lib/python3.12/site-packages/earthaccess/formatters.py:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

tests/data_tests/test_chip_creator.py::test_get_chip_coords
tests/data_tests/test_create_chips.py::test_create_chips
tests/data_tests/test_create_chips.py::test_create_chips
tests/data_tests/test_create_chips.py::test_create_chips
tests/data_tests/test_create_chips.py::test_create_chips
tests/data_tests/test_create_chips.py::test_create_chips
  /home/j-vangoey/code/InstaGeo-E2E-Geospatial-ML/.venv/lib/python3.12/site-packages/pandas/core/frame.py:717: DeprecationWarning: Passing a BlockManager to GeoDataFrame is deprecated and will raise in a future version. Use public APIs instead.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================ short test summary info ============================================================================
FAILED tests/data_tests/test_chip_creator.py::test_chip_creator - assert 0 == 3
FAILED tests/data_tests/test_chip_creator.py::test_missing_flags_raises_error - Failed: DID NOT RAISE <class 'absl.app.UsageError'>
FAILED tests/data_tests/test_hls_utils.py::test_download_hls_tile - PermissionError: [Errno 1] Operation not permitted: '/tmp/jumpcloud-agent-updater.lock'
FAILED tests/data_tests/test_hls_utils.py::test_download_hls_tile_with_retry - AssertionError: assert False
=================================================================== 4 failed, 44 passed, 7 warnings in 35.35s ===================================================================

Please advise how to set up login credentials for the tests, so that I can avoid the repeated Enter your Earthdata Login username: 'NoneType' object has no attribute 'get' prompts.
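For reference, one way to do this (not necessarily the project's documented approach) is to let earthaccess persist the Earthdata credentials to ~/.netrc once, after which the tests stop prompting:

# One-time interactive setup: prompts for an Earthdata username/password and
# writes an entry for machine urs.earthdata.nasa.gov to ~/.netrc, so later
# earthaccess.download() calls authenticate without prompting.
import earthaccess

earthaccess.login(persist=True)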

The cleanup logic in parallel_download can also be improved: it currently tries to remove files it has no permission to delete, which raises PermissionError: [Errno 1] Operation not permitted: '/tmp/jumpcloud-agent-updater.lock'.
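One possible shape for that fix is sketched below: restrict cleanup to the granule filenames derived from urls and tolerate files that cannot be removed. This is only an illustration; _cleanup_incomplete_downloads is a hypothetical helper name.

import os

def _cleanup_incomplete_downloads(urls: set[str], outdir: str, min_size: int = 1024) -> None:
    """Remove only truncated downloads that this run is responsible for."""
    expected = {url.split("/")[-1] for url in urls}
    for filename in expected:
        file_path = os.path.join(outdir, filename)
        try:
            if os.path.isfile(file_path) and os.path.getsize(file_path) < min_size:
                os.remove(file_path)
        except OSError:
            # e.g. PermissionError on files owned by other processes; skip them
            pass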

@Alikerin (Collaborator)

@BioGeek see this README on how to set up authentication.
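For non-interactive environments such as CI, earthaccess also supports reading credentials from environment variables; a small sketch (the README remains the authoritative setup guide):

# Assumes EARTHDATA_USERNAME and EARTHDATA_PASSWORD are exported in the environment.
import earthaccess

earthaccess.login(strategy="environment")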

@Alikerin (Collaborator)

@ZeLynxy let's modify the outdir in test_download_hls_tile to be something like /tmp/test_download, so that the cleanup step never touches files in the shared /tmp directory.
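A variant of that suggestion would be to use pytest's built-in tmp_path fixture so each test gets its own throwaway directory. This is an illustration only, not the version in the pending PR; the parallel_download import path is inferred from the traceback above.

import pytest

from instageo.data.hls_utils import parallel_download  # path inferred from the traceback above

@pytest.mark.auth
def test_download_hls_tile(tmp_path):
    # tmp_path is a per-test directory created by pytest, so the cleanup step in
    # parallel_download can never touch unrelated files such as
    # /tmp/jumpcloud-agent-updater.lock.
    url = (
        "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/"
        "HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif"
    )
    parallel_download([url], outdir=str(tmp_path))
    assert (tmp_path / "HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif").exists()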

@BioGeek (Collaborator, Author) commented Dec 18, 2024

After creating the ~/.netrc file, the test_chip_creator and test_download_hls_tile_with_retry tests now succeed. The remaining failures are:

FAILED tests/data_tests/test_chip_creator.py::test_missing_flags_raises_error - Failed: DID NOT RAISE <class 'absl.app.UsageError'>
FAILED tests/data_tests/test_hls_utils.py::test_download_hls_tile - PermissionError: [Errno 1] Operation not permitted: '/tmp/jumpcloud-agent-updater.lock'

@Alikerin (Collaborator) commented Dec 18, 2024 via email

@Alikerin (Collaborator)

@BioGeek can you replace the failing tests with the updated versions we have in a yet-to-be-merged PR?

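# Assumed context for the snippets below: the surrounding test modules already import
# os, pytest, rasterio, rasterio.crs.CRS, absl.app, the absl FLAGS object, and the
# functions under test (check_required_flags, parallel_download).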
def test_missing_flags_raises_error():
    """Test missing flags."""
    FLAGS.dataframe_path = None
    FLAGS.output_directory = None
    FLAGS(["test"])

    with pytest.raises(app.UsageError) as excinfo:
        check_required_flags()
    assert "Flag --dataframe_path is required" in str(
        excinfo.value
    ) or "Flag --output_directory is required" in str(
        excinfo.value
    ), "Expected UsageError with a message about missing required flags"
    
@pytest.mark.auth
def test_download_hls_tile(setup_and_teardown_output_dir):
    outdir = "/tmp/test_hls"
    urls = [
        "https://data.lpdaac.earthdatacloud.nasa.gov/lp-prod-protected/HLSL30.020/HLS.L30.T38PMB.2022139T071922.v2.0/HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif"  # noqa
    ]
    parallel_download(urls, outdir=outdir)
    out_filename = "/tmp/test_hls/HLS.L30.T38PMB.2022139T071922.v2.0.B01.tif"  # noqa
    assert os.path.exists(out_filename)
    src = rasterio.open(out_filename)
    assert isinstance(src.crs, CRS)

@BioGeek (Collaborator, Author) commented Dec 18, 2024

@Alikerin I created #12 with the suggested updates to the tests. All tests are passing now.

@Alikerin merged commit c2c8630 into instadeepai:main on Jan 9, 2025
2 checks passed