From 2d907bbf5da159a72cd827183838e25072d617ef Mon Sep 17 00:00:00 2001 From: Iffanice Houndayi Date: Mon, 2 Dec 2024 12:04:52 +0200 Subject: [PATCH] feat: adjusted notebook to download original data and use subset for subsequent process --- notebooks/InstaGeo_Demo.ipynb | 1803 +++++++++++++++++---------------- 1 file changed, 920 insertions(+), 883 deletions(-) diff --git a/notebooks/InstaGeo_Demo.ipynb b/notebooks/InstaGeo_Demo.ipynb index fd53505..09df1f3 100644 --- a/notebooks/InstaGeo_Demo.ipynb +++ b/notebooks/InstaGeo_Demo.ipynb @@ -1,886 +1,923 @@ { - "cells": [ - { - "cell_type": "markdown", - "id": "4574f053-60d4-419b-88cc-c58d08a8177c", - "metadata": { - "id": "4574f053-60d4-419b-88cc-c58d08a8177c" - }, - "source": [ - "# InstaGeo Demo\n", - "\n", - "\"Open\n", - "\n", - "Welcome to the InstaGeo demo notebook! This tutorial showcases the capabilities of InstaGeo, an end-to-end package designed for geospatial machine learning with multispectral data.\n", - "\n", - "In this demonstration, we use ground truth geospatial point observations for cropland classification in Rwanda. The notebook will guide you through the process of creating segmentation-like data from these observations, fine-tuning the [Prithvi](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M) model, and finally visualizing the inference results on an interactive map.\n", - "\n", - "By the end of this demo, you will gain hands-on experience with key InstaGeo functionalities and learn how it streamlines geospatial ML workflows from data preparation to model inference." - ] - }, - { - "cell_type": "markdown", - "id": "380ead2a", - "metadata": {}, - "source": [ - "# Install InstaGeo" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e550a300", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "e550a300", - "outputId": "3cbc267b-6c02-4539-fd64-c7b386bc1156", - "tags": [] - }, - "outputs": [], - "source": [ - "repository_url = \"https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML\"\n", - "\n", - "!git clone {repository_url}" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9f7e762b", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "collapsed": true, - "id": "9f7e762b", - "jupyter": { - "outputs_hidden": true - }, - "outputId": "4372ac59-3158-48dc-f321-fa2a24ee9255", - "tags": [] - }, - "outputs": [], - "source": [ - "%%bash\n", - "cd InstaGeo-E2E-Geospatial-ML\n", - "pip install -e .[all]" - ] - }, - { - "cell_type": "markdown", - "id": "238c78e7-720e-49ed-b5ce-6be6567d2585", - "metadata": { - "id": "238c78e7-720e-49ed-b5ce-6be6567d2585" - }, - "source": [ - "## EarthData Login\n", - "\n", - "InstaGeo currently supports multispectral data from NASA [Harmonized Landsat and Sentinel-2 (HLS)](https://hls.gsfc.nasa.gov/). Accessing HLS data requires an EarthData user account which can be created [here](https://urs.earthdata.nasa.gov/)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "4fc7ff9c", - "metadata": { - "id": "4fc7ff9c", - "tags": [] - }, - "outputs": [], - "source": [ - "from getpass import getpass\n", - "import os" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8b7a5f61", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "8b7a5f61", - "outputId": "78a1c750-3a23-4c0a-f45e-94388625fb8b" - }, - "outputs": [], - "source": [ - "# Enter you EarthData user account credentials\n", - "USERNAME = getpass('Enter your EarthData username: ')\n", - "PASSWORD = getpass('Enter your EarthData password: ')\n", - "\n", - "content = f\"\"\"machine urs.earthdata.nasa.gov login {USERNAME} password {PASSWORD}\"\"\"\n", - "\n", - "with open(os.path.expanduser('~/.netrc'), 'w') as file:\n", - " file.write(content)" - ] - }, - { - "cell_type": "markdown", - "id": "488a15d3-f22e-4b2c-85c0-435c832e708c", - "metadata": { - "id": "488a15d3-f22e-4b2c-85c0-435c832e708c" - }, - "source": [ - "## InstaGeo - Data\n", - "\n", - "With InstaGeo installed and EarthData authentication configured, we are now ready to download and process HLS (Harmonized Landsat and Sentinel) granules using the `InstaGeo-Data` module. This module offers several powerful functionalities for handling geospatial data, including:\n", - "\n", - "- Searching and retrieving metadata for HLS granules\n", - "- Downloading specific spectral bands from HLS granules\n", - "- Generating data chips and corresponding target labels for machine learning tasks\n", - "\n", - "These capabilities streamline the preprocessing of multispectral data, setting the foundation for efficient geospatial model development.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "38e36f50", - "metadata": { - "id": "38e36f50" - }, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import numpy as np\n", - "from pathlib import Path" - ] - }, - { - "cell_type": "markdown", - "id": "c4d83f72", - "metadata": {}, - "source": [ - "The ground-truth geospatial observations for Rwanda cropland classification used in this notebook were sourced from the following datasets:\n", - "\n", - "- [Rwanda Crop Type Dataset (RTI)](https://beta.source.coop/rti/rwanda-crop-type/)\n", - "- [Rwanda 2019 Crop/Non-Crop Labels (HarvestPortal)](https://data.harvestportal.org/dataset/rwanda-2019-crop-non-crop-labels)\n", - "- [Processed Crop/Non-Crop Labels (HarvestPortal)](https://data.harvestportal.org/dataset/processed-crop-non-crop-labels/resource/037bff27-4f9f-44e5-9eb2-72eff840becb)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "adf67434", - "metadata": { - "id": "adf67434" - }, - "outputs": [], - "source": [ - "df1 = pd.read_csv(\"ed0ab379-a688-4419-ab96-181c726e1b22.csv\")\n", - "df2 = pd.read_csv(\"0cfc1320-f909-4759-90f9-cb5c92ca019e.csv\")\n", - "df3 = pd.read_csv(\"ceo-2019-rwanda-cropland-rcmrd-set-2-sample-data-2021-04-20.csv\")\n", - "\n", - "df = pd.concat([df1, df2, df3])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f6541bd6", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 363 - }, - "id": "f6541bd6", - "outputId": "0a55e877-f163-4d7d-c479-542315a0cd37" - }, - "outputs": [], - "source": [ - "df = df[['lat', 'lon', 'collection_time', 'Crop/ or not', 'sample_id']]\n", - "df = df.rename({\"lon\": \"x\", \"lat\":\"y\", \"Crop/ or not\":'label', 'collection_time':\"date\"}, axis=1)\n", - "df.head(10)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ea228e76", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ea228e76", - "outputId": "4c5745bf-60d6-455f-9cea-d3c7c8069379" - }, - "outputs": [], - "source": [ - "print(f\"The size of the dataset is: {df.size}\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "e7ac02c8", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 363 - }, - "id": "e7ac02c8", - "outputId": "40da1092-af57-419f-f75f-781ebdfd3f46" - }, - "outputs": [], - "source": [ - "def label_map(x):\n", - " if x == \"Cropland\":\n", - " return 1\n", - " elif x == \"Non-crop\":\n", - " return 0\n", - " else:\n", - " return np.nan\n", - "\n", - "df['date'] = df['date'].map(lambda x: pd.to_datetime(x).strftime(\"%Y-%m-%d\"))\n", - "df['label'] = df['label'].map(label_map)\n", - "df = df.dropna().reset_index()\n", - "df.head(10)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "66e324e2", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "66e324e2", - "outputId": "5a56ba17-d580-4646-b970-f5fd90367aba" - }, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "train, val_and_test = train_test_split(df, test_size=0.3)\n", - "val, test = train_test_split(val_and_test, test_size=0.5)\n", - "\n", - "print(train.size, val.size, test.size)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "527edae6", - "metadata": { - "id": "527edae6" - }, - "outputs": [], - "source": [ - "train.to_csv(\"rwanda_cropland_data_train.csv\")\n", - "val.to_csv(\"rwanda_cropland_data_val.csv\")\n", - "test.to_csv(\"rwanda_cropland_data_test.csv\")" - ] - }, - { - "cell_type": "markdown", - "id": "fcc545f4", - "metadata": {}, - "source": [ - "After splitting the data into training, validation, and test sets, the next step is to group the data by the HLS granules they belong to and download the corresponding spectral bands for each granule. Once the bands are retrieved, we will generate smaller chips and target labels with dimensions of 224 x 224 pixels.\n", - "\n", - "By the end of this process, the input data will have a shape of 3 x 6 x 224 x 224 (representing three sets of six spectral bands and 224 x 224 pixel chips), and the target labels will have a shape of 224 x 224.\n", - "\n", - "While these tasks might seem complex, the `InstaGeo-Data` module abstracts this process, allowing you to configure it with a simple command as shown in the following cells" - ] - }, - { - "cell_type": "markdown", - "id": "b901a48e", - "metadata": {}, - "source": [ - "### Training Split" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c11bfeb8-df6b-4b8c-8c8a-059a5a28b8ea", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "c11bfeb8-df6b-4b8c-8c8a-059a5a28b8ea", - "outputId": "4ba7dbb9-0572-4bd8-8972-51418efb5ec3", - "tags": [] - }, - "outputs": [], - "source": [ - "%%bash\n", - "mkdir train\n", - "python -m \"instageo.data.chip_creator\" \\\n", - " --dataframe_path=\"rwanda_cropland_data_train.csv\" \\\n", - " --output_directory=\"train\" \\\n", - " --min_count=4 \\\n", - " --chip_size=224 \\\n", - " --no_data_value=-1 \\\n", - " --temporal_tolerance=3 \\\n", - " --temporal_step=30 \\\n", - " --mask_cloud=False \\\n", - " --num_steps=3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "22FADUFcfYsR", - "metadata": { - "id": "22FADUFcfYsR" - }, - "outputs": [], - "source": [ - "root_dir = Path.cwd()\n", - "chips_orig = os.listdir(os.path.join(root_dir, \"train/chips\"))\n", - "chips = [chip.replace(\"chip\", \"train/chips/chip\") for chip in chips_orig]\n", - "seg_maps = [chip.replace(\"chip\", \"train/seg_maps/seg_map\") for chip in chips_orig]\n", - "\n", - "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", - "df.to_csv(os.path.join(\"train.csv\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "UTu-qJkYflGD", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "UTu-qJkYflGD", - "outputId": "36691e05-0a25-4e5f-f3d9-21255cd6c143" - }, - "outputs": [], - "source": [ - "print(f\"The size of the train split: {df.size}\")" - ] - }, - { - "cell_type": "markdown", - "id": "5f87be84", - "metadata": {}, - "source": [ - "### Validation Split" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "qN2Zm9MxfsGc", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "qN2Zm9MxfsGc", - "outputId": "c0c11731-ebef-4fa9-a9be-4be6290a7bfd" - }, - "outputs": [], - "source": [ - "%%bash\n", - "mkdir val\n", - "python -m \"instageo.data.chip_creator\" \\\n", - " --dataframe_path=\"rwanda_cropland_data_val.csv\" \\\n", - " --output_directory=\"val\" \\\n", - " --min_count=4 \\\n", - " --chip_size=224 \\\n", - " --no_data_value=-1 \\\n", - " --temporal_tolerance=3 \\\n", - " --temporal_step=30 \\\n", - " --mask_cloud=False \\\n", - " --num_steps=3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "IAhs3vF9fyO4", - "metadata": { - "id": "IAhs3vF9fyO4" - }, - "outputs": [], - "source": [ - "root_dir = Path.cwd()\n", - "chips_orig = os.listdir(os.path.join(root_dir, \"val/chips\"))\n", - "chips = [chip.replace(\"chip\", \"val/chips/chip\") for chip in chips_orig]\n", - "seg_maps = [chip.replace(\"chip\", \"val/seg_maps/seg_map\") for chip in chips_orig]\n", - "\n", - "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", - "df.to_csv(os.path.join(\"val.csv\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c1S26jbTmH4J", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "c1S26jbTmH4J", - "outputId": "1a2f9742-5dea-4b50-c499-0431567cc624" - }, - "outputs": [], - "source": [ - "print(f\"The size of the validation split: {df.size}\")" - ] - }, - { - "cell_type": "markdown", - "id": "3d4e7fa6", - "metadata": {}, - "source": [ - "### Test Split" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "_xaa5V3sf3pJ", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "_xaa5V3sf3pJ", - "outputId": "569456fa-1953-4ebc-cd00-97a361c4cf7f" - }, - "outputs": [], - "source": [ - "%%bash\n", - "mkdir test\n", - "python -m \"instageo.data.chip_creator\" \\\n", - " --dataframe_path=\"rwanda_cropland_data_test.csv\" \\\n", - " --output_directory=\"test\" \\\n", - " --min_count=4 \\\n", - " --chip_size=224 \\\n", - " --no_data_value=-1 \\\n", - " --temporal_tolerance=3 \\\n", - " --temporal_step=30 \\\n", - " --mask_cloud=False \\\n", - " --num_steps=3" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "v7RvCMofgCCa", - "metadata": { - "id": "v7RvCMofgCCa" - }, - "outputs": [], - "source": [ - "root_dir = Path.cwd()\n", - "chips_orig = os.listdir(os.path.join(root_dir, \"test/chips\"))\n", - "chips = [chip.replace(\"chip\", \"test/chips/chip\") for chip in chips_orig]\n", - "seg_maps = [chip.replace(\"chip\", \"test/seg_maps/seg_map\") for chip in chips_orig]\n", - "\n", - "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", - "df.to_csv(os.path.join(\"test.csv\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "XYrUAef1mJ0N", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "XYrUAef1mJ0N", - "outputId": "646a14c8-60ea-4780-ca23-8d71ec8635fb" - }, - "outputs": [], - "source": [ - "print(f\"The size of the test split: {df.size}\")" - ] - }, - { - "cell_type": "markdown", - "id": "bcb132b0-04dd-4583-8e30-e6332446a0e6", - "metadata": { - "id": "bcb132b0-04dd-4583-8e30-e6332446a0e6" - }, - "source": [ - "## InstaGeo - Model" - ] - }, - { - "cell_type": "markdown", - "id": "fd313044-829c-482d-934e-9ae662f132fc", - "metadata": { - "id": "fd313044-829c-482d-934e-9ae662f132fc" - }, - "source": [ - "After creating our dataset using the `InstaGeo-Data` module, we can move on to fine-tuning a model that includes a Prithvi backbone paired with a classification head. For regression tasks, the classification head can easily be replaced with a suitable regression head. Additionally, if a completely different model architecture is needed, it can be designed and implemented within this framework." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c64fee29-dca3-4611-a43a-4412a6033751", - "metadata": { - "id": "c64fee29-dca3-4611-a43a-4412a6033751" - }, - "outputs": [], - "source": [ - "import os\n", - "import os\n", - "import pandas as pd\n", - "import numpy as np\n", - "from pathlib import Path" - ] - }, - { - "cell_type": "markdown", - "id": "115cbc92", - "metadata": {}, - "source": [ - "**Launch Training**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "47dc76c1-26b9-47b9-8066-5c1df499b40f", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "collapsed": true, - "id": "47dc76c1-26b9-47b9-8066-5c1df499b40f", - "jupyter": { - "outputs_hidden": true - }, - "outputId": "3669afc6-e528-4211-fc51-27ec8a96eacb", - "tags": [] - }, - "outputs": [], - "source": [ - "!python -m instageo.model.run --config-name=locust \\\n", - " root_dir='.' \\\n", - " train.batch_size=8 \\\n", - " train.num_epochs=3 \\\n", - " train_filepath=\"train.csv\" \\\n", - " valid_filepath=\"val.csv\"" - ] - }, - { - "cell_type": "markdown", - "id": "10db019e", - "metadata": {}, - "source": [ - "**Run Model Evaluation**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f8230588-2842-460c-b2e7-4f2f4175dc3e", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "collapsed": true, - "id": "f8230588-2842-460c-b2e7-4f2f4175dc3e", - "jupyter": { - "outputs_hidden": true - }, - "outputId": "732da70c-6b84-474f-f047-c27292804085", - "tags": [] - }, - "outputs": [], - "source": [ - "!python -m instageo.model.run --config-name=locust \\\n", - " root_dir='.' \\\n", - " test_filepath=\"test.csv\" \\\n", - " train.batch_size=8 \\\n", - " checkpoint_path='checkpoint-path' \\\n", - " mode=eval" - ] - }, - { - "cell_type": "markdown", - "id": "b9a73d79-f69f-4fb8-aa63-768ee3ee47ea", - "metadata": { - "id": "b9a73d79-f69f-4fb8-aa63-768ee3ee47ea" - }, - "source": [ - "**Run Inference**" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "YgxyC5GnvjJU", - "metadata": { - "collapsed": true, - "id": "YgxyC5GnvjJU", - "jupyter": { - "outputs_hidden": true - }, - "tags": [] - }, - "outputs": [], - "source": [ - "# !gsutil cp gs://instageo/utils/africa_prediction_template.csv .\n", - "!mkdir -p inference/2021-06" - ] - }, - { - "cell_type": "markdown", - "id": "9c30afd0-73ec-4eb1-b9c1-a2d36475eae7", - "metadata": { - "id": "9c30afd0-73ec-4eb1-b9c1-a2d36475eae7" - }, - "source": [ - "**Create Inference Data**\n", - "\n", - "For inference, we only need to download the necessary HLS tiles and run inference directly using the sliding window inference feature.\n", - "\n", - "If you're running inference across the entire African continent, you can use the `africa_prediction_template.csv`, which will automatically download 2,120 HLS granules covering Africa and parts of Asia.\n", - "\n", - "For this demo, we'll limit the scope to the HLS granules included in our test split.\n", - "\n", - "Note: Ensure you have approximately 1TB of storage space available for this process if you are running inference across Africa." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8f76b3ea-8ef7-473b-8f2e-72d493fdba03", - "metadata": { - "collapsed": true, - "id": "8f76b3ea-8ef7-473b-8f2e-72d493fdba03", - "jupyter": { - "outputs_hidden": true - }, - "tags": [] - }, - "outputs": [], - "source": [ - "!python -m \"instageo.data.chip_creator\" \\\n", - " --dataframe_path=\"test.csv\" \\\n", - " --output_directory=\"inference/2021-06\" \\\n", - " --min_count=1 \\\n", - " --no_data_value=-1 \\\n", - " --temporal_tolerance=3 \\\n", - " --temporal_step=30 \\\n", - " --num_steps=3 \\\n", - " --download_only" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "ku-49mTIqWvW", - "metadata": { - "id": "ku-49mTIqWvW" - }, - "outputs": [], - "source": [ - "# Instead of downloading new set of HLS tiles, we can use the one for our test split for inference.\n", - "\n", - "!cp -r test/* inference/2021-06" - ] - }, - { - "cell_type": "markdown", - "id": "65e86341-0717-458f-b061-3e957ce8536d", - "metadata": { - "id": "65e86341-0717-458f-b061-3e957ce8536d" - }, - "source": [ - "**Run Inference**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "JJXq8oWNAr1w", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "JJXq8oWNAr1w", - "outputId": "c5611f61-ae55-41a3-d60d-9cf5644c7a4c", - "tags": [] - }, - "outputs": [], - "source": [ - "!python -m instageo.model.run --config-name=locust \\\n", - " root_dir='inference/2021-06' \\\n", - " test_filepath='hls_dataset.json' \\\n", - " train.batch_size=16 \\\n", - " test.mask_cloud=True \\\n", - " checkpoint_path='checkpoint-path' \\\n", - " mode=predict" - ] - }, - { - "cell_type": "markdown", - "id": "8d79e412-8176-411a-8a73-6068e22a0d25", - "metadata": { - "id": "8d79e412-8176-411a-8a73-6068e22a0d25" - }, - "source": [ - "## InstaGeo - Apps\n", - "Once inference has been completed on the HLS tiles and the results have been saved, we can use the `InstaGeo-Apps` module to visualize the predictions on an interactive map.\n", - "\n", - "To visualize the results, simply move the HLS prediction GeoTIFF files to the appropriate directory, and `InstaGeo-Apps` will handle the rest, providing an intuitive and interactive mapping experience." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8a904882-cde9-40c8-affb-f988892e5183", - "metadata": { - "id": "8a904882-cde9-40c8-affb-f988892e5183", - "tags": [] - }, - "outputs": [], - "source": [ - "!mkdir -p predictions/2023/6\n", - "!mv inference/2023-06/predictions/* /content/predictions/2023/6" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "tQGnk67MY6Cd", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "tQGnk67MY6Cd", - "outputId": "a3ac2e6a-3f41-42aa-ad59-67fab121f1ea", - "tags": [] - }, - "outputs": [], - "source": [ - "!npm install localtunnel" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "BeJsQzkeBNm7", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "collapsed": true, - "id": "BeJsQzkeBNm7", - "jupyter": { - "outputs_hidden": true - }, - "outputId": "5dc4d215-e2b2-42dd-bd33-8e7070cc73ce", - "tags": [] - }, - "outputs": [], - "source": [ - "!nohup streamlit run InstaGeo-E2E-Geospatial-ML/instageo/apps/app.py --server.address=localhost &" - ] - }, - { - "cell_type": "markdown", - "id": "48459844-9277-473b-8cd8-e47691c59c0a", - "metadata": { - "id": "48459844-9277-473b-8cd8-e47691c59c0a" - }, - "source": [ - "Retrieve your IP address which is the password of the localtunnel" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "S5rCS-lVZiWe", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "S5rCS-lVZiWe", - "outputId": "c6fbbe9d-047f-48b0-f0b0-695fc70dd2f8", - "tags": [] - }, - "outputs": [], - "source": [ - "import urllib\n", - "print(\"Password/Endpoint IP for localtunnel is:\",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip(\"\\n\"))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "bB61RRBNY-48", - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "bB61RRBNY-48", - "outputId": "6336b355-b880-4b88-ba9b-0e37ddd6b859", - "tags": [] - }, - "outputs": [], - "source": [ - "!npx localtunnel --port 8501" - ] - }, - { - "cell_type": "markdown", - "id": "9f7d922e", - "metadata": {}, - "source": [ - "## Summary\n", - "\n", - "In this notebook, we demonstrated the end-to-end capabilities of InstaGeo for geospatial machine learning using multispectral data. We began by downloading and processing HLS granules, creating data chips for training, and fine-tuning a model with the Prithvi backbone. Finally, we ran inference on test data and visualized the results using the `InstaGeo-Apps` module.\n", - "\n", - "By leveraging InstaGeo, complex tasks such as data preprocessing, model training, and large-scale inference can be streamlined and efficiently handled with minimal configuration.\n", - "\n", - "If you found this demo helpful, please consider giving our [InstaGeo GitHub repository](https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML) a star ⭐! Your support helps us continue improving the tool for the community.\n", - "\n", - "Thank you for exploring InstaGeo with us!" - ] - }, - { - "cell_type": "markdown", - "id": "316908ce", - "metadata": {}, - "source": [] - } - ], - "metadata": { - "accelerator": "GPU", + "cells": [ + { + "cell_type": "markdown", + "id": "4574f053-60d4-419b-88cc-c58d08a8177c", + "metadata": { + "id": "4574f053-60d4-419b-88cc-c58d08a8177c" + }, + "source": [ + "# InstaGeo Demo\n", + "\n", + "\"Open\n", + "\n", + "Welcome to the InstaGeo demo notebook! This tutorial showcases the capabilities of InstaGeo, an end-to-end package designed for geospatial machine learning with multispectral data.\n", + "\n", + "In this demonstration, we use ground truth geospatial point observations for cropland classification in Rwanda. The notebook will guide you through the process of creating segmentation-like data from these observations, fine-tuning the [Prithvi](https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M) model, and finally visualizing the inference results on an interactive map.\n", + "\n", + "By the end of this demo, you will gain hands-on experience with key InstaGeo functionalities and learn how it streamlines geospatial ML workflows from data preparation to model inference." + ] + }, + { + "cell_type": "markdown", + "id": "380ead2a", + "metadata": {}, + "source": [ + "# Install InstaGeo" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e550a300", + "metadata": { "colab": { - "gpuType": "T4", - "provenance": [] - }, - "environment": { - "kernel": "python3", - "name": "tf2-cpu.2-11.m115", - "type": "gcloud", - "uri": "gcr.io/deeplearning-platform-release/tf2-cpu.2-11:m115" - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.14" - } - }, - "nbformat": 4, - "nbformat_minor": 5 + "base_uri": "https://localhost:8080/" + }, + "id": "e550a300", + "outputId": "3cbc267b-6c02-4539-fd64-c7b386bc1156", + "tags": [] + }, + "outputs": [], + "source": [ + "repository_url = \"https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML\"\n", + "\n", + "!git clone {repository_url}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f7e762b", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "9f7e762b", + "jupyter": { + "outputs_hidden": true + }, + "outputId": "4372ac59-3158-48dc-f321-fa2a24ee9255", + "tags": [] + }, + "outputs": [], + "source": [ + "%%bash\n", + "cd InstaGeo-E2E-Geospatial-ML\n", + "pip install -e .[all]" + ] + }, + { + "cell_type": "markdown", + "id": "238c78e7-720e-49ed-b5ce-6be6567d2585", + "metadata": { + "id": "238c78e7-720e-49ed-b5ce-6be6567d2585" + }, + "source": [ + "## EarthData Login\n", + "\n", + "InstaGeo currently supports multispectral data from NASA [Harmonized Landsat and Sentinel-2 (HLS)](https://hls.gsfc.nasa.gov/). Accessing HLS data requires an EarthData user account which can be created [here](https://urs.earthdata.nasa.gov/)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fc7ff9c", + "metadata": { + "id": "4fc7ff9c", + "tags": [] + }, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "import os" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8b7a5f61", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "8b7a5f61", + "outputId": "78a1c750-3a23-4c0a-f45e-94388625fb8b" + }, + "outputs": [], + "source": [ + "# Enter you EarthData user account credentials\n", + "USERNAME = getpass('Enter your EarthData username: ')\n", + "PASSWORD = getpass('Enter your EarthData password: ')\n", + "\n", + "content = f\"\"\"machine urs.earthdata.nasa.gov login {USERNAME} password {PASSWORD}\"\"\"\n", + "\n", + "with open(os.path.expanduser('~/.netrc'), 'w') as file:\n", + " file.write(content)" + ] + }, + { + "cell_type": "markdown", + "id": "488a15d3-f22e-4b2c-85c0-435c832e708c", + "metadata": { + "id": "488a15d3-f22e-4b2c-85c0-435c832e708c" + }, + "source": [ + "## InstaGeo - Data\n", + "\n", + "With InstaGeo installed and EarthData authentication configured, we are now ready to download and process HLS (Harmonized Landsat and Sentinel) granules using the `InstaGeo-Data` module. This module offers several powerful functionalities for handling geospatial data, including:\n", + "\n", + "- Searching and retrieving metadata for HLS granules\n", + "- Downloading specific spectral bands from HLS granules\n", + "- Generating data chips and corresponding target labels for machine learning tasks\n", + "\n", + "These capabilities streamline the preprocessing of multispectral data, setting the foundation for efficient geospatial model development.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "38e36f50", + "metadata": { + "id": "38e36f50" + }, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "from pathlib import Path" + ] + }, + { + "cell_type": "markdown", + "id": "c4d83f72", + "metadata": {}, + "source": [ + "The ground-truth geospatial observations for Rwanda cropland classification used in this notebook were sourced from the [Rwanda 2019 Crop/Non-Crop Labels (HarvestPortal)](https://data.harvestportal.org/dataset/rwanda-2019-crop-non-crop-labels) dataset. Run the following cell to download the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "623f4c83", + "metadata": {}, + "outputs": [], + "source": [ + "!wget -q --show-progress https://data.harvestportal.org/dataset/9f4b6470-2c7b-4559-95cb-49e9fd2923f6/resource/ed0ab379-a688-4419-ab96-181c726e1b22/download/ceo-2019-rwanda-cropland-sample-data-2021-04-20.csv\n", + "!wget -q --show-progress https://data.harvestportal.org/dataset/9f4b6470-2c7b-4559-95cb-49e9fd2923f6/resource/0cfc1320-f909-4759-90f9-cb5c92ca019e/download/ceo-2019-rwanda-cropland-rcmrd-set-1-sample-data-2021-04-20.csv\n", + "!wget -q --show-progress https://data.harvestportal.org/dataset/9f4b6470-2c7b-4559-95cb-49e9fd2923f6/resource/6675cc7e-e6da-4889-9905-60c0d5369ce6/download/ceo-2019-rwanda-cropland-rcmrd-set-2-sample-data-2021-04-20.csv" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "adf67434", + "metadata": { + "id": "adf67434" + }, + "outputs": [], + "source": [ + "df1 = pd.read_csv(\"ceo-2019-rwanda-cropland-sample-data-2021-04-20.csv\")\n", + "df2 = pd.read_csv(\"ceo-2019-rwanda-cropland-rcmrd-set-1-sample-data-2021-04-20.csv\")\n", + "df3 = pd.read_csv(\"ceo-2019-rwanda-cropland-rcmrd-set-2-sample-data-2021-04-20.csv\")\n", + "\n", + "df = pd.concat([df1, df2, df3])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f6541bd6", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "f6541bd6", + "outputId": "0a55e877-f163-4d7d-c479-542315a0cd37" + }, + "outputs": [], + "source": [ + "df = df[['lat', 'lon', 'collection_time', 'Crop/ or not', 'sample_id']]\n", + "df = df.rename({\"lon\": \"x\", \"lat\":\"y\", \"Crop/ or not\":'label', 'collection_time':\"date\"}, axis=1)\n", + "df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7ac02c8", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 363 + }, + "id": "e7ac02c8", + "outputId": "40da1092-af57-419f-f75f-781ebdfd3f46" + }, + "outputs": [], + "source": [ + "def label_map(x):\n", + " if x == \"Cropland\":\n", + " return 1\n", + " elif x == \"Non-crop\":\n", + " return 0\n", + " else:\n", + " return np.nan\n", + "\n", + "df['date'] = df['date'].map(lambda x: pd.to_datetime(x).strftime(\"%Y-%m-%d\"))\n", + "df['label'] = df['label'].map(label_map)\n", + "df = df.dropna().reset_index()\n", + "df.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "329a3b51", + "metadata": {}, + "outputs": [], + "source": [ + "print(f\"The number of labeled observations in the aggregated dataset is: {df.shape[0]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "537d9124", + "metadata": {}, + "source": [ + "**Optional**: For the sake of rapid experimentation, let's use a subset of the observations (for instance 10%), while keeping approximately the same distribution for the labels." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31fe42a8", + "metadata": {}, + "outputs": [], + "source": [ + "df = df.groupby('label', as_index=False).sample(frac=0.1).reset_index(drop=True)\n", + "print(f\"The number of labeled observations in the subset is: {df.shape[0]}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66e324e2", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "66e324e2", + "outputId": "5a56ba17-d580-4646-b970-f5fd90367aba" + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "train, val_and_test = train_test_split(df, test_size=0.3)\n", + "val, test = train_test_split(val_and_test, test_size=0.5)\n", + "\n", + "print(train.size, val.size, test.size)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "527edae6", + "metadata": { + "id": "527edae6" + }, + "outputs": [], + "source": [ + "train.to_csv(\"rwanda_cropland_data_train.csv\")\n", + "val.to_csv(\"rwanda_cropland_data_val.csv\")\n", + "test.to_csv(\"rwanda_cropland_data_test.csv\")" + ] + }, + { + "cell_type": "markdown", + "id": "fcc545f4", + "metadata": {}, + "source": [ + "After splitting the data into training, validation, and test sets, the next step is to group the data by the HLS granules they belong to and download the corresponding spectral bands for each granule. Once the bands are retrieved, we will generate smaller chips and target labels with dimensions of 224 x 224 pixels.\n", + "\n", + "By the end of this process, the input data will have a shape of 3 x 6 x 224 x 224 (representing three sets of six spectral bands and 224 x 224 pixel chips), and the target labels will have a shape of 224 x 224.\n", + "\n", + "While these tasks might seem complex, the `InstaGeo-Data` module abstracts this process, allowing you to configure it with a simple command as shown in the following cells" + ] + }, + { + "cell_type": "markdown", + "id": "b901a48e", + "metadata": {}, + "source": [ + "### Training Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c11bfeb8-df6b-4b8c-8c8a-059a5a28b8ea", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "c11bfeb8-df6b-4b8c-8c8a-059a5a28b8ea", + "outputId": "4ba7dbb9-0572-4bd8-8972-51418efb5ec3", + "tags": [] + }, + "outputs": [], + "source": [ + "%%bash\n", + "mkdir train\n", + "python -m \"instageo.data.chip_creator\" \\\n", + " --dataframe_path=\"rwanda_cropland_data_train.csv\" \\\n", + " --output_directory=\"train\" \\\n", + " --min_count=4 \\\n", + " --chip_size=224 \\\n", + " --no_data_value=-1 \\\n", + " --temporal_tolerance=3 \\\n", + " --temporal_step=30 \\\n", + " --mask_cloud=False \\\n", + " --num_steps=3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "22FADUFcfYsR", + "metadata": { + "id": "22FADUFcfYsR" + }, + "outputs": [], + "source": [ + "root_dir = Path.cwd()\n", + "chips_orig = os.listdir(os.path.join(root_dir, \"train/chips\"))\n", + "chips = [chip.replace(\"chip\", \"train/chips/chip\") for chip in chips_orig]\n", + "seg_maps = [chip.replace(\"chip\", \"train/seg_maps/seg_map\") for chip in chips_orig]\n", + "\n", + "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", + "df.to_csv(os.path.join(\"train.csv\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "UTu-qJkYflGD", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "UTu-qJkYflGD", + "outputId": "36691e05-0a25-4e5f-f3d9-21255cd6c143" + }, + "outputs": [], + "source": [ + "print(f\"The size of the train split: {df.shape[0]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "5f87be84", + "metadata": {}, + "source": [ + "### Validation Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "qN2Zm9MxfsGc", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "qN2Zm9MxfsGc", + "outputId": "c0c11731-ebef-4fa9-a9be-4be6290a7bfd" + }, + "outputs": [], + "source": [ + "%%bash\n", + "mkdir val\n", + "python -m \"instageo.data.chip_creator\" \\\n", + " --dataframe_path=\"rwanda_cropland_data_val.csv\" \\\n", + " --output_directory=\"val\" \\\n", + " --min_count=4 \\\n", + " --chip_size=224 \\\n", + " --no_data_value=-1 \\\n", + " --temporal_tolerance=3 \\\n", + " --temporal_step=30 \\\n", + " --mask_cloud=False \\\n", + " --num_steps=3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "IAhs3vF9fyO4", + "metadata": { + "id": "IAhs3vF9fyO4" + }, + "outputs": [], + "source": [ + "root_dir = Path.cwd()\n", + "chips_orig = os.listdir(os.path.join(root_dir, \"val/chips\"))\n", + "chips = [chip.replace(\"chip\", \"val/chips/chip\") for chip in chips_orig]\n", + "seg_maps = [chip.replace(\"chip\", \"val/seg_maps/seg_map\") for chip in chips_orig]\n", + "\n", + "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", + "df.to_csv(os.path.join(\"val.csv\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1S26jbTmH4J", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "c1S26jbTmH4J", + "outputId": "1a2f9742-5dea-4b50-c499-0431567cc624" + }, + "outputs": [], + "source": [ + "print(f\"The size of the validation split: {df.shape[0]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3d4e7fa6", + "metadata": {}, + "source": [ + "### Test Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "_xaa5V3sf3pJ", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "_xaa5V3sf3pJ", + "outputId": "569456fa-1953-4ebc-cd00-97a361c4cf7f" + }, + "outputs": [], + "source": [ + "%%bash\n", + "mkdir test\n", + "python -m \"instageo.data.chip_creator\" \\\n", + " --dataframe_path=\"rwanda_cropland_data_test.csv\" \\\n", + " --output_directory=\"test\" \\\n", + " --min_count=4 \\\n", + " --chip_size=224 \\\n", + " --no_data_value=-1 \\\n", + " --temporal_tolerance=3 \\\n", + " --temporal_step=30 \\\n", + " --mask_cloud=False \\\n", + " --num_steps=3" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "v7RvCMofgCCa", + "metadata": { + "id": "v7RvCMofgCCa" + }, + "outputs": [], + "source": [ + "root_dir = Path.cwd()\n", + "chips_orig = os.listdir(os.path.join(root_dir, \"test/chips\"))\n", + "chips = [chip.replace(\"chip\", \"test/chips/chip\") for chip in chips_orig]\n", + "seg_maps = [chip.replace(\"chip\", \"test/seg_maps/seg_map\") for chip in chips_orig]\n", + "\n", + "df = pd.DataFrame({\"Input\": chips, \"Label\": seg_maps})\n", + "df.to_csv(os.path.join(\"test.csv\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "XYrUAef1mJ0N", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "XYrUAef1mJ0N", + "outputId": "646a14c8-60ea-4780-ca23-8d71ec8635fb" + }, + "outputs": [], + "source": [ + "print(f\"The size of the test split: {df.shape[0]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "bcb132b0-04dd-4583-8e30-e6332446a0e6", + "metadata": { + "id": "bcb132b0-04dd-4583-8e30-e6332446a0e6" + }, + "source": [ + "## InstaGeo - Model" + ] + }, + { + "cell_type": "markdown", + "id": "fd313044-829c-482d-934e-9ae662f132fc", + "metadata": { + "id": "fd313044-829c-482d-934e-9ae662f132fc" + }, + "source": [ + "After creating our dataset using the `InstaGeo-Data` module, we can move on to fine-tuning a model that includes a Prithvi backbone paired with a classification head. For regression tasks, the classification head can easily be replaced with a suitable regression head. Additionally, if a completely different model architecture is needed, it can be designed and implemented within this framework." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c64fee29-dca3-4611-a43a-4412a6033751", + "metadata": { + "id": "c64fee29-dca3-4611-a43a-4412a6033751" + }, + "outputs": [], + "source": [ + "import os\n", + "import os\n", + "import pandas as pd\n", + "import numpy as np\n", + "from pathlib import Path" + ] + }, + { + "cell_type": "markdown", + "id": "115cbc92", + "metadata": {}, + "source": [ + "**Launch Training**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47dc76c1-26b9-47b9-8066-5c1df499b40f", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "47dc76c1-26b9-47b9-8066-5c1df499b40f", + "jupyter": { + "outputs_hidden": true + }, + "outputId": "3669afc6-e528-4211-fc51-27ec8a96eacb", + "tags": [] + }, + "outputs": [], + "source": [ + "!python -m instageo.model.run --config-name=cropland_class \\\n", + " root_dir='.' \\\n", + " train.batch_size=8 \\\n", + " train.num_epochs=5 \\\n", + " train_filepath=\"train.csv\" \\\n", + " valid_filepath=\"val.csv\"" + ] + }, + { + "cell_type": "markdown", + "id": "10db019e", + "metadata": {}, + "source": [ + "**Run Model Evaluation**" + ] + }, + { + "cell_type": "markdown", + "id": "67acf232", + "metadata": {}, + "source": [ + "Adjust the `checkpoint_path` argument to use the desired model checkpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8230588-2842-460c-b2e7-4f2f4175dc3e", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "f8230588-2842-460c-b2e7-4f2f4175dc3e", + "jupyter": { + "outputs_hidden": true + }, + "outputId": "732da70c-6b84-474f-f047-c27292804085", + "tags": [] + }, + "outputs": [], + "source": [ + "!python -m instageo.model.run --config-name=cropland_class \\\n", + " root_dir='.' \\\n", + " test_filepath=\"test.csv\" \\\n", + " train.batch_size=8 \\\n", + " checkpoint_path='checkpoint-path' \\\n", + " mode=eval" + ] + }, + { + "cell_type": "markdown", + "id": "b9a73d79-f69f-4fb8-aa63-768ee3ee47ea", + "metadata": { + "id": "b9a73d79-f69f-4fb8-aa63-768ee3ee47ea" + }, + "source": [ + "**Run Inference**" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "YgxyC5GnvjJU", + "metadata": { + "collapsed": true, + "id": "YgxyC5GnvjJU", + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# !gsutil cp gs://instageo/utils/africa_prediction_template.csv .\n", + "!mkdir -p inference/2021-06" + ] + }, + { + "cell_type": "markdown", + "id": "9c30afd0-73ec-4eb1-b9c1-a2d36475eae7", + "metadata": { + "id": "9c30afd0-73ec-4eb1-b9c1-a2d36475eae7" + }, + "source": [ + "**Create Inference Data**\n", + "\n", + "For inference, we only need to download the necessary HLS tiles and run inference directly using the sliding window inference feature.\n", + "\n", + "If you're running inference across the entire African continent, you can use the `africa_prediction_template.csv`, which will automatically download 2,120 HLS granules covering Africa and parts of Asia.\n", + "\n", + "For this demo, we'll limit the scope to the HLS granules included in our test split.\n", + "\n", + "Note: Ensure you have approximately 1TB of storage space available for this process if you are running inference across Africa." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f76b3ea-8ef7-473b-8f2e-72d493fdba03", + "metadata": { + "collapsed": true, + "id": "8f76b3ea-8ef7-473b-8f2e-72d493fdba03", + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "# !python -m \"instageo.data.chip_creator\" \\\n", + "# --dataframe_path=\"test.csv\" \\\n", + "# --output_directory=\"inference/2021-06\" \\\n", + "# --min_count=1 \\\n", + "# --no_data_value=-1 \\\n", + "# --temporal_tolerance=3 \\\n", + "# --temporal_step=30 \\\n", + "# --num_steps=3 \\\n", + "# --download_only" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ku-49mTIqWvW", + "metadata": { + "id": "ku-49mTIqWvW" + }, + "outputs": [], + "source": [ + "# Instead of downloading new set of HLS tiles, we can use the one for our test split for inference.\n", + "\n", + "!cp -r test/* inference/2021-06" + ] + }, + { + "cell_type": "markdown", + "id": "65e86341-0717-458f-b061-3e957ce8536d", + "metadata": { + "id": "65e86341-0717-458f-b061-3e957ce8536d" + }, + "source": [ + "**Run Inference**" + ] + }, + { + "cell_type": "markdown", + "id": "80a510ba", + "metadata": {}, + "source": [ + "Adjust the `checkpoint_path` argument to use the desired model checkpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "JJXq8oWNAr1w", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JJXq8oWNAr1w", + "outputId": "c5611f61-ae55-41a3-d60d-9cf5644c7a4c", + "tags": [] + }, + "outputs": [], + "source": [ + "!python -m instageo.model.run --config-name=cropland_class \\\n", + " root_dir='inference/2021-06' \\\n", + " test_filepath='hls_dataset.json' \\\n", + " train.batch_size=16 \\\n", + " test.mask_cloud=True \\\n", + " checkpoint_path='checkpoint-path' \\\n", + " mode=predict" + ] + }, + { + "cell_type": "markdown", + "id": "8d79e412-8176-411a-8a73-6068e22a0d25", + "metadata": { + "id": "8d79e412-8176-411a-8a73-6068e22a0d25" + }, + "source": [ + "## InstaGeo - Apps\n", + "Once inference has been completed on the HLS tiles and the results have been saved, we can use the `InstaGeo-Apps` module to visualize the predictions on an interactive map.\n", + "\n", + "To visualize the results, simply move the HLS prediction GeoTIFF files to the appropriate directory, and `InstaGeo-Apps` will handle the rest, providing an intuitive and interactive mapping experience." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a904882-cde9-40c8-affb-f988892e5183", + "metadata": { + "id": "8a904882-cde9-40c8-affb-f988892e5183", + "tags": [] + }, + "outputs": [], + "source": [ + "!mkdir -p predictions/2023/6\n", + "!mv inference/2023-06/predictions/* /content/predictions/2023/6" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "tQGnk67MY6Cd", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tQGnk67MY6Cd", + "outputId": "a3ac2e6a-3f41-42aa-ad59-67fab121f1ea", + "tags": [] + }, + "outputs": [], + "source": [ + "!npm install localtunnel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "BeJsQzkeBNm7", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "collapsed": true, + "id": "BeJsQzkeBNm7", + "jupyter": { + "outputs_hidden": true + }, + "outputId": "5dc4d215-e2b2-42dd-bd33-8e7070cc73ce", + "tags": [] + }, + "outputs": [], + "source": [ + "!nohup streamlit run InstaGeo-E2E-Geospatial-ML/instageo/apps/app.py --server.address=localhost &" + ] + }, + { + "cell_type": "markdown", + "id": "48459844-9277-473b-8cd8-e47691c59c0a", + "metadata": { + "id": "48459844-9277-473b-8cd8-e47691c59c0a" + }, + "source": [ + "Retrieve your IP address which is the password of the localtunnel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "S5rCS-lVZiWe", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "S5rCS-lVZiWe", + "outputId": "c6fbbe9d-047f-48b0-f0b0-695fc70dd2f8", + "tags": [] + }, + "outputs": [], + "source": [ + "import urllib\n", + "print(\"Password/Endpoint IP for localtunnel is:\",urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip(\"\\n\"))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bB61RRBNY-48", + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bB61RRBNY-48", + "outputId": "6336b355-b880-4b88-ba9b-0e37ddd6b859", + "tags": [] + }, + "outputs": [], + "source": [ + "!npx localtunnel --port 8501" + ] + }, + { + "cell_type": "markdown", + "id": "9f7d922e", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this notebook, we demonstrated the end-to-end capabilities of InstaGeo for geospatial machine learning using multispectral data. We began by downloading and processing HLS granules, creating data chips for training, and fine-tuning a model with the Prithvi backbone. Finally, we ran inference on test data and visualized the results using the `InstaGeo-Apps` module.\n", + "\n", + "By leveraging InstaGeo, complex tasks such as data preprocessing, model training, and large-scale inference can be streamlined and efficiently handled with minimal configuration.\n", + "\n", + "If you found this demo helpful, please consider giving our [InstaGeo GitHub repository](https://github.com/instadeepai/InstaGeo-E2E-Geospatial-ML) a star ⭐! Your support helps us continue improving the tool for the community.\n", + "\n", + "Thank you for exploring InstaGeo with us!" + ] + }, + { + "cell_type": "markdown", + "id": "316908ce", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "environment": { + "kernel": "python3", + "name": "tf2-cpu.2-11.m115", + "type": "gcloud", + "uri": "gcr.io/deeplearning-platform-release/tf2-cpu.2-11:m115" + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 5 }