feat: New example transferring data to ORNL DAAC #57

Open
wants to merge 29 commits into
base: main
26e2a20
feat: ATL08 to COPC
wildintellect Sep 16, 2022
f907a92
fix:PDAL COPC options
wildintellect Sep 16, 2022
506ca91
docs:COPC Conversion notes/todo
wildintellect Sep 16, 2022
ef23927
Update copc/pdal_setup.ipynb
wildintellect Oct 11, 2022
0cbc453
Update copc/ATL08_to_COPC.ipynb
wildintellect Oct 11, 2022
b9a2aa4
fix: update maap-py function
wildintellect Oct 12, 2022
f7f7e8d
chore:pull latest
wildintellect Oct 12, 2022
298dd43
Add files via upload
abarciauskas-bgse Mar 2, 2023
63ef6ed
rename file
abarciauskas-bgse Mar 2, 2023
168d7f3
Add files via upload
abarciauskas-bgse Mar 2, 2023
11c34a9
Add files via upload
abarciauskas-bgse Mar 2, 2023
79bc4b2
Delete edl-token-example.ipynb
abarciauskas-bgse Mar 2, 2023
57d3878
Rename edl-token-example (1).ipynb to edl-token-example.ipynb
abarciauskas-bgse Mar 2, 2023
5bf1f7c
Update perf_testing.ipynb
abarciauskas-bgse Mar 3, 2023
ed63859
revert change
abarciauskas-bgse Mar 3, 2023
8e60f86
feat: New example transferring data to ORNL DAAC
wildintellect Mar 17, 2023
82ece45
fix: Address issues from PR #57
wildintellect Mar 17, 2023
c1d516c
fix: Remove confusing link to file that's now in this repo
wildintellect Mar 17, 2023
1c2ce12
feat: Readme for COPC example
wildintellect Mar 17, 2023
5657780
Merge pull request #42 from MAAP-Project/feat/copc_atl
wildintellect Mar 17, 2023
1a8f867
Update notebook with instructions
abarciauskas-bgse Mar 23, 2023
fc0144b
Merge pull request #53 from MAAP-Project/ab/edl-token-example
abarciauskas-bgse Mar 23, 2023
45577ab
part1&2 testing data migration
sdradsb May 2, 2023
0a219c3
new testing data migration script - last cell
sdradsb May 11, 2023
7deb2d1
small change to the last cell
sdradsb May 18, 2023
8a4f880
Merge pull request #59 from sdradsb/main
sdradsb May 18, 2023
24bf30e
feat: New example transferring data to ORNL DAAC
wildintellect Mar 17, 2023
cccb88d
fix: Address issues from PR #57
wildintellect Mar 17, 2023
4c37a34
rebase from main
wildintellect Jun 9, 2023
232 changes: 232 additions & 0 deletions daac_publish/daac_upload.ipynb
@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d1862af9",
"metadata": {},
"source": [
"# Upload to ORNL DAAC\n",
"\n",
"This notebook demonstrates transferring data from MAAP to the ORNL DAAC. First identify the correct DAAC for publishing your data, then start that DAAC's submission process. In this case it is the ORNL DAAC: https://daac.ornl.gov/submit/\n",
"\n",
"Currently the transfer pushes data, which incurs an egress cost; for this particular dataset that was ~$30. In the future we plan to explore having the DAAC pull data between AWS buckets to avoid egress charges.\n",
"\n",
"\n",
"## Install Rclone\n",
"\n",
"On the MAAP ADE you need [rclone](https://rclone.org/). We chose rclone because it verifies file integrity on upload, can resume uploads, and supports both S3 and FTPS.\n",
"\n",
"```\n",
"# Install rclone\n",
"apt install unzip\n",
"curl https://rclone.org/install.sh | bash\n",
"```\n",
"\n",
"## Set up S3 as source\n",
"```\n",
"rclone config\n",
"\n",
"# Settings to pick (shown as they appear in the rclone config file)\n",
"[s3]\n",
"type = s3\n",
"provider = AWS\n",
"env_auth = true\n",
"region = us-west-2\n",
"location_constraint = us-west-2\n",
"```\n",
"\n",
"\n",
"## Set up DAAC as destination (FTPS)\n",
"```\n",
"rclone config\n",
"\n",
"# Settings to pick (shown as they appear in the rclone config file)\n",
"[ornl]\n",
"type = ftp\n",
"host = daacupload.ornl.gov\n",
"# username is all lowercase, even if you signed up differently\n",
"user = <username>\n",
"explicit_tls = true\n",
"no_check_certificate = true\n",
"ask_password = true\n",
"```\n",
"\n",
"You can check your rclone config (and save it for later):\n",
"```\n",
"cat /projects/.config/rclone/rclone.conf\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95ee0ed2",
"metadata": {},
"outputs": [],
"source": [
"# A simple test to verify permissions and the upload destination\n",
"#!rclone copyto -P s3:nasa-maap-data-store/file-staging/icesat2-boreal/boreal_agb_202302151676439579_1326.tif ornl:/407161fd93/"
]
},
{
"cell_type": "markdown",
"id": "343b787f",
"metadata": {},
"source": [
"# Set Up the Transfer List\n",
"\n",
"Initially we thought we could use a STAC query to select the files necessary for transfer. This is the ideal method, since external groups like DAACs can reliably repeat the same query.\n",
"\n",
"In the end, however, the bbox query was too crude to select the correct full set for this particular case, so Paul provided an actual list in the same format, derived by another method."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "105ef242",
"metadata": {},
"outputs": [],
"source": [
"## You need pystac_client\n",
"#%pip install pystac_client"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "222bea04",
"metadata": {},
"outputs": [],
"source": [
"from pystac_client import Client\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "81fcde17",
"metadata": {},
"outputs": [],
"source": [
"# Make a list of granules meeting the criteria\n",
"# https://stac.maap-project.org/collections/icesat2-boreal/items?bbox=-180,51.6,180,78\n",
"api = Client.open('https://stac.maap-project.org/')\n",
"\n",
"granule_results = api.search(\n",
"    max_items=5000,\n",
"    collections=['icesat2-boreal'],\n",
"    bbox=[-180, 51.6, 180, 78]\n",
")\n",
"# The list is saved to a text file in a later cell"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f2359f07",
"metadata": {},
"outputs": [],
"source": [
"# Fetch all matching items (this runs the search; the result is not reused below)\n",
"test = granule_results.get_all_items()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "026fc208",
"metadata": {},
"outputs": [],
"source": [
"# build a list of asset urls\n",
"assets = [item.assets.get('cog_default').href.replace(\"s3://\",\"\") for item in granule_results.get_all_items()]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "0c482ca5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3556"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# check the number of assets selected\n",
"len(assets)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e5501e38",
"metadata": {},
"outputs": [],
"source": [
"# Reduce each asset path to its basename and save the list as a text file for rclone to use\n",
"# rclone then transfers only the files named in this list\n",
"# https://rclone.org/filtering/#files-from-read-list-of-source-file-names\n",
"txt_file = 'icesat2_boreal_granules.txt'\n",
"with open(txt_file, 'w') as filehandle:\n",
"    filehandle.writelines([f\"{os.path.basename(granule)}\\n\" for granule in assets])"
]
},
{
"cell_type": "markdown",
"id": "9d0725af",
"metadata": {},
"source": [
"# Do the Rclone Transfer\n",
"Run this in a terminal (the password prompt may not work inside a notebook). The `--dry-run` flag only previews the transfer; remove it to actually copy the files.\n",
"```\n",
"rclone copy --dry-run --no-update-modtime -P --files-from icesat2_boreal_granules.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/\n",
"```\n",
"\n",
"The same transfer using an updated list of tiles:\n",
"```\n",
"rclone copy --dry-run --no-update-modtime -P --files-from /projects/shared-buckets/nathanmthomas/boreal_agb_tiles_DAAC.txt s3:nasa-maap-data-store/file-staging/icesat2-boreal ornl:/407161fd93/\n",
"```\n",
"\n",
"Example output:\n",
"```\n",
"2023-03-17 16:31:52 ERROR : ftp://daacupload.ornl.gov:21/407161fd93: SetModTime is not supported\n",
"Transferred: 27.839 GiB / 27.839 GiB, 100%, 39.908 MiB/s, ETA 0s\n",
"Checks: 3556 / 3556, 100%\n",
"Transferred: 335 / 335, 100%\n",
"Elapsed time: 11m40.2s\n",
"```\n",
"You can ignore the SetModTime error messages; the FTP destination does not support setting modification times."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:root] *",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
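
A note on the transfer-list step in the notebook above: the conversion from asset hrefs to an rclone `--files-from` list can be sketched without a live STAC query. This is only a minimal sketch under stated assumptions, not the notebook's code. The helper names `build_files_from` and `write_files_from`, the example hrefs, and the output file name are hypothetical. Sorting and de-duplicating the names is an addition the notebook does not perform.

```python
import posixpath
import tempfile
from pathlib import Path


def build_files_from(hrefs):
    """Return sorted, de-duplicated basenames for an rclone --files-from list.

    Hypothetical helper: the notebook writes basenames directly and does
    not sort or de-duplicate them.
    """
    names = set()
    for href in hrefs:
        # Drop the URL scheme so only bucket/key remains, as the notebook does.
        key = href.replace("s3://", "", 1)
        name = posixpath.basename(key)
        if name:  # skip hrefs ending in "/" that carry no file name
            names.add(name)
    return sorted(names)


def write_files_from(hrefs, out_path):
    """Write one basename per line and return how many names were written."""
    names = build_files_from(hrefs)
    Path(out_path).write_text("".join(f"{name}\n" for name in names))
    return len(names)


# Hypothetical hrefs shaped like the notebook's 'cog_default' asset hrefs.
example_hrefs = [
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/tile_b.tif",
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/tile_a.tif",
    "s3://nasa-maap-data-store/file-staging/icesat2-boreal/tile_a.tif",
]
out_file = Path(tempfile.gettempdir()) / "example_granules.txt"
count = write_files_from(example_hrefs, out_file)
```

Bare basenames work here only because every file sits directly under the source prefix given to `rclone copy`; rclone resolves `--files-from` entries relative to that source root, so files in nested prefixes would need their relative paths instead.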