Define our data storage and bucket schema #23

phargogh · 2022-06-22T19:38:54Z

We'll need the ability to store a variety of files during execution. Define what these filenames should be or what pattern they should take.

Related, which buckets should these files use? Where should the core datasets (e.g. NLUD) be stored?

phargogh · 2022-07-29T19:27:26Z

Here's what we're thinking:

So, <backend> refers to some directory-centric storage mechanism, such as the local filesystem or GCS.

<backend>/
        {session_id}/
                scenarios/
                        {scenario_id}_{wallpaper or fill}_{name:sanitized}.tif  # with internal overviews
                        LATER: {scenario_id}_{thumbnail}.png
                model_outputs/
                        {job_id}_{model_name}/
                                workspace/
                                        <logfile>
                                        <model specific outputs>
                                <actual output file with overviews, derived from model outputs>

NOTES:
input data will be kept in a private bucket (see: principle of least privilege)


TODO: do we save the whole stack of outputs or just a few of them?
TODO: push docker logs to GCP's logs when running on prod

phargogh · 2022-08-02T23:19:26Z

@dcdenu4 I just realized that scenarios are not attached to sessions, so I propose updating the file structure to be this:

<backend>/
        scenarios/
                {scenario_id}/
                        {scenario_id}_{wallpaper or fill}_{name:sanitized}.tif  # with internal overviews
                        LATER: {scenario_id}_{thumbnail}.png
        model_outputs/
                {session_id}/
                        {job_id}_{model_name}/
                                workspace/
                                        <logfile>
                                        <model specific outputs>
                                <actual output file with overviews, derived from model outputs>

NOTES:
input data will be kept in a private bucket (see: principle of least privilege)


TODO: do we save the whole stack of outputs or just a few of them?
TODO: push docker logs to GCP's logs when running on prod

Does that sound OK to you?
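
For illustration, here is a minimal sketch of how paths in the revised layout could be built. The function names, the `gs://example-bucket` prefix in the usage note, and the sanitization rule (lowercase, runs of non-alphanumerics collapsed to hyphens) are all assumptions; nothing in this thread specifies them. It uses `posixpath` so that the same forward-slash joins work for a local POSIX directory and for a GCS-style prefix.

```python
import posixpath
import re


def sanitize_name(name):
    """Reduce a user-supplied scenario name to a filename-safe slug.

    ASSUMPTION: the thread only says "name:sanitized"; this rule is a guess.
    """
    slug = re.sub(r"[^A-Za-z0-9]+", "-", name).strip("-").lower()
    return slug or "unnamed"


def scenario_raster_path(backend, scenario_id, kind, name):
    """Path: <backend>/scenarios/{scenario_id}/{scenario_id}_{kind}_{name}.tif"""
    if kind not in ("wallpaper", "fill"):
        raise ValueError(f"kind must be 'wallpaper' or 'fill', got {kind!r}")
    filename = f"{scenario_id}_{kind}_{sanitize_name(name)}.tif"
    return posixpath.join(backend, "scenarios", str(scenario_id), filename)


def model_workspace_path(backend, session_id, job_id, model_name):
    """Path: <backend>/model_outputs/{session_id}/{job_id}_{model_name}/workspace"""
    job_dir = f"{job_id}_{model_name}"
    return posixpath.join(
        backend, "model_outputs", str(session_id), job_dir, "workspace")
```

With a hypothetical bucket, `scenario_raster_path("gs://example-bucket", 12, "fill", "My Park (v2)!")` gives `gs://example-bucket/scenarios/12/12_fill_my-park-v2.tif`. Passing a local directory as `backend` gives the same relative layout under that directory.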

davemfish · 2024-05-30T13:35:03Z

I'm not sure if there's any more work to be done here. If there is, it can probably be done alongside #25.

phargogh added the backend label (Mainly a backend issue / task) Jun 22, 2022

phargogh changed the title ~~Define our data storage schema~~ Define our data storage and bucket schema Jun 22, 2022

phargogh mentioned this issue Jun 22, 2022

First deliverable: A demo for creating scenarios #16

Closed

phargogh added a commit to phargogh/urban-online-workflow that referenced this issue Aug 2, 2022

Putting scenario outputs where we said we would.

15c8a3e

See natcap#23

davemfish assigned dcdenu4 May 30, 2024

davemfish added this to the release milestone May 30, 2024
