Define our data storage and bucket schema #23

Open
phargogh opened this issue Jun 22, 2022 · 3 comments

Comments

@phargogh
Member

phargogh commented Jun 22, 2022

We'll need the ability to store a variety of files during execution. Define what these filenames should be or what pattern they should take.

Related, which buckets should these files use? Where should the core datasets (e.g. NLUD) be stored?

@phargogh phargogh added the backend Mainly a backend issue / task label Jun 22, 2022
@phargogh phargogh changed the title Define our data storage schema Define our data storage and bucket schema Jun 22, 2022
@phargogh
Member Author

Here's what we're thinking:

Here, <backend> refers to some directory-centric storage mechanism, such as the local filesystem or GCS.

<backend>/
        {session_id}/
                scenarios/
                        {scenario_id}_{wallpaper or fill}_{name:sanitized}.tif  # with internal overviews
                        LATER: {scenario_id}_{thumbnail}.png
                model_outputs/
                        {job_id}_{model_name}/
                                workspace/
                                        <logfile>
                                        <model specific outputs>
                                <actual output file with overviews, derived from model outputs>

NOTES:
input data will be kept in a private bucket (see: principle of least privilege)


TODO: do we save the whole stack of outputs or just a few of them?
TODO: push docker logs to GCP's logs when running on prod

@phargogh
Member Author

phargogh commented Aug 2, 2022

@dcdenu4 I just realized that scenarios are not attached to sessions, so I propose updating the file structure to be this:

<backend>/
        scenarios/
                {scenario_id}/
                        {scenario_id}_{wallpaper or fill}_{name:sanitized}.tif  # with internal overviews
                        LATER: {scenario_id}_{thumbnail}.png
        model_outputs/
                {session_id}/
                        {job_id}_{model_name}/
                                workspace/
                                        <logfile>
                                        <model specific outputs>
                                <actual output file with overviews, derived from model outputs>

NOTES:
input data will be kept in a private bucket (see: principle of least privilege)


TODO: do we save the whole stack of outputs or just a few of them?
TODO: push docker logs to GCP's logs when running on prod

Does that sound OK to you?
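
For illustration, the layout proposed above could be expressed as a pair of path helpers. This is only a sketch under assumptions: the function names, the sanitizing rule for `{name:sanitized}`, and the use of `pathlib` are all hypothetical and not something decided in this issue. It also treats `<backend>` as a plain path prefix (a local directory or an object-name prefix inside a bucket), not a full `gs://` URL.

```python
import re
from pathlib import PurePosixPath


def sanitize_name(name: str) -> str:
    """Hypothetical rule for {name:sanitized}: lowercase the name and
    collapse every run of non-alphanumeric characters into one underscore."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return cleaned or "unnamed"


def scenario_raster_path(backend: str, scenario_id, operation: str, name: str) -> PurePosixPath:
    """Build <backend>/scenarios/{scenario_id}/{scenario_id}_{operation}_{name}.tif"""
    if operation not in ("wallpaper", "fill"):
        raise ValueError(f"operation must be 'wallpaper' or 'fill', got {operation!r}")
    filename = f"{scenario_id}_{operation}_{sanitize_name(name)}.tif"
    return PurePosixPath(backend) / "scenarios" / str(scenario_id) / filename


def model_workspace_path(backend: str, session_id, job_id, model_name: str) -> PurePosixPath:
    """Build <backend>/model_outputs/{session_id}/{job_id}_{model_name}/workspace"""
    job_dir = f"{job_id}_{model_name}"
    return PurePosixPath(backend) / "model_outputs" / str(session_id) / job_dir / "workspace"
```

With these helpers, `scenario_raster_path("/data", 7, "fill", "My Park (v2)")` would give `/data/scenarios/7/7_fill_my_park_v2.tif`, and the derived output file with overviews would sit in the parent of the path returned by `model_workspace_path`.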

phargogh added a commit to phargogh/urban-online-workflow that referenced this issue Aug 2, 2022
@davemfish
Contributor

I'm not sure if there's any more work to be done here. If there is, it can probably be done alongside #25.

@davemfish davemfish added this to the release milestone May 30, 2024