Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thoughts on default output directory #54

Open
djhoese opened this issue Oct 24, 2019 · 13 comments
Open

Thoughts on default output directory #54

djhoese opened this issue Oct 24, 2019 · 13 comments

Comments

@djhoese
Copy link
Contributor

djhoese commented Oct 24, 2019

Right now the make_GLM_grids.py example script outputs to a 2019/Oct/13/ formatted directory. This directory isn't modifiable by the user of the script, right? I could update it to allow the user to modify where the files go or so that it always rights to the current directory. This example script seems to be used in a couple places so I'm not sure if users are expected to take it and modify it as needed or if it is the advertised "entry point" to using glmtools from the command line.

This output directory only uses the start time of the earliest file. Would it be of interest to glmtools devs to allow for a per-file directory? I could also change the code to do:

start_time.strftime("%Y/%b/%d")

Instead of the hardcoded 20 that it has now for the year.

Just some thoughts while exploring the code. Thanks for your help.

@deeplycloudy
Copy link
Owner

make_GLM_grids.py is the expected entry point for most users, since it handles configuration of the call to grid_GLM_flashes. So I want it to be useful for all ordinary use cases.

The 2019/Oct/13/ format is the hard-coded destination, but -o does allow that directory structure to be placed in any target directory. I see value in something like --no-date-dirs so that the user has total control of the destination path. Feel free to propose a unix-y syntax of your choice!

And, yes, the script should use %Y.

@djhoese
Copy link
Contributor Author

djhoese commented Oct 31, 2019

So I've made a couple changes to my local version of this script for the CSPP packaging that I'm hoping I can convince you would be good changes are for the examples script here. 😄

How willing are you to break backwards compatibility as long as there is a method for reproducing the same output? If the -o flag is used and passed to start_time.strftime then a user can provide the full dated path to the command line (ex. -o /data/my/glm/%Y/%b/%d). This gives full control to the user without forcing a particular directory structure on them and doesn't require a new command line flag to control another one. Another example would be if a user wanted number-only directories and wanted to separate by hour: -o /data/%Y/%m/%d/%H.

The other changes I've made have been replacing all _ with - in the command line flags to match typical unix style command line tools. I also removed the duplicate dest keyword arguments where argparse would have come up with the same thing based on the command line flag.

@deeplycloudy
Copy link
Owner

@djhoese let's go for the route you proposed above. If there's a way to retain backwards compatibility with a default, for instance to -o, let's do that, but I'm not hung up on it.

@deeplycloudy
Copy link
Owner

@djhoese are you still thinking of pushing any of the changes above? In #59 you mentioned you were going to take a look at it in the fleeting moments before the holidays, so I wanted to follow up before making any more decisions over there.

@djhoese
Copy link
Contributor Author

djhoese commented Jan 9, 2020

No, sorry. I completely forgot about this and a lot of other things have come up (including glmtools and CSPP Geo stuff) so I don't think I can make any promises of looking at this this week.

@djhoese
Copy link
Contributor Author

djhoese commented Jan 17, 2020

@deeplycloudy In response to your discussion in #59 about the changes I have planned regarding this issue: the work I'm doing with CSPP Geo has now evolved to the point that I've made my own copy of the make_GLM_grids.py script that I copied from here. I did this because it seemed like backwards compatibility was going to be difficult. With our off-github discussion, this may require even more changes that I was originally thinking.

Besides removing/hiding some flags based on our discussion, I've also changed _ in all the flags so they are now - to match other linux command line tools. When I've completed my version of the script maybe I can point you to it and we'll see what you think. I'm not against trying to merge things back together, but I'm a little worried that the amount of functionality you want to provide to all users is too much for the simple command line tool I was hoping to create. Again, open to discussion, maybe I should have made a new issue...

@deeplycloudy
Copy link
Owner

Ah, ok, I assumed too much about how you've been working with make_GLM_grids.py in CSSP Geo. I'm open to taking a look at your simpler script, and might be able to spot some other essential ideas that could improve the 'standard' example script. Or maybe we just adopt yours. Let's have that conversation on a separate CSSP needs thread? Or over email is still fine if you prefer that.

For this issue in particular, I think there's at least a common need to plumb through (to the output writer) the ability to provide an arbitrary filename template. If you've figured out how to do that part it would be a valuable PR.

@djhoese
Copy link
Contributor Author

djhoese commented Jan 30, 2020

So really what we're talking about is this piece of code in lmatools, right?

        basename_parts = (output_filename_prefix, 
                          self.start_time.strftime('%Y%m%d_%H%M%S'), 
                          to_seconds(self.duration), 
                          self.min_points_per_flash, 
                          self.dx_units,
                          )
        outfile_template = '%s_%s_%d_%dsrc_%s-dx_%s'
        outfile_basenames = list(outfile_template % (basename_parts + (pfx,)) 
                             for pfx in self.outfile_postfixes)
        outfiles = list(os.path.join(outpath, outfile_basename)
                    for outfile_basename in outfile_basenames)

I think the easiest way of handling this, with the most flexibility, would be have the user pass in a python format string that gets filled in with a specific set of keys. For example:

# from user
output_pattern = "{start_time:%y/%b/%d}/{prefix}_{start_time:%Y%m%d_%H%M%S}_{duration}_{min_points_per_flash}src_{dx}-dx_{suffix}.nc"

Then lmatools does:

output_filename = output_pattern.format(start_time=self.start_time, prefix=output_filename_prefix, ...)

Or at least something like this. Now this does look ugly for the filename pattern that is in lmatools right now, but it can have a default and most users probably won't want to customize this. Does GLMtools use a different default filename scheme?

@deeplycloudy
Copy link
Owner

Overall I like your templating scheme, but what's going on is somewhat more complicated. There's an old lmatools writer interface that is overridden in glmtools.grid.make_grids.GLMlutGridder.write_grids, which calls glmtools.io.imagery.write_goes_imagery, where you'll see

        outfile = os.path.join(outpath, dataset.attrs['dataset_name'])

and dataset_name comes from get_glm_global_attrs,

    dataset_name = "OR_GLM-L2-GLM{5}-{0}_{1}_s{2}_e{3}_c{4}.nc".format(
        modes[timeline], platform, start.strftime('%Y%j%H%M%S0'),
        end.strftime('%Y%j%H%M%S0'), created.strftime('%Y%j%H%M%S0'),
        scenes[scene_id]

However, taking your proposed path, perhaps we could revert glmtools to use the lmatools naming infrastructure, and then implement glm_global_attrs dataset_name piece as part of the driver script instead of as part of the weeds of the glmtools imagery construction process?

I'd prefer to keep this naming convention as the glmtools default since it's what the Unidata feed expects and matches the standard ABI naming and metadata conventions. I don't care too much how we get there.

@deeplycloudy
Copy link
Owner

Looking at it more this morning, I think it would be as simple as allowing and using a template in output in the call to glmtools.io.imagery.write_goes_imagery. No changes to lmatools necessary. I'm going to take a crack at this and address #26 at the same time.

@deeplycloudy
Copy link
Owner

@djhoese does the proposed fix in #63 meet your needs?

@djhoese
Copy link
Contributor Author

djhoese commented Feb 16, 2020

I think that should work. After looking at the changes I'm glad you tackled it instead of me. Thanks. One other thing I could see changing that I know you don't want to is changing the default directory to be the current directory. It looks like you did that for the python code, but I understand wanting to keep it the current way for the script so I don't want to push, just mentioning it. It's a backwards incompatible change so now would be the chance.

Thanks for handling this.

@deeplycloudy
Copy link
Owner

You're right, it's better to default to the current directory. If anyone wants to prepend a path they have to copy the whole template string anyway, so in practice this doesn't save anyone much effort while generating surprise directories. The current behavior has been turned into an example, and default is now the current directory in #63.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants