Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added automated mask length determination in file #28

Merged
merged 10 commits into from
May 19, 2020

Conversation

vbalderdash
Copy link
Contributor

Added a quick parser to automatically pull the mask length from the header of the LMA '.dat.gz' file for the flash sorting.

@deeplycloudy
Copy link
Owner

@vbalderdash thanks for fixing this long-standing oversight, which is a sneaky way things can go wrong for all of us. Do you think this also fixes #3?

Could you move this into LMADataset, instead? It already parses the data format line, and would solve the problem at the data reader instead of the flash-sort level.
https://github.com/deeplycloudy/lmatools/blob/master/lmatools/io/LMAarrayFile.py#L229

@deeplycloudy
Copy link
Owner

And this might not fix #3, which may have to do with storing enough characters in the station mask in the HDF5 file. But this solves the problem of automated determination, which is the first step.

@vbalderdash
Copy link
Contributor Author

Looks like the root of #3 is line 57 of LMA_h5_write, and it should simple to carry that value through to the right function call. I'll try to move the fix to LMAarrayFile and get that carried through, too.

@deeplycloudy
Copy link
Owner

For carrying values through, feel free to add a mask_length attribute to LMAdataFile.

@vbalderdash
Copy link
Contributor Author

mask_length is now in the LMAdataFile and pulling the value during the unzipped file read. I moved the h5 column class to within the function with the mask length variable so that can be initialized with the variable string size, so it seems to be writing out what is expected! There was a hanging 'import gzip' in the gen_autorun file that needs to be removed

@deeplycloudy
Copy link
Owner

This looks good in terms of the internals, and is what I wish I had written initially!

Have you looked at whether the removal of the mask_length argument to LMADataFile causes upstream scripts (like flash_sort_and_grid) to break? lmatools.flashsort.autosort.autorun_sklearn.cluster definitely passes mask_length to LMADataFile, for instance.

@vbalderdash
Copy link
Contributor Author

The mask_length parameter in lmatools.flashsort.autosort.autorun_sklearn.cluster should be irrelevant since it calls LMADataFile.read which now defines the mask_length, but it probably should be removed for cleanliness. The flash sorting is reasonable as far as I can tell (tested on s6 and s7 formatted files), although I have not verified anything downstream of gridding. I was selfishly focusing on the flashes and and flash extents.

@deeplycloudy
Copy link
Owner

OK, so you've run things end to end and there's no breakage. I was worried about breaking other users' code if they were to update lmatools.

If you want to remove the other mask_length stuff for cleanliness / saving us from future confusion that would be great, or I can merge this now and open an issue to remind us to come back to do the cleanup.

@vbalderdash
Copy link
Contributor Author

Probably still worth testing on another machine first!

@deeplycloudy
Copy link
Owner

deeplycloudy commented Apr 28, 2020

I pushed some changes to your branch that I think fix the issue you reported with an empty station mask value that caused flash sorting to crash.

The changes above also fix #29, I think.

@deeplycloudy deeplycloudy merged commit df1cc5a into deeplycloudy:master May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants