Unexpected save behaviour with cached files when underlying file is deleted #1061
Hi, thanks for submitting this. So what's happened is that when writing out a file, we would use `np.asanyarray(self.dataobj)` without checking the data cache. While using the data cache to treat the loaded image like a modified image worked, I wouldn't consider it an intended use (if we indicate so in the docs, please let me know...). The approach I always use is to create a new image and simply copy the header over:

```python
import os
import shutil
import sys

import nibabel

if __name__ == "__main__":
    fname = sys.argv[1]
    shutil.copy(fname, "/tmp")
    tmp_fname = os.path.join("/tmp", os.path.basename(fname))
    img = nibabel.load(tmp_fname, mmap=False)
    data = img.get_fdata()  # Or `np.asanyarray(img.dataobj)`
    os.remove(tmp_fname)
    # Build a fresh in-memory image from the loaded array, affine and header
    new_img = img.__class__(data, img.affine, img.header)
    nibabel.save(new_img, "temp.nii.gz")
```

I can't think of a good way to restore the old behavior without producing more surprises (unnecessary typecasts or a hidden cache), so my suggestion would be to move to this approach. Happy to talk through alternatives, though. |
Thanks for getting back to me. Your suggestion works for us, so we can close this issue. I don't think that the docs anywhere suggest that the old behavior is expected; apologies if I suggested they did. It's just that this is an old piece of code that has been working across several older versions of nibabel, so it was unexpected for us. |
Sorry - can I reopen this one? Thinking about it, I think we used to imply, but perhaps not say, that the cached array becomes definitive after it has reached memory. For example, you can change the cached array and get the changed version back on repeated calls to `get_fdata()`. So, I think someone, probably me, made a mistake by not checking the cache before writing, and I agree that this is a bit surprising. Do you think we can fix that, at this stage? Or - do we need to fix it? |
This was me in c8d93e6, when we finally deprecated `get_data()`.
I think we can do it, though it would be an API change (we're overdue for 4.0 anyway). We just need the logic to be clear and documented.
We would also need to check that, in the case of cached fdata, if somebody opens an int16 (with no scale factors) and adds an integer value, we don't end up rescaling because the data array is float.
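The int16 concern above can be illustrated with a small sketch. This is plain NumPy, not nibabel's actual rescaling logic; the helper name is made up:

```python
import numpy as np

def can_store_as_int16(arr):
    """Return True if every value is exactly representable as int16.

    A sketch of the kind of check discussed above -- not nibabel's
    actual rescaling logic.
    """
    cast = arr.astype(np.int16)
    return bool(np.array_equal(arr, cast.astype(arr.dtype)))

data = np.arange(10, dtype=np.int16)
floats = data.astype(np.float64)   # what get_fdata() would cache
floats += 1                        # an integer-valued modification

print(can_store_as_int16(floats))        # True: no rescaling needed
print(can_store_as_int16(floats + 0.5))  # False: would need scale factors
```

With a check like this, an integer-valued float cache could round-trip back to int16 without introducing scale factors.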
I've never depended on the behavior, so I'm not sure I'm the right person to ask. |
The reason we might fix it is to maintain the model, which is that the image appears to contain the array returned by `get_fdata()`. I mean that, here we see that modifying the returned array modifies the return value of later `get_fdata()` calls.
But then, it's a bit surprising that these changes have disappeared when we save the image and reload:
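Both behaviors can be sketched with a toy stand-in (plain NumPy, not nibabel's actual classes; all names here are made up for illustration):

```python
import numpy as np

class ToyProxyImage:
    """Toy model of an image backed by an on-disk proxy plus a data cache."""

    def __init__(self, on_disk):
        self._on_disk = on_disk          # stands in for the file / dataobj
        self._cache = None

    def get_fdata(self):
        if self._cache is None:
            self._cache = self._on_disk.astype(np.float64)
        return self._cache               # the same object on every call

    def save(self):
        # Mirrors the current behavior: read from the proxy, ignore the cache
        return np.asanyarray(self._on_disk)

img = ToyProxyImage(np.array([1, 2, 3], dtype=np.int16))
data = img.get_fdata()
data[0] = 99
print(img.get_fdata()[0])  # 99.0 -- the cache reflects the change
print(img.save()[0])       # 1    -- but saving reads the untouched proxy
```

The modification is visible through the cache, yet silently dropped on save: that is the inconsistency under discussion.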
Now - what to do? I think the right fix is changing:

```python
data = np.asanyarray(self.dataobj)
```

to:

```python
if img.in_memory:
    data = img.get_fdata()
else:
    data = np.asanyarray(self.dataobj)
```

See the current code. Does that look right to you? We should definitely add some tests to assert the behavior we think is right. |
LGTM, Matthew. |
I'm afraid I think we have to email out about this, to see what people think - because it's possible people's results will change, without them realizing. We could certainly warn, as in:

```python
if img.in_memory:
    warning('Using cached data to save image; if you have changed the image '
            'data in memory, your results may differ from versions of '
            'Nibabel prior to 4.0')
    data = img.get_fdata()
else:
    data = np.asanyarray(self.dataobj)
``` |
Agree on consulting the hive mind and warning. Two wrinkles, though:
We can check for these cases and use |
Ouch - I'm glad you remembered these. Am I right that this would mean that

```python
img.get_fdata(np.float32)
nib.save(img, 'array_from_float32.nii')
```

would save float32 data, instead of:

```python
img.get_fdata()  # Or no get_fdata call.
nib.save(img, 'array_from_float64.nii')
```

That would be surprising, and difficult to explain. |
This one only coerces if there are scale factors.
Unless I might not be reasoning properly about floating point error here, but it seems to me that if you request |
Sorry - yes - I meant that I agree that the person doing And yes, for float32 or float64, the precision loss will be - relatively small - but still - any difference would be surprising, surely. |
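The size of that "relatively small" difference can be illustrated with plain NumPy (nothing nibabel-specific; the variable names are made up):

```python
import numpy as np

# Values that survive a float64 round-trip unchanged are perturbed by a
# float32 one -- small, but nonzero.
rng = np.random.default_rng(0)
x64 = rng.random(5)            # float64 data, as get_fdata() returns
x32 = x64.astype(np.float32)   # what get_fdata(np.float32) would cache

diff = np.abs(x64 - x32.astype(np.float64))
print(diff.max() > 0)     # True: the arrays differ
print(diff.max() < 1e-6)  # True: but only at float32 precision (~1e-7)
```

Any nonzero difference on a pure load-and-resave would be the surprise being discussed.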
I think the use case you're imagining is loading an array at low precision, making some determination based on the data but not changing it, and then resaving the image to a new location. And we'd want it essentially unchanged in that case. I was imagining loading the data, making modifications, and saving, in which case I'd expect the precision I chose to be relevant.

We could monkey-patch `__setitem__` on the array to set a dirty bit in the image, to distinguish the cases. A bit magical for my taste. At this point, I feel like explicitly creating new images is less ambiguous. Maybe instead we should just warn on saving a proxy image with cached data. Another approach is to warn when the on-disk precision is higher than the array precision. |
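The dirty-bit idea could look roughly like this (a hypothetical sketch, not a nibabel API; the class name is made up):

```python
import numpy as np

class DirtyTrackingArray(np.ndarray):
    """ndarray subclass that records whether it has been written to."""

    def __array_finalize__(self, obj):
        # New views start clean unless derived from a dirty array
        self.dirty = getattr(obj, 'dirty', False)

    def __setitem__(self, key, value):
        self.dirty = True
        super().__setitem__(key, value)

cached = np.zeros(3).view(DirtyTrackingArray)
print(cached.dirty)   # False -- untouched cache, safe to re-read the proxy
cached[0] = 1.0
print(cached.dirty)   # True  -- the image should be saved from the cache
```

One reason this is "magical" and fragile: in-place ufunc operations like `cached += 1` do not go through `__setitem__`, so they would escape the tracking.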
Another option would be to add `out_arr.setflags(write=False)` on the way out of `get_fdata()`. |
Could also make that dependent on whether the requested dtype is lower precision than the on-disk dtype. |
Hmm - but that would be rather confusing too - no - that you could get a read-only array sometimes? I mean, that would be rather hard to explain. Honestly, I would be happy to make the output array read-only by default. What do you think? Obviously would need some discussion on the mailing list ... |
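What the read-only-by-default option would mean for callers, sketched with plain NumPy (variable names are made up):

```python
import numpy as np

out_arr = np.arange(5, dtype=np.float64)
out_arr.setflags(write=False)   # as if returned read-only from get_fdata()

try:
    out_arr[0] = 99.0
except ValueError as exc:
    print(exc)   # e.g. "assignment destination is read-only"

# Callers who really want to modify the data take an explicit copy:
writable = out_arr.copy()
writable[0] = 99.0
```

The explicit copy makes the modify-then-save case unambiguous: the writable array is no longer the image's cache, so changes cannot silently diverge from what gets saved.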
If I read in a temp file and cache it (using `get_data()` or `get_fdata()`) and then delete the underlying file, nibabel tries to read from the (memory-mapped) file when I try to save the image, and thus throws an error.
Example program is:
on running gives:
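The original example program and its output were lost from this page. The failure mode it described can be sketched with plain NumPy memmaps instead of nibabel (file name, dtype, and shape here are assumptions):

```python
import os
import tempfile

import numpy as np

# Write a small binary file to stand in for the image on disk
tmp = tempfile.NamedTemporaryFile(suffix='.dat', delete=False)
np.arange(6, dtype=np.int16).tofile(tmp.name)
tmp.close()

mm = np.memmap(tmp.name, dtype=np.int16, mode='r')
cached = np.array(mm, dtype=np.float64)   # analogous to get_fdata() caching
del mm                                     # release the mapping
os.remove(tmp.name)                        # delete the underlying file

print(cached)   # the cached copy is still intact in memory
try:
    np.memmap(tmp.name, dtype=np.int16, mode='r')  # re-reading by path fails
except FileNotFoundError:
    print('re-reading the deleted file raises FileNotFoundError')
```

Saving goes back to the path rather than the cache, which is why the error appears only at save time.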
I believe this might be due to the fact that when the image is saved, the data is grabbed from the image proxy object (`self.dataobj`), without checking the caches (line 1003 in analyze.py of the GitHub version).
This is different behavior from (much) older versions of nibabel that were Python 2.7 compatible. I am not familiar enough with the new codebase to do a pull request atm.