
assignlabels refresh #81

Closed
younes-elhachi opened this issue Dec 28, 2019 · 4 comments

Comments

@younes-elhachi
Member

Hi Jon,

I think that sometimes the label assignment (refinegrains.py/assignlabels) and the other new columns of the ".new" file are not correctly updated for all the peaks, especially when a user does several successive makemap runs. I noticed this with my data, using the version currently installed on rnice.
Here is a small script to test the assignment (I assumed that gx, gy, gz are the correct values written to the .new file by refinegrains.py/assignlabels via compute_gv with the translation, followed by score_and_assign).
test_assign.txt

What does it give with your data?

Happy Christmas and New Year

Best regards
Younes

@jonwright
Member

Hi Younes,
Thanks for digging into this. The code in refinegrains is showing its age, I guess. Do you have an example with input/output files showing the problem? This was buggy in the past, but I thought it was mostly fixed provided the --no_sort option is used. Looking at the test script: the g-vector values in the columnfile are computed using the translation (position) of the grain. When assignlabels does peak assignments, it also recomputes the g-vectors using the position of the grain.

I would expect a lot of problems with overlapping peaks here. These can switch between grains in a fairly random way as they get assigned to the "closest" grain. A better approach would be to handle overlapped peaks properly and flag them. I'll put that in the next comment...

from ImageD11.grain import read_grain_file
from ImageD11.columnfile import columnfile
import numpy as np

gmap = read_grain_file('grains.map')
c = columnfile('flt.flt.new')
d = c.copy()
d.filter(d.labels > -1)
# or: gvecs = np.array((c.gx, c.gy, c.gz)).T
#  ... for gv in gvecs:
for i, peak in enumerate(d.bigarray.T):
    gv = peak[35:38]
    # this is h per grain, but without accounting for x.translation
    h = np.array([np.dot(x.ubi, gv) for x in gmap])
    hint = np.floor(h + 0.5).astype(int)
    diff = h - hint
    drlv = np.sqrt(np.sum(diff * diff, axis=1))
    ind = drlv.argmin()
    # peak[39] == d.labels[i] ?
    if ind != int(peak[39]):
        print(i, drlv[ind], ind, peak[39])

@jonwright
Member

To deal better with overlaps, and try to overcome a few recurring problems:

  • Can we convert columnfile.py into a pandas dataframe?
  • Raw peak data in one table (x,y,omega,intensity etc)
  • Detector geometry applied -> XL, YL, ZL : adds a new table depending on geometry
  • Depending on (UB)+Diffractometer -> OmegaCalc : adds a new table for each grain
  • Depending on (translation+omega) or (translation+OmegaCalc) -> tth/eta/k/gv/hr/drlv
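A minimal sketch of the table split suggested above, using pandas; the column names and numerical values here are hypothetical, and the geometry transform is a toy placeholder, not the real ImageD11 one:

```python
import pandas as pd

# raw peak data in one table (hypothetical values)
raw = pd.DataFrame({
    "spot_id": [0, 1, 2],
    "sc": [10.2, 55.1, 80.0],      # detector slow coordinate
    "fc": [200.4, 120.9, 33.3],    # detector fast coordinate
    "omega": [1.0, 2.5, 7.25],
    "sum_intensity": [1e4, 5e3, 2e3],
})

# geometry-dependent columns live in a second table keyed on spot_id,
# so changing the geometry only rebuilds this table (toy transform here)
geom = pd.DataFrame({
    "spot_id": raw["spot_id"],
    "xl": raw["fc"] * 0.05,        # hypothetical pixel size, mm
    "yl": raw["sc"] * 0.05,
    "zl": 0.0,
})

peaks = raw.merge(geom, on="spot_id")
```

The same keyed-merge pattern would extend to the per-grain OmegaCalc and tth/eta/gv tables.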

A peak to grain assignment matrix should be very sparse. Currently only one grain per peak. It would help for twins and duplicates to store the N grains per peak which might be able to index.
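A sketch of that idea with plain numpy dictionaries standing in for a sparse matrix; the ubi matrices, hkls and tolerance are all hypothetical (two cubic grains, peaks generated by the first one):

```python
import numpy as np

tol = 0.05
ubis = [np.eye(3) * 4.05, np.eye(3) * 3.61]   # two hypothetical cubic grains (ubi = a * identity)
hkls = np.array([[1., 1., 1.], [2., 0., 0.], [2., 2., 0.]])
gvecs = hkls @ np.linalg.inv(ubis[0]).T       # peaks generated by grain 0

# candidates[i] keeps every grain within tolerance, not just the closest one
candidates = {}
for i, gv in enumerate(gvecs):
    for j, ubi in enumerate(ubis):
        h = ubi @ gv
        drlv = np.linalg.norm(h - np.round(h))
        if drlv < tol:
            candidates.setdefault(i, []).append((j, drlv))
```

For twinned or duplicated grains the inner loop would record several candidates per peak, which a later pass could resolve or flag as ambiguous.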

The rest of this would imply a bit of reorganisation, updating the geometry code to pull apart the detector computations from the diffractometer + grain computations.

@younes-elhachi
Member Author

Hi Jon,
Thank you for the reply.

> This was buggy in the past, but I thought it was mostly fixed provided the --no_sort option is used.

Yes, I confirm that the problem is fixed for most of the peaks when using the --no_sort argument. Thank you.

> Looking at the test script: the g-vector values in the columnfile are computed using the translation (position) of the grain. When assignlabels does peak assignments, it also recomputes the g-vectors using the position of the grain.

> np.dot(x.ubi, np.transpose(gv))  # this is h per grain, but without accounting for x.translation

So for the "hr, kr, lr" columns: are they computed and stored accounting for the translation? If not, how can the hkl be calculated with the translation? The only idea I have is to compute_gv with t_x, t_y and t_z set in the parameters passed to the transformation routines, but this is already done.
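A minimal numpy sketch of why the translation matters when forming the g-vector: the scattered-beam direction should be measured from the grain position, not from the rotation axis. The wavelength, peak position and translation below are hypothetical, and for brevity this omits the omega rotation into the sample frame that the real transformation routines apply:

```python
import numpy as np

wavelength = 0.25                         # hypothetical wavelength, Angstrom
peak_lab = np.array([120.0, 8.0, 5.0])    # hypothetical peak position in the lab frame, mm
t = np.array([0.05, -0.02, 0.01])         # hypothetical grain translation, mm

def gvector(peak_xyz, origin):
    # unit vector along the scattered beam, from the chosen origin to the peak
    s1 = (peak_xyz - origin) / np.linalg.norm(peak_xyz - origin)
    s0 = np.array([1.0, 0.0, 0.0])        # incident beam along +x
    return (s1 - s0) / wavelength

gv_no_t = gvector(peak_lab, np.zeros(3))  # ignores the grain translation
gv_with_t = gvector(peak_lab, t)          # accounts for it
# h = np.dot(ubi, gv_with_t) then gives hkl consistent with the grain position
```

The two g-vectors differ slightly, which is exactly the drlv shift that assigning against translated grains is meant to capture.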

> # peak[39] == d.labels[i] ?

Yes.

> I would expect a lot of problems with overlapping peaks here. These can switch between grains in a pretty randomised way as they get assigned to the "closest" grain. A better approach would be to handle overlapped peaks properly and flag them.

Using --no_sort, the output of test_assign dropped from thousands of peaks to only about 40 peaks. So yes, I guess these 40 peaks are the kind that could be indexed to more than one grain because of overlapping and twinning.

> The code in refinegrains is showing its age, I guess.

I think the hkl_tolerance alone is not a sufficient criterion to uniquely assign a peak. We can add other conditions.
For now, I noticed that some peaks are falsely indexed. For example, a peak at (2.002, -1.999, 0.998) is indexed to an FCC grain although the (2,2,1) reflection is not allowed for an FCC lattice. The drlv is smaller than hkl_tolerance, but that is not sufficient. The probability of these falsely indexed peaks increases with the loading (in my case of phase transformation, new martensite peaks are close to, or overlap, the austenite FCC rings). I am identifying them post-indexing as below:
allowedlen = [3,4,8,11,12,16,19,20,24,27,32] # h^2+k^2+l^2 of the 11 innermost fcc reflections
hkl2sum = d.h*d.h+d.k*d.k+d.l*d.l
peakallowed = [int(x) in allowedlen for x in hkl2sum]
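An equivalent check uses the FCC reflection condition directly (h, k, l all even or all odd) instead of a hard-coded list of h^2+k^2+l^2 values; the example indices below are illustrative:

```python
import numpy as np

hkl = np.array([[1, 1, 1], [2, 0, 0], [2, -2, 1], [3, 1, 1]])
parity = hkl % 2
# FCC reflection condition: h, k, l all even or all odd (unmixed parity)
allowed = np.all(parity == parity[:, :1], axis=1)
# → [True, True, False, True]: (2,-2,1) has mixed parity, so it is forbidden
```

Unlike the h^2+k^2+l^2 list, this works for arbitrarily high-order reflections without extending the table.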

Lowering hkl_tolerance reduces the probability of such false indexing, but then we miss some good peaks as well; that is why I think this verification should be done while indexing/refining the ubi.

> To deal better with overlaps, and try to overcome a few recurring problems:
>
>   • Can we convert columnfile.py into a pandas dataframe ?
>   • Raw peak data in one table (x,y,omega,intensity etc)
>   • Detector geometry applied -> XL, YL, ZL : adds a new table depending on geometry
>   • Depending on (UB)+Diffractometer -> OmegaCalc : adds a new table for each grain
>   • Depending on (translation+omega) or (translation+OmegaCalc) -> tth/eta/k/gv/hr/drlv
>
> A peak to grain assignment matrix should be very sparse. Currently only one grain per peak. It would help for twins and duplicates to store the N grains per peak which might be able to index.
>
> The rest of this would imply a bit of reorganisation to update the geometry to pull out detector versus diffractometer + grain computations.

I completely agree. This would help to get rid of some recurring problems and contribute in making data analysis easier and maybe more accurate. Other suggestions such as improving the peaksearch algorithm/outputs and refining strain will also help a lot. I am focusing now on writing the thesis, hopefully I will be able to contribute more once I finish the PhD.

@jonwright
Member

So I think this is the same issue as #54, where I will add a note about systematic absences, so I will close this one for now and transfer the "todo" over there. Note that if you can index a peak that should be systematically absent, there are two different possibilities:

  • it is another grain
  • your grain actually has a lower symmetry

I guess this kind of peak should be flagged as assignable via position but problematic due to the space group or lattice.
