
assignlabels refresh #81

Closed
younes-elhachi opened this issue Dec 28, 2019 · 4 comments

Comments

@younes-elhachi
Member

Hi Jon,

I think that sometimes the label assignment (refinegrains.py/assignlabels) and the other new columns of the ".new" file are not correctly updated for all the peaks, especially when a user does several successive makemap runs. I noticed this with my data, using the version currently installed on rnice.
Here is a small script to test the assignment (I assumed that gx, gy, gz are the correct values written to the .new file by refinegrains.py/assignlabels via compute_gv with the translation, followed by score_and_assign).
test_assign.txt

What does it give with your data?

Happy Christmas and New Year

Best regards
Younes

@jonwright
Member

Hi Younes,
Thanks for digging into this. The code in refinegrains is showing its age, I guess. Do you have an example with input/output files showing the problem? This was buggy in the past, but I thought it was mostly fixed provided the --no_sort option is used. Looking at the test script: the g-vector values in the columnfile are computed using the translation (position) of the grain. When assignlabels does peak assignments, it also recomputes the g-vectors using the position of the grain.

I would expect a lot of problems with overlapping peaks here. These can switch between grains in a fairly random way as they get assigned to the "closest" grain. A better approach would be to handle overlapped peaks properly and flag them. I'll put that in the next comment...

from ImageD11.grain import read_grain_file
from ImageD11.columnfile import columnfile
import numpy as np

gmap = read_grain_file('grains.map')
c = columnfile('flt.flt.new')
d = c.copy()
d.filter(d.labels > -1)
# or: gvecs = np.array((c.gx, c.gy, c.gz)).T
#  ... for gv in gvecs:
for i, peak in enumerate(d.bigarray.T):
    gv = peak[35:38]
    # this is h per grain, but without accounting for x.translation
    h = np.array([np.dot(x.ubi, gv) for x in gmap])
    hint = np.floor(h + 0.5).astype(int)
    diff = h - hint
    drlv = np.sqrt(np.sum(diff * diff, axis=1))
    ind = drlv.argmin()
    # peak[39] == d.labels[i] ?
    if ind != int(peak[39]):
        print(i, drlv[ind], ind, peak[39])

@jonwright
Member

To deal better with overlaps, and try to overcome a few recurring problems:

  • Can we convert columnfile.py into a pandas dataframe?
  • Raw peak data in one table (x,y,omega,intensity etc)
  • Detector geometry applied -> XL, YL, ZL : adds a new table depending on geometry
  • Depending on (UB)+Diffractometer -> OmegaCalc : adds a new table for each grain
  • Depending on (translation+omega) or (translation+OmegaCalc) -> tth/eta/k/gv/hr/drlv
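A minimal sketch of the table split suggested above, using pandas; the column names and numerical values here are hypothetical, and the geometry transform is a toy placeholder, not the real ImageD11 one:

```python
import pandas as pd

# raw peak data in one table (hypothetical values)
raw = pd.DataFrame({
    "spot_id": [0, 1, 2],
    "sc": [10.2, 55.1, 80.0],      # detector slow coordinate
    "fc": [200.4, 120.9, 33.3],    # detector fast coordinate
    "omega": [1.0, 2.5, 7.25],
    "sum_intensity": [1e4, 5e3, 2e3],
})

# geometry-dependent columns live in a second table keyed on spot_id,
# so changing the geometry only rebuilds this table (toy transform here)
geom = pd.DataFrame({
    "spot_id": raw["spot_id"],
    "xl": raw["fc"] * 0.05,        # hypothetical pixel size, mm
    "yl": raw["sc"] * 0.05,
    "zl": 0.0,
})

peaks = raw.merge(geom, on="spot_id")
```

The same keyed-merge pattern would extend to the per-grain OmegaCalc and tth/eta/gv tables.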

A peak to grain assignment matrix should be very sparse. Currently only one grain per peak. It would help for twins and duplicates to store the N grains per peak which might be able to index.
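A sketch of that idea with plain numpy dictionaries standing in for a sparse matrix; the ubi matrices, hkls and tolerance are all hypothetical (two cubic grains, peaks generated by the first one):

```python
import numpy as np

tol = 0.05
ubis = [np.eye(3) * 4.05, np.eye(3) * 3.61]   # two hypothetical cubic grains (ubi = a * identity)
hkls = np.array([[1., 1., 1.], [2., 0., 0.], [2., 2., 0.]])
gvecs = hkls @ np.linalg.inv(ubis[0]).T       # peaks generated by grain 0

# candidates[i] keeps every grain within tolerance, not just the closest one
candidates = {}
for i, gv in enumerate(gvecs):
    for j, ubi in enumerate(ubis):
        h = ubi @ gv
        drlv = np.linalg.norm(h - np.round(h))
        if drlv < tol:
            candidates.setdefault(i, []).append((j, drlv))
```

For twinned or duplicated grains the inner loop would record several candidates per peak, which a later pass could resolve or flag as ambiguous.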

The rest of this would imply a bit of reorganisation, updating the geometry code to pull apart the detector computations from the diffractometer + grain computations.

@younes-elhachi
Member Author

Hi Jon,
Thank you for the reply.

> This was buggy in the past, but I thought it was mostly fixed provided the --no_sort option is used.

Yes, I confirm that the problem is fixed for most of the peaks when using the --no_sort argument. Thank you.

> Looking at the test script: the g-vector values in the columnfile are computed using the translation (position) of the grain. When assignlabels does peak assignments, it also recomputes the g-vectors using the position of the grain.

> np.dot(x.ubi, np.transpose(gv))  # this is h per grain, but without accounting for x.translation

So for the "hr, kr, lr" columns: are they computed and stored accounting for the translation? If not, how can the hkl be calculated with the translation? The only idea I have is to compute_gv with t_x, t_y and t_z set in the parameters passed to the transformation routines, but this is already done.
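A minimal numpy sketch of why the translation matters when forming the g-vector: the scattered-beam direction should be measured from the grain position, not from the rotation axis. The wavelength, peak position and translation below are hypothetical, and for brevity this omits the omega rotation into the sample frame that the real transformation routines apply:

```python
import numpy as np

wavelength = 0.25                         # hypothetical wavelength, Angstrom
peak_lab = np.array([120.0, 8.0, 5.0])    # hypothetical peak position in the lab frame, mm
t = np.array([0.05, -0.02, 0.01])         # hypothetical grain translation, mm

def gvector(peak_xyz, origin):
    # unit vector along the scattered beam, from the chosen origin to the peak
    s1 = (peak_xyz - origin) / np.linalg.norm(peak_xyz - origin)
    s0 = np.array([1.0, 0.0, 0.0])        # incident beam along +x
    return (s1 - s0) / wavelength

gv_no_t = gvector(peak_lab, np.zeros(3))  # ignores the grain translation
gv_with_t = gvector(peak_lab, t)          # accounts for it
# h = np.dot(ubi, gv_with_t) then gives hkl consistent with the grain position
```

The two g-vectors differ slightly, which is exactly the drlv shift that assigning against translated grains is meant to capture.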

> # peak[39] == d.labels[i] ?

Yes.

> I would expect a lot of problems with overlapping peaks here. These can switch between grains in a pretty randomised way as they get assigned to the "closest" grain. A better approach would be to handle overlapped peaks properly and flag them.

Using --no_sort, the output of test_assign dropped from thousands of peaks to only about 40 peaks. So yes, I guess these 40 peaks are the kind that could be indexed to more than one grain because of overlapping and twinning.

> The code in refinegrains is showing its age, I guess.

I think the hkl_tolerance alone is not a sufficient criterion to uniquely assign a peak. We can add other conditions.
For now, I noticed that some peaks are falsely indexed. For example, a peak at (2.002, -1.999, 0.998) is indexed to an FCC grain although the (2,2,1) reflection is not allowed for an FCC lattice. The drlv is smaller than hkl_tolerance, but that is not sufficient. The probability of these falsely indexed peaks increases with the loading (in my case of phase transformation, new martensite peaks are close to, or overlap, the austenite FCC rings). I am identifying them post-indexing as below:
allowedlen = [3,4,8,11,12,16,19,20,24,27,32] # h^2+k^2+l^2 of the 11 innermost fcc reflections
hkl2sum = d.h*d.h+d.k*d.k+d.l*d.l
peakallowed = [int(x) in allowedlen for x in hkl2sum]
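An equivalent check uses the FCC reflection condition directly (h, k, l all even or all odd) instead of a hard-coded list of h^2+k^2+l^2 values; the example indices below are illustrative:

```python
import numpy as np

hkl = np.array([[1, 1, 1], [2, 0, 0], [2, -2, 1], [3, 1, 1]])
parity = hkl % 2
# FCC reflection condition: h, k, l all even or all odd (unmixed parity)
allowed = np.all(parity == parity[:, :1], axis=1)
# → [True, True, False, True]: (2,-2,1) has mixed parity, so it is forbidden
```

Unlike the h^2+k^2+l^2 list, this works for arbitrarily high-order reflections without extending the table.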

Lowering hkl_tolerance reduces the probability of such false indexing, but then we miss some good peaks as well; that is why I think this verification should be done while indexing/refining the ubi.

> To deal better with overlaps, and try to overcome a few recurring problems:
>
>   • Can we convert columnfile.py into a pandas dataframe ?
>   • Raw peak data in one table (x,y,omega,intensity etc)
>   • Detector geometry applied -> XL, YL, ZL : adds a new table depending on geometry
>   • Depending on (UB)+Diffractometer -> OmegaCalc : adds a new table for each grain
>   • Depending on (translation+omega) or (translation+OmegaCalc) -> tth/eta/k/gv/hr/drlv
>
> A peak to grain assignment matrix should be very sparse. Currently only one grain per peak. It would help for twins and duplicates to store the N grains per peak which might be able to index.
>
> The rest of this would imply a bit of reorganisation to update the geometry to pull out detector versus diffractometer + grain computations.

I completely agree. This would help to get rid of some recurring problems and contribute in making data analysis easier and maybe more accurate. Other suggestions such as improving the peaksearch algorithm/outputs and refining strain will also help a lot. I am focusing now on writing the thesis, hopefully I will be able to contribute more once I finish the PhD.

@jonwright
Member

So I think this is the same issue as #54, where I will add a note about systematic absences, so I will close this one for now and transfer the "todo" over there. Note that if you can index a peak that should be systematically absent, there are two different possibilities:

  • it is another grain
  • your grain actually has a lower symmetry

I guess this kind of peak should be flagged as assignable via position but problematic due to the space group or lattice.
