
Multi-tensor map class #325

Closed
jadball opened this issue Sep 19, 2024 · 5 comments

Comments

@jadball
Contributor

jadball commented Sep 19, 2024

We need a new class (or an extension of the existing PBPMap) to handle point-by-point refinement.

The class needs to support the following conditions:

  • Outer dimensions should be in reconstruction space (ri, rj like we use everywhere else)
  • Support mixed-type, ragged-length arrays:
    • At each map pixel, we have variable-length ubis, eps and npks depending on how many grains we found at that point
  • Not necessarily mutable (could make a new array for refinement results)
  • IO support essential - we have to be able to save to disk
  • Some "selbest" (select-best) methods - for generating single-valued maps from multi-valued maps; a number of different approaches could be used here
  • Generation from a TensorMap - results of tomographic refinement

From what I can see, AwkwardArray seems like a good way to start:
https://awkward-array.org/doc/main/getting-started/index.html
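
A minimal sketch of what the ragged per-pixel layout could look like with awkward-array; the field names (ubi, npks) and the 2x2 toy grid are illustrative placeholders, not an existing ImageD11 API:

```python
# Minimal sketch of a ragged per-pixel grain map with awkward-array.
# Field names and the 2x2 toy grid are placeholders, not real ImageD11 code.
import awkward as ak
import numpy as np

# Two rows of two pixels; each pixel holds a variable number of grain records.
pixel_grains = ak.Array([
    [  # ri = 0
        [{"ubi": np.eye(3).tolist(), "npks": 120}],           # one grain at (0, 0)
        [],                                                    # no grains at (0, 1)
    ],
    [  # ri = 1
        [{"ubi": np.eye(3).tolist(), "npks": 80},
         {"ubi": (2 * np.eye(3)).tolist(), "npks": 45}],       # two grains at (1, 0)
        [{"ubi": np.eye(3).tolist(), "npks": 200}],            # one grain at (1, 1)
    ],
])

print(ak.num(pixel_grains.npks, axis=2))  # grains per pixel: [[1, 0], [2, 1]]
print(pixel_grains[1, 0].npks)            # npks of the grains at pixel (1, 0): [80, 45]
```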

@jonwright what do you think?

@jonwright
Member

For point-by-point results I thought of it as an extension of "columnfile":

  • at each point in the sample, we have zero or more grains.
  • the points in space do not have to be regularly spaced. It could be a coarse grid, with fine steps at grain boundaries, etc.
  • we need columns for the x/y sample position, and one for each of the grain properties (ubi, phase, npks, etc).
  • our columnfile lacks the notion of vector or tensor columns, e.g. colfile.g_vector, colfile.ubi

To have something on a regular grid, there seems to be a question of "array of structs" versus "struct of arrays" (https://en.wikipedia.org/wiki/AoS_and_SoA). I am thinking of "SoA" here but I think you have "AoS" above.

Assuming the underlying data (the table of numbers above) stays the same, for SoA, I guess the regular grid is about making row selections to get the grains at a point in space. This could be one awkward array [nx][ny][ngrains=ragged] where each entry gives the row ids in the big table. Then we have to lock the main table unless we update these pointers.
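
A rough sketch of that pointer layout, assuming plain numpy columns for the big table and an awkward array of row ids per pixel (all names and shapes here are illustrative):

```python
# Sketch of the "SoA table + pointer grid" idea: the flat table keeps one row
# per (pixel, grain) result, and an [nx][ny][ragged] awkward array stores the
# row ids belonging to each point in space. Names are illustrative only.
import awkward as ak
import numpy as np

# Flat table (struct of arrays): one entry per grain observation.
ubi = np.stack([np.eye(3), 2 * np.eye(3), 3 * np.eye(3), np.eye(3)])  # (nrows, 3, 3)
npks = np.array([120, 80, 45, 200])

# Row ids for each pixel of a 2x2 grid; pixel (1, 0) holds rows 1 and 2.
rows_at_pixel = ak.Array([
    [[0], []],
    [[1, 2], [3]],
])

ids = np.asarray(rows_at_pixel[1, 0])  # -> array([1, 2])
print(ubi[ids])                        # the two UBI matrices found at pixel (1, 0)
print(npks[ids])                       # -> [80 45]
```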

For the "sinogram" style reconstruction, the "columnfile" would give the list of grains found in indexing. The initial "map" would then point to the same row many times for all pixels in the same grain. After refinement, we would copy the initial grain, refine it, and get back a single grain at each point.

Does this make any sense?

(This also looks like scipy.sparse, where we have a series of arrays of (i, j, data) with duplicate entries.)

@jadball
Contributor Author

jadball commented Sep 19, 2024

I worry that having both a "SoA" carrying the raw data and an awkward array for row selections makes things very complicated. I also think the "SoA" approach could slow down a lot when we ask for all the data at a specific grid point. Something like grain_map[50, 30] should always be pretty quick, and doing this as a columnfile would mean finding all the rows that match i=50 & j=30, which I fear would be much slower. Would limiting the scope of our problem to a many-valued regular grid simplify things a lot?
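
To make the worry concrete, here is a toy numpy comparison of the two lookups (names and data are made up, not ImageD11 code): a flat columnfile query scans every row, while a pre-grouped grid lookup touches only the grains at that pixel.

```python
# Toy comparison of the two per-pixel lookups being discussed (illustrative only).
import numpy as np

# Flat columnfile-style table: one row per (pixel, grain) result.
i = np.array([0, 1, 1, 1])
j = np.array([0, 0, 0, 1])
npks = np.array([120, 80, 45, 200])

# Columnfile-style query: every call scans all rows for the matching pixel.
print(npks[(i == 1) & (j == 0)])   # -> [80 45], cost scales with the total row count

# Grid-style query: rows pre-grouped per pixel, so lookup cost scales with
# the number of grains found at that pixel.
rows_at = {(0, 0): [0], (1, 0): [1, 2], (1, 1): [3]}
print(npks[rows_at[(1, 0)]])       # -> [80 45]
```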

@jonwright
Member

jonwright commented Sep 20, 2024

The AoS vs SoA question is whether to store the columnfile table data by rows or by columns (it doesn't matter on paper; it depends on how computers work, etc.). So long as the table rows are sorted by position in space, it can be fast and easy to index into them for either layout.

With column storage, for the series of different arrays like UBI, UB, eps, sigma, etc, we should be able to share the "position in space indexing" pointers across all columns. Probably this can be done via any one of awkward/ragged/pydata_sparse_gcxs. Or we just have one array for each data item like UBI[ngrains, 3, 3] and for each point in space we have the start, len indices to grab UBI[ start[i,j] : start[i,j] + len[i,j] ] (probably fails for len == 0 ?). So in class PBPmap we would need to:

  • ensure the data are grouped by position in space (no need to sort, but all grains from one position should be together)
  • create an array start[ nx, ny ] with the first row for each position and also len[nx, ny]
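
A small numpy sketch of that start/len bookkeeping, assuming the table is already grouped by position; the names are placeholders rather than the real PBPMap API. (The len == 0 case just yields an empty slice rather than failing.)

```python
# Sketch of the start/len bookkeeping described above. The table is grouped by
# position in space and two (nx, ny) integer arrays locate each pixel's rows.
# Names are placeholders, not the real PBPMap API.
import numpy as np

nx, ny = 2, 2
# Position-grouped data: row 0 -> pixel (0,0), rows 1-2 -> (1,0), row 3 -> (1,1).
ubi = np.stack([np.eye(3), np.eye(3), 2 * np.eye(3), np.eye(3)])

start = np.zeros((nx, ny), dtype=int)
length = np.zeros((nx, ny), dtype=int)
start[0, 0], length[0, 0] = 0, 1
start[1, 0], length[1, 0] = 1, 2
start[1, 1], length[1, 1] = 3, 1   # pixel (0, 1) keeps length 0: no grains found

def grains_at(i, j):
    """All UBIs at pixel (i, j); len == 0 just gives an empty slice."""
    s, n = start[i, j], length[i, j]
    return ubi[s : s + n]

print(grains_at(1, 0).shape)   # (2, 3, 3)
print(grains_at(0, 1).shape)   # (0, 3, 3) -- empty, but does not fail
```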

Maybe the next thing to do is IO? The input (for now) is the output from PBPmap, which is already a columnfile. This would just add a couple of lookup tables.
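
One way the extra lookup tables could be written next to the existing point-by-point output, sketched with h5py; the group and dataset names here are made up, not an existing ImageD11 file layout.

```python
# Hypothetical sketch: store the start/len lookup tables in the same HDF5 file
# as the existing point-by-point results. Group and dataset names are made up.
import h5py

def save_lookup(h5name, start, length, group="pbp_index"):
    # start, length: (nx, ny) integer arrays built from the grouped columnfile.
    with h5py.File(h5name, "a") as hf:
        grp = hf.require_group(group)
        grp.create_dataset("start", data=start)
        grp.create_dataset("len", data=length)

def load_lookup(h5name, group="pbp_index"):
    with h5py.File(h5name, "r") as hf:
        grp = hf[group]
        return grp["start"][()], grp["len"][()]
```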

@jadball
Contributor Author

jadball commented Sep 24, 2024

I think enforcing grouping gets tricky if we want to add a new UBI to (or remove one from) an existing pixel? Otherwise I like this idea in theory. I think the thing to do is to try writing some refinement code; the requirements and the most elegant approach will probably become apparent at some point...

@jadball
Contributor Author

jadball commented Oct 18, 2024

Solved by #339 - we settled on a columnfile in the end, with i, j, ubi00, ubi01, etc. entries. It seems to work well with minimal extra code needed.
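
A toy version of that flat layout, with a plain dict of numpy columns standing in for the real ImageD11 columnfile; the column names follow the comment above.

```python
# Toy version of the adopted layout: one row per (pixel, grain), with i, j and
# the unrolled UBI components as ordinary columns. A dict of numpy arrays
# stands in for the real ImageD11 columnfile here.
import numpy as np

cols = {
    "i":     np.array([0, 1, 1, 1]),
    "j":     np.array([0, 0, 0, 1]),
    "ubi00": np.array([1.0, 1.0, 2.0, 1.0]),
    "ubi01": np.array([0.0, 0.0, 0.0, 0.0]),
    # ... ubi02 through ubi22 continue in the same way
    "npks":  np.array([120, 80, 45, 200]),
}

# All grains at pixel (1, 0) come back from a simple row selection.
sel = (cols["i"] == 1) & (cols["j"] == 0)
print(cols["npks"][sel])   # -> [80 45]
```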

jadball closed this as completed Oct 18, 2024