Clarify how grids should be flattened #118
@aaraney This is an excellent question and one that is not addressed adequately (at all?) in the docs. BMI doesn't really take a position on how one might flatten a multidimensional array, only that it is done in a consistent way. For example, if your BMI returns two quantities through a call to get_value("val1", array1);
get_value("val2", array2); the value at If your values are on a grid, the ordering of those values must also be consistent with the ordering of your grid elements. In the case of an unstructured grid, for example, the x-coordinates of the elements are given by get_grid_x(grid_id, x);
get_value("val1", array1); In this case, the x-coordinate for the n-th element of There is some ambiguity in the case of a uniform rectilinear grid (i.e. raster grid) since the BMI defines the topology, not through To get around this, one could potentially treat their raster grid as an unstructured mesh and then provide the grid elements column-by-column. The BMI has prioritized hiding a model's implementation details (e.g. how values are stored in memory: row-major, column-major, non-unit stride, etc.) over performance (i.e. extra copies). These are certainly topics for discussion, though. Did that help to answer your question? If not, or if you have more questions, I would be happy to discuss them here or we could arrange a meeting sometime. |
We could add language to the docs to the effect of
@aaraney If you have suggestions on how to amend (and improve) the docs, let us know! |
The problem with this is that without an agreed-upon definition, you cannot guarantee the interoperability of any data between any two BMI models/components. |
This is basically what we are unsure of, or were expecting to be part of the interface specification... is that assumption documented anywhere, say for PyMT or anywhere else in CSDMS?
Bingo... we don't care how the model is managing its data internally, but to pass data between two models we have to either know how the data will come out of |
First off, thanks for the quick reply, @mcflugen and @mdpiper! Just so we are on the same page, you answered my question. However, in line with the comments my colleagues (@hellkite500 and @mattw-nws) made above, it would be extremely helpful if the BMI took a stance on this issue or provided a convention through something like To hopefully move the conversation forward, I see several ways to work around this. I'm sure I'll miss a few other obvious ones along the way, so please add other viable options you think of as well.
this option seems to break with some of the core goals of BMI and is likely not viable.
|
BMI doesn't specify how to flatten arrays if the model doesn't implement the grid functions. It is the
This is just barely touched upon in the get_grid section of the docs but, now that I look it over, it is not at all clear how this ordering would affect the ordering of data passed through the For models that don't implement the
I think we've actually done this in the Do you think that a better description of the
If we were to do something like this, I think it would be a part of the To summarize:
Sorry for the confusion on this. Are we getting closer to a solution? |
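The row-by-row assumption described in this comment can be sketched with NumPy. This is a minimal illustration, not BMI reference code: `element_locations` is a hypothetical helper, and it assumes that shape, spacing, and origin are all given in "ij" order (rows first, then columns) and that values are flattened row by row.

```python
import numpy as np


def element_locations(shape, spacing, origin):
    """Return (y, x) coordinates for each element of a flattened raster grid.

    Hypothetical helper. Assumes "ij" ordering (rows, then columns) for
    shape, spacing, and origin, and row-by-row (C order) flattening.
    """
    n_rows, n_cols = shape
    flat_ids = np.arange(n_rows * n_cols)
    # Map each flat index back to its (row, column) pair in C order.
    rows, cols = np.unravel_index(flat_ids, (n_rows, n_cols), order="C")
    y = origin[0] + rows * spacing[0]
    x = origin[1] + cols * spacing[1]
    return y, x


# A 2-row by 3-column grid: dy = 10, dx = 5, lower-left corner at (y0, x0) = (0, 100).
y, x = element_locations(shape=(2, 3), spacing=(10.0, 5.0), origin=(0.0, 100.0))
```

Under these assumptions, the n-th value exchanged through get_value would sit at `(y[n], x[n])`. With column-by-column flattening, the same flat index would point at a different location.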
I think you're getting closer to a solution indeed.
I always struggle with the terms row-major and column-major. They suggest to
me that the data would be ordered differently in C and Fortran, but that's
only the case if you require that the size should be visually the same in
the two languages.
I prefer to look at it from the other side. If you transfer data from
Fortran to C, the values remain stored in the same order ... that's the
most efficient for data exchange. To be even more pragmatic, let's just
assume that we pass the c_loc of (the start of) a Fortran array, and hence
nothing can change the data order. This will work fine if the *shape*
information of the array is adjusted: the size [Imax,Jmax,Kmax] becomes
[Kmax,Jmax,Imax] when crossing the language bridge.
Whenever *shape* information is exchanged through the BMI we should make
clear whether the fastest increasing dimension is: the first (like in
Fortran/MATLAB) or the last (like in C/Python). If I understand correctly,
the row-by-row ordering mentioned in the documentation defines the shape as
one would see it in C/Python. A Fortran/MATLAB component doesn't have to
flip the data to be BMI compliant, but it should flip the size array when
it exposes the shape of the grid via the BMI API.
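The shape-flipping idea can be checked with NumPy, used here only as a stand-in for a real Fortran/C language bridge. The sizes `imax`, `jmax`, and `kmax` are arbitrary illustrative values.

```python
import numpy as np

imax, jmax, kmax = 2, 3, 4

# A Fortran-ordered array of shape (imax, jmax, kmax): the first index varies fastest in memory.
fortran_array = np.asfortranarray(
    np.arange(imax * jmax * kmax).reshape(imax, jmax, kmax)
)

# Reversing the shape gives a C-ordered view of the same memory; no data is copied or moved.
c_view = fortran_array.T

same_memory = np.shares_memory(fortran_array, c_view)
same_order = np.array_equal(c_view.ravel(order="C"), fortran_array.ravel(order="F"))
```

Here `c_view` has shape `(kmax, jmax, imax)` and is C-contiguous, so `fortran_array[i, j, k]` and `c_view[k, j, i]` refer to the same element.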
Currently, the order of dimensions is only implied in the get_grid_shape,
get_grid_spacing and get_grid_origin calls.
However, to support the exchange of multi-dimensional non-geospatial (i.e.
non-gridded) variables, we've included at Deltares a get_var_shape call that
isn't part of the core BMI standard.
This call obviously suffers from the same ambiguity. The implementation
uses the same C convention that the documentation of the aforementioned BMI
grid calls describes.
I hope that this view helps to converge to a solution.
…On Tue, Feb 21, 2023 at 7:01 PM Eric Hutton ***@***.***> wrote:
@hellkite500 <https://github.com/hellkite500>
BMI doesn't specify how to flatten arrays. This is left to the
implementation.
The problem with this is that without an agreed-upon definition, you
cannot guarantee the interoperability of any data between any two BMI
models/components.
BMI doesn't specify how to flatten arrays if the model doesn't implement
the grid functions. It is the get_grid functions that define the ordering
of values and so how one would ravel/unravel data.
@mattw-nws <https://github.com/mattw-nws>
This is basically what we are unsure of, or were expecting to be part of
the interface specification... is that assumption documented anywhere, say
for PyMT or anywhere else in CSDMS?
This is just barely touched upon in the get_grid section of the docs
<https://bmi.readthedocs.io/en/stable/#get-grid-shape> but, now that I
look it over, it is not at all clear how this ordering would affect the
ordering of data passed through the get_value and set_value functions.
When we talked about *ij* ordering, we had it in our minds that the
ordering of the grid would be row-by-row but we didn't actually say that.
The data ordering for both the set_value and get_value are defined
through the get_grid functions.
For models that don't implement the get_grid functions, even if you knew
the ordering, without *a priori* knowledge of the models you couldn't
confidently exchange data between them. Even if the two models ordered
their values in the same way, you wouldn't know that those values were
located at the same points.
@aaraney <https://github.com/aaraney>
BMI takes a stance on how matrices / tensors are to be flattened
I think we've actually done this in the get_grid methods for structured
grids. Although these functions don't say how a model should store its
values internally they do specify the ordering for exchange through
get_value and set_value.
Do you think that a better description of the get_grid methods and how
they relate to the get_value and set_value functions would help clear up
this confusion? That is, using the get_grid_shape and get_grid_spacing
functions one can determine—*by assuming row-by-row ordering*—the
locations of a model's elements and so how they would be flattened.
BMI introduces a new getter function that returns a model's flattening
convention.
If we were to do something like this, I think it would be a part of the
get_grid functions. Using *numpy* as a guide, it could be *get_grid_order*,
would only be implemented for structured grids, and the order could be
something like "C" or "F".
To summarize:
- The ordering of data is provided through the get_grid functions.
- For structured grids, it's not clear what the ordering of elements
  is. Two potential solutions are:
  - Introduce a new function (e.g. get_grid_order) that specifies the
    ordering.
  - Update the docs to specify the element ordering, which is
    currently row major.
Sorry for the confusion on this. Are we getting closer to a solution?
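As a rough sketch of the first option in the summary, a consumer could use a get_grid_order value to unflatten exchanged data. Everything here is hypothetical: get_grid_order is only a proposal in this thread and not part of the BMI standard, `ToyModel` and `unflatten` are invented for illustration, and the method signatures are simplified (real BMI functions take a grid id or variable name and fill a destination array).

```python
import numpy as np


class ToyModel:
    """Hypothetical component with simplified, non-standard BMI-like methods."""

    def __init__(self, order):
        self._order = order  # "C" (row-major) or "F" (column-major)
        self._field = np.array([[1, 2, 3], [4, 5, 6]])

    def get_grid_shape(self):
        return self._field.shape

    def get_grid_order(self):
        # Proposed function from this thread; NOT part of the BMI standard.
        return self._order

    def get_value(self):
        # Flatten using the model's own convention.
        return self._field.ravel(order=self._order)


def unflatten(model):
    """Rebuild the 2D field using the order the model reports."""
    return model.get_value().reshape(
        model.get_grid_shape(), order=model.get_grid_order()
    )


restored_c = unflatten(ToyModel("C"))
restored_f = unflatten(ToyModel("F"))
```

Both calls recover the same 2D field even though the two models exchange their values in a different flat order.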
|
Just so we are all on the same page, I am both concerned with the visual ordering (think nested loop over a matrix) as well as the in-memory ordering.
I think this is where I am a little lost. In my review of the
Ahhh, okay so the implicit assumption is that matrices are in row-major order.
Yes! And it sounds like, from what you stated, that the BMI expects that arrays are flattened and exchanged using row-major order (that's what I'm taking row-by-row to mean). If that is the case, I think adding that to the docs would be tremendously helpful.
👍🏻 that's in line with what I was thinking!
I'm still not sold on this, but I asked for clarification above. Just marking it, more or less. |
Thanks for weighing in, @hrajagers! To your point, row-major and column-major are confusing terms. I'm not sure that you meant this in your comment, but just so we are on the same page (and for the sake of other future readers), the in-memory representation of arrays with rank greater than 1 differs between C and Fortran. Fortran uses column-major order for the contiguous representation of a matrix and C uses row-major order.
Since the BMI only exchanges 1D arrays (
Right, but that also means you have to know the origin language or convention of the model to operate on its data. That kind of defeats a major part of having the BMI in the first place, right?
Just so we are on the same page, I am assuming you mean, |
I didn't want to cause confusion, @aaraney. If we want to keep the order of the values in memory the same as in the model component, then the question of how to flatten the array becomes the question of how to properly describe the size of the array. The FORTRAN There seems, however, to be a fundamental restriction in BMI in the sense that it assumes that for the uniform rectilinear grid (described using |
Thanks for the dialog, @hrajagers! I think your response better frames the issue I was trying to describe originally. Namely, the BMI does not provide a way to describe the stride of an array. In your response, for clarity, you used the word size, but I think stride is slightly more accurate. However, I think we are talking about the same thing. Please correct me if I'm wrong :). Earlier this week, in an offline discussion, my colleague @mattw-nws proposed describing the strides of an array with a new @mattw-nws wrote:
A
Admittedly, initially I was a little skeptical of For concreteness, revisiting my original example and assuming
[
[1, 2], # assume 4 byte int
[3, 4],
]
# strides in C order / row-major / slowest to fastest
[8, 4]
# more generally
[len(dim x) * size_t, size_t]
# strides in Fortran order / column-major / fastest to slowest
[4, 8]
# more generally
[size_t, len(dim y) * size_t]
# indexing using strides
index = y_index * y_stride + x_index * x_stride
While thinking about |
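The stride arithmetic in this comment can be reproduced with NumPy, which reports strides in bytes. This is a minimal sketch, assuming 4-byte integers as in the example. `byte_offset` and `element_at` are hypothetical helpers, not part of any BMI binding.

```python
import numpy as np

# The 2x2 example with 4-byte integers, stored in C and in Fortran order.
c_array = np.array([[1, 2], [3, 4]], dtype=np.int32, order="C")
f_array = np.asfortranarray(c_array)

c_strides = c_array.strides  # bytes to step along (rows, columns) in C order
f_strides = f_array.strides  # bytes to step along (rows, columns) in Fortran order


def byte_offset(index, strides):
    """Byte offset of an element: sum of index * stride over all dimensions."""
    return sum(i * s for i, s in zip(index, strides))


def element_at(array, index):
    """Read one element from the raw buffer using only the strides.

    tobytes(order="A") matches the memory layout for contiguous arrays only.
    """
    raw = array.tobytes(order="A")
    offset = byte_offset(index, array.strides)
    return int(np.frombuffer(raw, dtype=array.dtype, count=1, offset=offset)[0])


value_from_c = element_at(c_array, (1, 0))
value_from_f = element_at(f_array, (1, 0))
```

Both lookups return the same logical element even though the two buffers hold their values in a different order. That is the property a stride description would give to a consumer of the flattened data.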
In reading through the BMI documentation and best practices, I came across several places that helpfully note that arrays are always flattened in BMI, even if the model uses dimensional variables (one instance). The best practice documentation says the following about the matter:
However, I was not able to find documentation that says how arrays are flattened, that is, whether they are flattened using row-major (C) or column-major (Fortran) ordering. Without knowing how arrays are flattened, it is ambiguous how they should be unflattened. In practice, if coupled models use different flattening conventions, their output will almost certainly be spurious.
For example, how should the following 2x2 array be flattened and then unflattened?
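The ambiguity can be made concrete with NumPy. This sketch assumes the 2x2 array is the `[[1, 2], [3, 4]]` example that appears in a later comment in this thread.

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

# The same 2D array flattened under the two conventions.
flat_c = original.ravel(order="C")  # row-major: row by row
flat_f = original.ravel(order="F")  # column-major: column by column

# Unflattening with the matching convention restores the array.
round_trip = flat_c.reshape(original.shape, order="C")

# Unflattening column-major data as if it were row-major silently transposes it.
mismatched = flat_f.reshape(original.shape, order="C")
```

The mismatched case raises no error. The receiving model simply sees the transposed array, which is why coupled output would be wrong without any warning.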