
Add anisotropic coefficient of variation #143

Open · wants to merge 59 commits into main

Conversation


Teschl commented Oct 29, 2024

I completed the acv() function. I kept part of the original MATLAB code in as comments to better show which parts are based on which code. Right now, the loops are written to support 'C' order arrays instead of the 'F' order we have been using before. This is mainly because the filters/kernels are written in 'C' order, and accessing both in the same order for computation seemed easier this way.

Is there a reason why we are using the 'F' array order for most functions? Is that the convention when working with DEMs/hydrology?

I tested this function by using the Python toolbox and checking whether each filter/kernel is working with the correct values. The following is the result when running it on the 'perfectworld' DEM.

[result screenshots omitted]

closes #100


wkearn commented Oct 30, 2024

I completed the acv() function.

Excellent! I will look at this more closely when I get back from the UK at the end of the week.

Right now, the loops are written to support 'C' order arrays instead of the 'F' order we have been using before. This is mainly because the filters/kernels are, and accessing both for computation seemed easier this way.
Is there a reason why we are using the 'F' array order for most functions? Is that convention when working with dem's/hydrology?

Previously, we privileged column-major/'F' order arrays because that's how MATLAB stores arrays, and it was easier. Since we switched to using the ptrdiff_t dims[2] interface in the various functions, it should not matter exactly what memory layout your array uses, as long as you consistently pass the dims arguments in the right order. The first element should be the size of the dimension that changes fastest as you scan through memory, e.g. rows for column-major/'F' order and cols for row-major/'C' order.
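For example (just a sketch, not code from the library), the same grid with 4 rows and 3 columns would be described like this under the two layouts:

/* Sketch only: dims[0] is always the dimension that varies fastest in memory. */
ptrdiff_t nrows = 4, ncols = 3;

ptrdiff_t f_dims[2] = {nrows, ncols}; /* column-major/'F': rows vary fastest    */
/* element (row i, col j) is z[j * f_dims[0] + i] */

ptrdiff_t c_dims[2] = {ncols, nrows}; /* row-major/'C': columns vary fastest    */
/* element (row i, col j) is z[i * c_dims[0] + j] */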

The order of the loops is determined by performance. You should generally always loop over dims[1] then over dims[0] because you want to sequentially access array elements that are next to each other in memory to minimize the number of cache misses you get. Here, though, you use symmetric kernels, and it might not make so much of a difference. We'd have to do some benchmarks.
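A minimal sketch of what I mean by looping over dims[1] and then dims[0]:

/* Outer loop over dims[1], inner loop over dims[0], so that consecutive
 * iterations of the inner loop touch adjacent array elements in memory. */
for (ptrdiff_t j = 0; j < dims[1]; j++) {
  for (ptrdiff_t i = 0; i < dims[0]; i++) {
    output[j * dims[0] + i] = dem[j * dims[0] + i]; /* stride-1 access */
  }
}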

It should be possible to swap the rows and columns of the filter kernels. For example, your filter_1 could be written in row- or column-major order:

  float C_filter_1[5][5] = {{1, 0, 1, 0, 1},
                            {0, 0, 0, 0, 0},
                            {1, 0, 0, 0, -1},
                            {0, 0, 0, 0, 0},
                            {-1, 0, -1, 0, -1}};

  float F_filter_1[5][5] = {{1, 0, 1, 0, -1},
                            {0, 0, 0, 0, 0},
                            {1, 0, 0, 0, -1},
                            {0, 0, 0, 0, 0},
                            {1, 0, -1, 0, -1}};

But I can see how it quickly gets confusing to keep track of the indices.

I'll need to think about whether we really need to change the loop order.

I tested this function by using the python toolbox and checking if each filter/kernel is working with the correct values. The following is the result when running it on the 'perfectworld' DEM.

Have you compared these to the MATLAB output? As I said, I can add new snapshots to the snapshot testing repository. I got a bit distracted trying to automate the snapshot testing process last week, but I can add those for you.


wkearn commented Nov 4, 2024

@Teschl: I think this generally looks all right. I added snapshots of the acv output from TopoToolbox v2 to our snapshot data repository. There are some differences between this implementation and the existing one that I think we need to iron out before moving forward. I don't think you need to change anything at the moment, but I do want to think about what we should do about the following:

  1. Your acv handles boundary conditions differently from how GRIDobj.acv handles them. TopoToolbox v2 sets pixels outside the bounds of the array to the nearest value in the array here:

    dem     = padarray(dem,[2 2],nan);
    inanpad = isnan(dem);
    [~,L]   = bwdist(~inanpad);
    dem     = dem(L);

    This is what MATLAB calls the replicate boundary condition in imfilter. Your implementation sets these boundary pixels to zero, so all the values that are influenced by these border cells are different from the snapshots. While we don't have a bwdist implementation yet (Implement Euclidean distance transform #106), the normal usage doesn't really need one: you just have to do some calculations to figure out which array index represents the nearest in-bounds pixel to each pixel outside of the array (see the clamping sketch after this list).

  2. The above code is also how GRIDobj.acv deals with NaNs, so they are also handled differently by your implementation. This is where we might actually need bwdist or something like it, because the nearest valid pixel may be very far from a NaN pixel (cf. the Taiwan DEM). The ACV doesn't make sense for NaN pixels anyway: we only want to use the right boundary values for valid pixels in the DEM, so we might be able to get away with computing nearest-neighbor pixels only for NaN pixels within two pixels of a valid one, or something like that.

  3. There are also some numerical differences (around 0.0001 to 0.001 or so) between the output of your function and the snapshots in the interior of the array, which is not affected by the borders. These are probably to be expected because your filtering implementation won't be exactly identical to MATLAB's.

  4. I'm not particularly happy with the row- vs column-major layout problem. How you have it now works as long as we remember to pass row-major arrays to acv and column-major arrays to everything else. We don't even need to copy the arrays; we just need to swap the dims elements to the opposite of what we would do for column-major arrays, but it is still error-prone.

    We could swap the dims elements inside the function, though. We pass in {dims[0],dims[1]} as usual: rows first for column-major, columns first for row-major. Then we immediately create a ptrdiff_t swapped_dims[2] = {dims[1],dims[0]}; in the function. I think things would continue to work at that point, as long as you use swapped_dims instead of dims throughout the function. Then at least the interface is the same everywhere, and we can change the loop order if and when it begins to hamper performance.

  5. I also didn't look too hard into performance. A quick and dirty comparison suggests that this is not much slower than MATLAB, which is good, though MATLAB is doing more with the boundary conditions.
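Regarding point 1, here is roughly what I mean by "some calculations" for the replicate boundary. This is only a sketch with hypothetical helper names, not how GRIDobj.acv does it, but clamping each index to the array bounds should give the same nearest-pixel values for out-of-bounds locations:

/* Hypothetical helpers, not in libtopotoolbox: clamp out-of-range indices to
 * the nearest in-range index, which replicates the border pixel values.
 * Assumes column-major storage with dims[0] rows. */
static ptrdiff_t clamp_index(ptrdiff_t v, ptrdiff_t lo, ptrdiff_t hi) {
  return v < lo ? lo : (v > hi ? hi : v);
}

static float get_replicate(const float *dem, const ptrdiff_t dims[2],
                           ptrdiff_t row, ptrdiff_t col) {
  row = clamp_index(row, 0, dims[0] - 1);
  col = clamp_index(col, 0, dims[1] - 1);
  return dem[col * dims[0] + row];
}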


Teschl commented Nov 13, 2024

@wkearn I reworked the function to use column-major order. I haven't really tested this properly yet, so it might be that matching the kernel values to the DEM values is not working 100% correctly at this time. Instead of using two for loops to loop over the matrix, I switched to a single for loop. It would be nice if you could give this a once-over to see if it makes sense to you.

I also flattened the kernels in column-major order but left the old row-major versions in to better visualize how they look. I also left the old code as a comment for now.

What I'm still working on:

  1. correct boundary values
  2. handle NaNs


Teschl commented Nov 20, 2024

How do we want to handle NaNs in this function, @wkearn? As far as I know, it's possible to do in the C function (using isnan()), but we could also have the C function expect that NaNs have been handled beforehand. Also, what values should be used to replace the NaN values (zero or the value of the nearest non-NaN neighbor)?

Also, the workflow is currently failing because:

E: Failed to fetch mirror+file:/etc/apt/apt-mirrors.txt/pool/main/c/curl/libcurl4-openssl-dev_7.81.0-1ubuntu1.18_amd64.deb 404 Not Found [IP: 52.147.219.192 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Error: Process completed with exit code 100.

And I don't really understand what I can do to fix this.


wkearn commented Nov 20, 2024

Also, the workflow is currently failing because:

E: Failed to fetch mirror+file:/etc/apt/apt-mirrors.txt/pool/main/c/curl/libcurl4-openssl-dev_7.81.0-1ubuntu1.18_amd64.deb 404 Not Found [IP: 52.147.219.192 80]
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
Error: Process completed with exit code 100.

And I don't really understand what I can do to fix this.

I think we need to add an apt update before we run apt install. I'll do that now in a separate PR, and you can merge/rebase.

How do we want to handle NaNs in this function, @wkearn? As far as I know, it's possible to do in the C function (using isnan()), but we could also have the C function expect that NaNs have been handled beforehand. Also, what values should be used to replace the NaN values (zero or the value of the nearest non-NaN neighbor)?

Did you figure out how to handle the boundary conditions similarly to the original implementation? The NaNs are handled similarly to the boundaries in that they are set to the nearest non-NaN neighbor. We might be able to adapt that to handle NaNs in the libtopotoolbox implementation.

As I said before, I don't think we actually need to fill every NaN in the DEM using something like bwdist as it is done in MATLAB: we only need to look at NaNs that are in the neighborhood of the current center pixel, which has to be valid. Then we would find the nearest valid pixel to that NaN. This nearest neighbor can be no farther from the NaN than the center pixel, so you would only have to examine a 3x3 or 5x5 neighborhood around each NaN. Then we fill the NaN with that pixel's value. This seems possible, though it may be fairly complex.

One question is which pixel to replicate if we have something like

1  2   3
4 NaN  5
6  7   8

We should look at MATLAB's bwdist and see which pixel it chooses to be consistent with the previous behavior.

If this proves to be too complicated, we might just assume that the NaNs are filled before the arrays are passed to C and document that somewhere.


wkearn commented Nov 20, 2024

Merge or rebase #148 when you are ready, and the builds should run on Ubuntu.


Teschl commented Nov 20, 2024

I have been testing the bwdist function in MATLAB for a bit and came to the following conclusion: when looking at cell 5 (using the linear indices shown below), if there are multiple nearest cells, it will choose the one with the lowest linear index. So 2 will beat 4, 6, and 8, for example.

[1, 4, 7]
[2, 5, 8]
[3, 6, 9]

I don't think we actually need to fill every NaN in the DEM

Makes sense. I can also hard-code the search order, since the cell that the kernel is applied to (the center of the kernel) will have to have a value.
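Roughly what I have in mind (just a sketch with a hypothetical helper name, assuming column-major storage with dims[0] rows; the ordering of the offsets is meant to match bwdist's tie-breaking by lowest linear index, but I still need to verify that):

#include <math.h>

static float fill_nan_neighbor(const float *dem, const ptrdiff_t dims[2],
                               ptrdiff_t row, ptrdiff_t col) {
  /* (col offset, row offset) pairs: orthogonal neighbors (distance 1) first,
   * then diagonals (distance sqrt(2)); within each group the offsets are
   * ordered by increasing column-major linear index. */
  static const ptrdiff_t offsets[8][2] = {
      {-1, 0}, {0, -1}, {0, 1}, {1, 0},    /* distance 1       */
      {-1, -1}, {-1, 1}, {1, -1}, {1, 1}}; /* distance sqrt(2) */

  for (int k = 0; k < 8; k++) {
    ptrdiff_t c = col + offsets[k][0];
    ptrdiff_t r = row + offsets[k][1];
    if (c < 0 || c >= dims[1] || r < 0 || r >= dims[0]) continue;
    float v = dem[c * dims[0] + r];
    if (!isnan(v)) return v; /* first valid neighbor wins */
  }
  return dem[col * dims[0] + row]; /* no valid neighbor found */
}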

@wkearn


wkearn commented Nov 26, 2024

@Teschl: how is this going? I like what you came up with for handling the NaN/boundary conditions in the last couple of commits. I think there are a few minor issues, which you might be aware of: for example, true_index = true_row * dims[0] + true_col; should probably be true_index = true_col * dims[0] + true_row;.

More importantly, though, have you been testing the output of acv? It has proved to be complicated enough that I would definitely like some tests here to verify that what you've got is correct or close enough to correct.

I added acv snapshots to the snapshot data repository, and yesterday I added some code (#151) to save the data generated by tests in the case of a snapshot test failure, which might make using the snapshot tests here easier.

Below is a patch to test/snapshot.cpp that you could use to run acv against these snapshots. If you want to use it, you'll need to download the snapshot data archive (https://github.com/TopoToolbox/snapshot_data/releases/download/v1.4.0/snapshot_data.tar.gz) and extract it into the test/snapshots directory. CMake should handle the rest, so cmake --build build and ctest --test-dir build will copy the snapshots into your build directory and run the test. If the acv test fails, the test output will be found in build/test/snapshots/data/$DATASET/test_acv.tif and the original in build/test/snapshots/data/$DATASET/acv.tif.

I also pushed this test to my acv-snapshot-test branch (https://github.com/wkearn/libtopotoolbox/tree/acv-snapshot-test) if you want to look at it, but that has rebased your commits up to 7744123 onto the current main branch, and then added the test patch, so it might not merge very cleanly into your branch at the moment.

diff --git a/test/snapshot.cpp b/test/snapshot.cpp
index b4b7cbf..11c0445 100644
--- a/test/snapshot.cpp
+++ b/test/snapshot.cpp
@@ -40,7 +40,29 @@ void load_data_from_file(const std::string& filename, std::vector<T>& output,
   GDALClose(dataset);
 }
 
+template <typename T, GDALDataType S>
+void write_data_to_file(const std::string& dst_filename,
+                        const std::string& src_filename, std::vector<T>& output,
+                        std::array<ptrdiff_t, 2>& dims) {
+  GDALDataset* src_ds = GDALDataset::Open(src_filename.data(), GA_ReadOnly);
+  assert(src_ds);
+
+  GDALDriver* driver = src_ds->GetDriver();
+  GDALDataset* dst_ds =
+      driver->CreateCopy(dst_filename.data(), src_ds, FALSE, NULL, NULL, NULL);
+  assert(dst_ds);
+
+  GDALRasterBand* poBand = dst_ds->GetRasterBand(1);
+  assert(poBand->RasterIO(GF_Write, 0, 0, dims[0], dims[1], output.data(),
+                          dims[0], dims[1], S, 0, 0) == CE_None);
+
+  GDALClose(dst_ds);
+  GDALClose(src_ds);
+}
+
 struct SnapshotData {
+  std::filesystem::path path;
+
   std::array<ptrdiff_t, 2> dims;
   float cellsize;
 
@@ -48,17 +70,21 @@ struct SnapshotData {
   std::vector<float> filled_dem;
   std::vector<int32_t> flats;
   std::vector<int32_t> sills;
+  std::vector<float> acv;
 
   // Output arrays
   std::vector<float> test_dem;
   std::vector<float> test_filled_dem;
   std::vector<int32_t> test_flats;
   std::vector<int32_t> test_sills;
+  std::vector<float> test_acv;
 
   // Intermediate arrays
   std::vector<uint8_t> bc;
 
   SnapshotData(const std::filesystem::path& snapshot_path) {
+    path = snapshot_path;
+
     GDALAllRegister();
 
     if (exists(snapshot_path / "dem.tif")) {
@@ -87,6 +113,12 @@ struct SnapshotData {
       assert(dims[0] == dims_check[0] && dims[1] == dims_check[1]);
     }
 
+    if (exists(snapshot_path / "acv.tif")) {
+      load_data_from_file<float, GDT_Float32>(snapshot_path / "acv.tif", acv,
+                                              dims_check);
+      assert(dims[0] == dims_check[0] && dims[1] == dims_check[1]);
+    }
+
     // Allocate and resize output and intermediate arrays
     if (dem.size() > 0) {
       if (filled_dem.size() > 0) {
@@ -97,59 +129,127 @@ struct SnapshotData {
       if (flats.size() > 0 && sills.size() > 0) {
         test_flats.resize(dims[0] * dims[1]);
       }
+      if (acv.size() > 0) {
+        test_acv.resize(dims[0] * dims[1]);
+      }
     }
   }
 
-  void runtests() {
-    // fillsinks
-    if (test_filled_dem.size() > 0) {
-      // Initialize bcs
-      for (ptrdiff_t j = 0; j < dims[1]; j++) {
-        for (ptrdiff_t i = 0; i < dims[0]; i++) {
-          if (isnan(dem[j * dims[0] + i])) {
+  int run_fillsinks() {
+    // Initialize bcs
+    for (ptrdiff_t j = 0; j < dims[1]; j++) {
+      for (ptrdiff_t i = 0; i < dims[0]; i++) {
+        if (isnan(dem[j * dims[0] + i])) {
+          bc[j * dims[0] + i] = 1;
+          test_dem[j * dims[0] + i] = -INFINITY;
+        } else {
+          if (i == 0 || i == dims[0] - 1 || j == 0 || j == dims[1] - 1) {
+            // 1 on the boundaries
             bc[j * dims[0] + i] = 1;
-            test_dem[j * dims[0] + i] = -INFINITY;
           } else {
-            if (i == 0 || i == dims[0] - 1 || j == 0 || j == dims[1] - 1) {
-              // 1 on the boundaries
-              bc[j * dims[0] + i] = 1;
-            } else {
-              // 0 on the interior
-              bc[j * dims[0] + i] = 0;
-            }
+            // 0 on the interior
+            bc[j * dims[0] + i] = 0;
           }
         }
       }
+    }
 
-      tt::fillsinks(test_filled_dem.data(), test_dem.data(), bc.data(),
-                    dims.data());
+    tt::fillsinks(test_filled_dem.data(), test_dem.data(), bc.data(),
+                  dims.data());
 
-      for (ptrdiff_t j = 0; j < dims[1]; j++) {
-        for (ptrdiff_t i = 0; i < dims[0]; i++) {
-          if (!isnan(filled_dem[j * dims[0] + i])) {
-            assert(test_filled_dem[j * dims[0] + i] ==
-                   filled_dem[j * dims[0] + i]);
+    for (ptrdiff_t j = 0; j < dims[1]; j++) {
+      for (ptrdiff_t i = 0; i < dims[0]; i++) {
+        if (!isnan(filled_dem[j * dims[0] + i])) {
+          if (test_filled_dem[j * dims[0] + i] != filled_dem[j * dims[0] + i]) {
+            write_data_to_file<float, GDT_Float32>(path / "test_fillsinks.tif",
+                                                   path / "fillsinks.tif",
+                                                   test_filled_dem, dims);
+            return -1;
           }
         }
       }
     }
-    if (flats.size() > 0 && sills.size() > 0) {
-      // identifyflats
-      //
-      // Use the snapshot filled DEM rather than the generated one in
-      // case fillsinks fails.
-      tt::identifyflats(test_flats.data(), filled_dem.data(), dims.data());
-
-      for (ptrdiff_t j = 0; j < dims[1]; j++) {
-        for (ptrdiff_t i = 0; i < dims[0]; i++) {
-          if (flats[j * dims[0] + i] == 1)
-            assert(test_flats[j * dims[0] + i] & 1);
-
-          if (sills[j * dims[0] + i] == 1)
-            assert(test_flats[j * dims[0] + i] & 2);
+    return 0;
+  }
+
+  int run_identifyflats() {
+    // identifyflats
+    //
+    // Use the snapshot filled DEM rather than the generated one in
+    // case fillsinks fails.
+    tt::identifyflats(test_flats.data(), filled_dem.data(), dims.data());
+
+    for (ptrdiff_t j = 0; j < dims[1]; j++) {
+      for (ptrdiff_t i = 0; i < dims[0]; i++) {
+        if (flats[j * dims[0] + i] == 1 && !(test_flats[j * dims[0] + i] & 1)) {
+          write_data_to_file<int32_t, GDT_Int32>(
+              path / "test_identifyflats.tif", path / "identifyflats_flats.tif",
+              test_flats, dims);
+          return -1;
+        }
+
+        if (sills[j * dims[0] + i] == 1 && !(test_flats[j * dims[0] + i] & 2)) {
+          write_data_to_file<int32_t, GDT_Int32>(
+              path / "test_identifyflats.tif", path / "identifyflats_sills.tif",
+              test_flats, dims);
+          return -1;
+        }
+      }
+    }
+    return 0;
+  }
+
+  int run_acv() {
+    // acv
+    tt::acv(test_acv.data(), dem.data(), 0, dims.data());
+
+    for (ptrdiff_t j = 0; j < dims[1]; j++) {
+      for (ptrdiff_t i = 0; i < dims[0]; i++) {
+        if (test_acv[j * dims[0] + i] != acv[j * dims[0] + i]) {
+          write_data_to_file<float, GDT_Float32>(
+              path / "test_acv.tif", path / "acv.tif", test_acv, dims);
+          return -1;
         }
       }
     }
+
+    return 0;
+  }
+
+  int runtests() {
+    int result = 0;
+    // fillsinks
+    if (test_filled_dem.size() > 0) {
+      if (run_fillsinks() < 0) {
+        result = -1;
+
+        std::cout << "[FAILURE] (fillsinks)     " << path << std::endl;
+      } else {
+        std::cout << "[SUCCESS] (fillsinks)     " << path << std::endl;
+      }
+    }
+
+    if (flats.size() > 0 && sills.size() > 0) {
+      if (run_identifyflats() < 0) {
+        result = -1;
+
+        std::cout << "[FAILURE] (identifyflats) " << path << std::endl;
+      } else {
+        std::cout << "[SUCCESS] (identifyflats) " << path << std::endl;
+      }
+    }
+
+    if (acv.size() > 0) {
+      if (run_acv() < 0) {
+        result = -1;
+
+        std::cout << "[FAILURE] (acv) " << path << std::endl;
+      } else {
+        std::cout << "[SUCCESS] (acv) " << path << std::endl;
+      }
+    }
+
+    return result;
   }
 };
 
@@ -171,11 +271,13 @@ int main(int argc, char* argv[]) {
     return 0;
   }
 
+  int result = 0;
   for (const auto& entry :
        std::filesystem::directory_iterator(snapshot_dirpath)) {
-    std::cout << entry.path().filename() << std::endl;
     SnapshotData data(entry.path());
 
-    data.runtests();
+    result |= data.runtests();
   }
+
+  return result;
 }


Teschl commented Nov 26, 2024

Last week I ended up testing the current version and got segfaults.

for example, true_index = true_row * dims[0] + true_col; should probably be true_index = true_col * dims[0] + true_row;.

This might already fix that.

I will add the snapshot tests and then we can see what's still missing from this function.


Teschl commented Nov 26, 2024

@wkearn: I added and ran the snapshot tests using the snapshot.cpp file of your branch. Running cmake --build build and ctest --test-dir build completed with all tests passing. I did not find any test_acv.tif files, but the build directory also does not contain any snapshots.

Is this path correct?

build/test/snapshots/data/$DATASET/test_acv.tif

or should it be test/snapshots/data/$DATASET/test_acv.tif?

When running the function in Python, it also completes without any issues, and the DEMs contain no visible artifacts that point to something going wrong in a major way.

More importantly, though, have you been testing the output of acv? It has proved to be complicated enough that I would definitely like some tests here to verify that what you've got is correct or close enough to correct.

I have been using python with very small arrays (3x4) to individually test if each cell gets checked in the correct order and the correct neighbors are checked when applying the kernels.


wkearn commented Nov 26, 2024

@wkearn: I added and ran the snapshot tests using the snapshot.cpp file of your branch. Running cmake --build build and ctest --test-dir build completed with all tests passing. I did not find any test_acv.tif files, but the build directory also does not contain any snapshots.

When you run ctest --test-dir, do you see the snapshot test run? There should be four separate tests (versioninfo, random_dem, excesstopography and snapshot). If not, it is likely that CMake is unable to find GDAL on your computer, which is necessary to build the snapshot tests. You would need to install GDAL independently and possibly point CMake to it.

Is this path correct?

build/test/snapshots/data/$DATASET/test_acv.tif

or should it be test/snapshots/data/$DATASET/test_acv.tif ?

If you run tar -xzf test/snapshots/snapshot_data.tar.gz, you should get a new directory test/snapshots/data with a subdirectory for each dataset and the various snapshots in each of those subdirectories. The CMakeLists.txt sets up a copy-test-files target that should copy the entire test/snapshots directory structure to build/test/snapshots. When you run cmake --build build, do you see Built target copy-test-files?

When running the function in python, it also completes without any issues, and the dem's contain no visible artifacts that point to something going wrong in a major way.

More importantly, though, have you been testing the output of acv? It has proved to be complicated enough that I would definitely like some tests here to verify that what you've got is correct or close enough to correct.

I have been using python with very small arrays (3x4) to individually test if each cell gets checked in the correct order and the correct neighbors are checked when applying the kernels.

👍


wkearn commented Nov 26, 2024

CI does run the snapshot tests, but it is using the old snapshot data that does not have the acv snapshots. You can bump the version number to v1.4.0 in .github/workflows/ci.yaml:

  debug-build:
    runs-on: ubuntu-latest
    env:
-      snapshot_version: v1.3.0
+      snapshot_version: v1.4.0
    steps:
      - uses: actions/checkout@v4
        with:

This should allow it to run the acv tests.


Teschl commented Nov 26, 2024

It seems it's due to GDAL not being installed. I'm having some trouble getting GDAL to work, so I wasn't able to test it locally yet.


wkearn commented Nov 26, 2024

It seems it's due to GDAL not being installed. I'm having some trouble getting GDAL to work, so I wasn't able to test it locally yet.

You can always use the snapshot data to compare your results in Python, which might be easier than sorting out GDAL. We just don't have that set up automatically in pytopotoolbox yet.

It looks like the acv snapshot tests fail on CI, but unfortunately there isn't a ton of visibility into the failure from the CI logs. I'll add an option to the CI that lets you download an artifact with all the snapshots in it, including the test output, which could be useful.

One thing to note: if you find that your implementation is getting similar results to the snapshots that are off by a small amount, that is probably floating point errors. We might want to change the test condition from

if (test_acv[j * dims[0] + i] != acv[j * dims[0] + i]) {
  //...
}

to something like

if (fabsf(test_acv[j * dims[0] + i] - acv[j * dims[0] + i]) > 1e-4) {
  // ...
}

to account for possible floating point errors. I would say that any differences larger than 0.0001 or so are probably cause for further investigation, but this threshold is kind of arbitrary, so feel free to play around with it.


wkearn commented Nov 27, 2024

If it is useful to get the snapshot output from CI, #152 should allow you to download a zip archive of the snapshot test output. Go to the summary page for a given run (e.g. https://github.com/TopoToolbox/libtopotoolbox/actions/runs/12046675226) and scroll down to the artifacts.


Teschl commented Nov 27, 2024

I tested the snapshot results against the C implementation in Python like you suggested, and something seems to be going wrong pretty consistently. The difference seems to be between 0.1 and 0.4 in all examples. So that's pretty bad, especially considering it was better before.


I'm investigating right now, but I don't think it's the NaN checks, because results are wrong all over.


Teschl commented Nov 27, 2024

The northern slopes seem to perform better than the southern ones, but both are far from 0.0001 precision.


Teschl commented Nov 27, 2024

@wkearn apart from the issue that the precision is not good enough yet, there will probably be issues with the snapshot comparison. The NaN fields in the provided snapshots are different from the NaN areas in the DEM.

This is the DEM: [screenshot omitted]

This is the snapshot data: [screenshot omitted]

So the NaNs in the snapshot data are not just one uniform field but have a few different values. And depending on how the output array is provided, the NaNs in the output will be all zeros, for example. [screenshot omitted]


wkearn commented Nov 27, 2024

Right. We should probably change the snapshot test to only test valid (non-NaN) pixels. It is all right if the NaNs in the DEM end up with different output values in the libtopotoolbox or the MATLAB acv implementation.
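Something like this inside the comparison loop in run_acv would do it (untested sketch, combined with the tolerance check suggested above):

if (isnan(dem[j * dims[0] + i]) || isnan(acv[j * dims[0] + i])) {
  continue;  // skip pixels that are NaN in the DEM or in the snapshot
}
if (fabsf(test_acv[j * dims[0] + i] - acv[j * dims[0] + i]) > 1e-4) {
  // ... write the test output and return -1 as before
}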


wkearn commented Nov 27, 2024

I have a hypothesis about what's happening. Unlike most of the other operations we have been working with, convolution is not agnostic to the row- or column-major layout of the array, because the filter coefficients are oriented in a particular way.

When GDAL loads the data for the snapshot tests, it is stored with the x-axis (East-West) first and the y-axis (North-South) second. Again, this doesn't usually matter because everything is usually symmetric: we just pass the x dimension first and the y dimension second, and the function works as expected. However, this is the opposite of how MATLAB's GRIDobj stores the data, and when we convolve, we end up convolving with the transposed version of the DEM, which gives us the wrong answers.

I think you got the right answers previously because you had assumed that the array was row-major, and the GDAL arrays are essentially row-major.

I just ran the MATLAB implementation of ACV on Big Tujunga with a transposed GRIDobj, and the results look basically the same -- there are still some differences that I believe are numerical errors:

[acv_test1 and acv_test2 screenshots omitted]

So I think that's the problem. I'm actually not sure about the solution. We could just say "this array is column-major with columns representing the north-south axis and rows representing the east-west axis" and then fix the tests so that we load data in the right orientation. I don't really like that though. I'll have to think about it.
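To illustrate why the layout matters for oriented kernels, here is a self-contained toy check (made-up values, not taken from either implementation): correlating the transposed array with a kernel gives the same numbers as correlating the original array with the transposed kernel, so a non-symmetric filter produces different output when the DEM is effectively transposed.

#include <assert.h>
#include <stdio.h>

int main(void) {
  /* Toy 3x3 "DEM" and a non-symmetric 2x2 kernel (values made up). */
  float A[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
  float K[2][2] = {{0, 1}, {-1, 0}};

  /* Correlate at the top-left corner three ways. */
  float a_k = 0, at_k = 0, a_kt = 0;
  for (int m = 0; m < 2; m++) {
    for (int n = 0; n < 2; n++) {
      a_k  += A[m][n] * K[m][n]; /* A with K            */
      at_k += A[n][m] * K[m][n]; /* transpose(A) with K */
      a_kt += A[m][n] * K[n][m]; /* A with transpose(K) */
    }
  }
  printf("A*K = %g, A^T*K = %g, A*K^T = %g\n", a_k, at_k, a_kt);
  assert(at_k == a_kt); /* transposing the DEM acts like transposing the kernel */
  assert(at_k != a_k);  /* which changes the result for an oriented kernel      */
  return 0;
}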
