ginkgo ILU #1684

yhmtsai · 2024-09-26T09:22:44Z

This pr enables the ginkgo's implementation ILU through the current LU implementation.
To enable that, we need to allow LU can ignore the missing fill-in entries.
It is controlled by checked_lookup parameter, and user needs to pass the same sparsityCsr as the original one as the symbolic factor to get ILU.
Moreover, this pr also adds a parameter for algorithm selection. It will forward the information to LU and unpack them directly to return the same structure as the original sparselib.
Note. LU by default gives combined matrix but ILU gives the composition.

The first commit shows that the hashmap lookup leads the hang issue and bitmap lookup leads illegal memory access (incorrect result with -1 index return).
The second commit shows the results are wrong if we do not ignore missing fill-in entries update after fixing infinite loop of hashmap lookup

After this pr, we provide another way to factorize ILU without relying on the sparse library.

TODO

cholesky (probably another PR)

nbeams

I have some questions and comments, but overall I am excited about this! Thanks @yhmtsai

common/cuda_hip/factorization/lu_kernels.cpp

nbeams · 2024-09-26T18:42:26Z

core/factorization/ilu.cpp

+                    local_system_matrix->get_const_row_ptrs())));
+        ilu =
+            gko::experimental::factorization::Lu<ValueType, IndexType>::build()
+                .with_checked_lookup(true)


Same here, would prefer with_safe_lookup or something like that, personally.

IMO this should rather be a new type, LU and ILU(...) have very different use cases. That would also allow us to build a new, improved ILU preconditioner interface based on the Factorization type instead of the slightly hacky Composition-like type ILU is currently.

nbeams · 2024-09-26T18:50:12Z

core/factorization/ilu.cpp

+                .with_symbolic_factorization(sparsity)
+                .on(exec)
+                ->generate(local_system_matrix)
+                ->get_combined();


Why do get_combined() only to separate the factors out again later? Could we just get the L and U factors out of the Composition and skip the L/U initialization stuff for the syncfree case?

LU always generates the combined matrix first.
We can call unpack or rely on the original code here. I do not check there's any difference between the implementation.

Right, LU creates combined matrix, but my question is, why call get_combined directly (which returns a sparse matrix, right?) when we could store the Factorization and get the L and U factors from that? I guess there's no reason to expect the performance would be different between the Factorization method for splitting the factors from a combined matrix vs the kernels called for turning the cuSPARSE output into factors?

That's right. I did it because it reuse the following part and make the condition block smaller.
I can also use early return to avoid a large condition block actually.
I am not big fan of large condition block

To enable this code path, I also add strategy argument to unpack function.

core/factorization/ilu.cpp

nbeams · 2024-09-26T19:18:08Z

include/ginkgo/core/factorization/ilu.hpp

+ * syncfree is Ginkgo's implementation through the Lu factorization with given
+ * sparsity.
+ */
+enum class factorize_algorithm { sparselib, syncfree };


Since this is specific to incomplete factorizations only (at least for now), would it be better to use ilu_algorithm or something like that? Since it's only in the factorization namespace, nothing specific about ILU there.

Also, nit: cuda and hip aren't capitalized here, but they are in the comment for the factory parameter. I think we should be consistent: either Cuda/Hip (like the Executors) or CUDA/HIP (correct names for the actual software).

I will also use it to IC, so it will be for incomplete factorize algorithm at least.
maybe incomplete_factorize_algorithm if the original one is too general?

That makes sense. I'm mostly thinking about code readability/logicality on the user side. I do think it would be good to specify that this is only for incomplete somehow, though incomplete_factorize_algorithm is pretty long. What if we had the same enum for both ilu and ic but with different names (ilu_algorithm and ic_algorithm)? But we can also go with the longer name. Maybe @upsj has thoughts on this?

nbeams · 2024-09-26T19:29:58Z

include/ginkgo/core/factorization/lu.hpp

@@ -97,6 +97,15 @@ class Lu
         * incorrect results or crash.
         */
        bool GKO_FACTORY_PARAMETER_SCALAR(skip_sorting, false);
+
+        /**
+         * The symbolic factoization should contains the fill-in information. If


nit: suggested minor changes:

"The symbolic factorization contains the fill-in for the matrix. If it does not have full fill-in, as in Ilu, this parameter must be set to true in order to avoid the possibility of hanging or illegal memory accesses during the factorization process. When this is true, the symbolic factorization must still contain the non-zero locations in the original matrix, at minimum."

On a related note, how hard would it be to remove the requirement that the incomplete factorization has to contain all the locations of the original matrix? It would mostly just affect the initialization in Lu, right? The safe lookup could handle the rest?

for the related note, it should work if the algorithm is okay with ignoring the value from A but missing in LU

nbeams · 2024-09-26T20:09:28Z

core/matrix/csr_lookup.hpp

+        auto out_idx = hashmap[hash];
+        // linear probing with invalid_index sentinel to avoid infinite loop
+        while (out_idx >= 0 && local_cols[out_idx] != col) {
            hash++;
            if (hash >= hashmap_size) {
                hash = 0;
            }
-            GKO_ASSERT(hashmap[hash] >= 0);
+            out_idx = hashmap[hash];
            GKO_ASSERT(hashmap[hash] < row_nnz);
        }
-        const auto out_idx = hashmap[hash];
+        // out_idx is either correct or invalid_index, the hashmap sentinel


I am confused here. With this change, lookup_hash_unsafe is the same as lookup_hash (except for one extra assertion for unsafe -- the >=0 -- not sure why?). So...lookup_hash_unsafe isn't really unsafe anymore? Why do we need any changes to unsafe at all, when in the factorization, we switch to use the safe lookup if requested? (lookup[col] rather than lookup.lookup_unsafe(col))

lookup_hash_unsafe is mostly equal to lookup_hash up to one assert as you mentioned.
in my mind. when you call lookup_unsafe, you will assume it always returns a valid value back without any checks.
when calling lookup, you will always checks whether it is valid or not.
I keep them because of the extra assert. unsafe should report the index is invalid in debug mode because it may not be checked in the code.
I want to change the unsafe lookup because the infinite loop leads debug difficult and this change does not affect the performance.
BTW, I do not think there's performance difference between safe and unsafe version, but I will keep this behavior first.
Then, we can see whether to use safe-only lookup.

upsj

Thanks Mike, I'm happy to see how simple this was to extend to ILU. Some design and naming suggestions I'd like to see discussed, and maybe we should build a roadmap for the future ILU preconditioner interface.

upsj · 2024-09-30T12:14:51Z

core/factorization/factorization.cpp

@@ -31,10 +31,25 @@ GKO_REGISTER_OPERATION(initialize_l, factorization::initialize_l);

 template <typename ValueType, typename IndexType>
 std::unique_ptr<Factorization<ValueType, IndexType>>
-Factorization<ValueType, IndexType>::unpack() const
+Factorization<ValueType, IndexType>::unpack(


I think we want to improve the strategy type with 2.0 (or earlier), and the strategies have no impact on triangular solvers, so I think we should avoid adding new strategy-related functionality.

I do not think 2.0 happens soon
if we do not want to introduce here, I can only unpack them by original code path not the unpack them.
@nbeams it should be fine, right? the unpacking codes are the same.

You mean revert back to before 57c0051? It should be fine, as far as I know. I tested it before this commit, anyway.

common/cuda_hip/factorization/lu_kernels.cpp

- bitmap: lead wrong answer or segfault - hashmap: infinite loop

…pposite behavior). Co-authored-by: Tobias Ribizel <[email protected]> Co-authored-by: Natalie Beams <[email protected]>

yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Sep 26, 2024

yhmtsai requested review from nbeams, upsj and a team September 26, 2024 09:22

yhmtsai self-assigned this Sep 26, 2024

ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats type:factorization This is related to the Factorizations mod:all This touches all Ginkgo modules. labels Sep 26, 2024

yhmtsai marked this pull request as ready for review September 26, 2024 09:23

nbeams reviewed Sep 26, 2024

View reviewed changes

upsj requested changes Sep 26, 2024

View reviewed changes

upsj reviewed Sep 30, 2024

View reviewed changes

MarcelKoch reviewed Oct 1, 2024

View reviewed changes

common/cuda_hip/factorization/lu_kernels.cpp Outdated Show resolved Hide resolved

yhmtsai and others added 5 commits October 7, 2024 22:14

add failed test when given symbolic without fillin.

f6f90da

- bitmap: lead wrong answer or segfault - hashmap: infinite loop

fix infinite loop of lookup_hash_unsafe and add test in reference

53f7259

add checked_lookup into LU

7ee9c83

add ilu syncfree through lu implementation

26e1292

update the documentation, change checked_lookup -> has_full_fillin (o…

2e47fa7

…pposite behavior). Co-authored-by: Tobias Ribizel <[email protected]> Co-authored-by: Natalie Beams <[email protected]>

yhmtsai force-pushed the gko_ilu branch from abac7cc to a97eb6a Compare October 7, 2024 20:14

yhmtsai added 2 commits October 7, 2024 22:14

use unpack directly from factorization and add unpack with strategy

57c0051

remove the duplicated initialization

a97eb6a

MarcelKoch added this to the Ginkgo 1.9.0 milestone Oct 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ginkgo ILU #1684

ginkgo ILU #1684

yhmtsai commented Sep 26, 2024 •

edited

Loading

nbeams left a comment

nbeams Sep 26, 2024

upsj Sep 26, 2024

nbeams Sep 26, 2024

yhmtsai Sep 27, 2024

nbeams Sep 27, 2024

yhmtsai Sep 27, 2024

yhmtsai Sep 30, 2024

nbeams Sep 26, 2024

yhmtsai Sep 30, 2024

nbeams Sep 30, 2024

nbeams Sep 26, 2024

yhmtsai Sep 30, 2024

nbeams Sep 26, 2024

yhmtsai Sep 27, 2024

upsj left a comment

upsj Sep 30, 2024

yhmtsai Oct 7, 2024

nbeams Oct 8, 2024

yhmtsai Oct 14, 2024

ginkgo ILU #1684

Are you sure you want to change the base?

ginkgo ILU #1684

Conversation

yhmtsai commented Sep 26, 2024 • edited Loading

nbeams left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

upsj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yhmtsai commented Sep 26, 2024 •

edited

Loading