Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ginkgo ILU #1684

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from
Open

ginkgo ILU #1684

wants to merge 7 commits into from

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Sep 26, 2024

This pr enables the ginkgo's implementation ILU through the current LU implementation.
To enable that, we need to allow LU can ignore the missing fill-in entries.
It is controlled by checked_lookup parameter, and user needs to pass the same sparsityCsr as the original one as the symbolic factor to get ILU.
Moreover, this pr also adds a parameter for algorithm selection. It will forward the information to LU and unpack them directly to return the same structure as the original sparselib.
Note. LU by default gives combined matrix but ILU gives the composition.

The first commit shows that the hashmap lookup leads the hang issue and bitmap lookup leads illegal memory access (incorrect result with -1 index return).
The second commit shows the results are wrong if we do not ignore missing fill-in entries update after fixing infinite loop of hashmap lookup

After this pr, we provide another way to factorize ILU without relying on the sparse library.

TODO

  • cholesky (probably another PR)

@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Sep 26, 2024
@yhmtsai yhmtsai requested review from nbeams, upsj and a team September 26, 2024 09:22
@yhmtsai yhmtsai self-assigned this Sep 26, 2024
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:matrix-format This is related to the Matrix formats type:factorization This is related to the Factorizations mod:all This touches all Ginkgo modules. labels Sep 26, 2024
@yhmtsai yhmtsai marked this pull request as ready for review September 26, 2024 09:23
Copy link
Collaborator

@nbeams nbeams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions and comments, but overall I am excited about this! Thanks @yhmtsai

common/cuda_hip/factorization/lu_kernels.cpp Outdated Show resolved Hide resolved
local_system_matrix->get_const_row_ptrs())));
ilu =
gko::experimental::factorization::Lu<ValueType, IndexType>::build()
.with_checked_lookup(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, would prefer with_safe_lookup or something like that, personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this should rather be a new type, LU and ILU(...) have very different use cases. That would also allow us to build a new, improved ILU preconditioner interface based on the Factorization type instead of the slightly hacky Composition-like type ILU is currently.

.with_symbolic_factorization(sparsity)
.on(exec)
->generate(local_system_matrix)
->get_combined();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do get_combined() only to separate the factors out again later? Could we just get the L and U factors out of the Composition and skip the L/U initialization stuff for the syncfree case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LU always generates the combined matrix first.
We can call unpack or rely on the original code here. I do not check there's any difference between the implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, LU creates combined matrix, but my question is, why call get_combined directly (which returns a sparse matrix, right?) when we could store the Factorization and get the L and U factors from that? I guess there's no reason to expect the performance would be different between the Factorization method for splitting the factors from a combined matrix vs the kernels called for turning the cuSPARSE output into factors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I did it because it reuse the following part and make the condition block smaller.
I can also use early return to avoid a large condition block actually.
I am not big fan of large condition block

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To enable this code path, I also add strategy argument to unpack function.

core/factorization/ilu.cpp Outdated Show resolved Hide resolved
* syncfree is Ginkgo's implementation through the Lu factorization with given
* sparsity.
*/
enum class factorize_algorithm { sparselib, syncfree };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is specific to incomplete factorizations only (at least for now), would it be better to use ilu_algorithm or something like that? Since it's only in the factorization namespace, nothing specific about ILU there.

Also, nit: cuda and hip aren't capitalized here, but they are in the comment for the factory parameter. I think we should be consistent: either Cuda/Hip (like the Executors) or CUDA/HIP (correct names for the actual software).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also use it to IC, so it will be for incomplete factorize algorithm at least.
maybe incomplete_factorize_algorithm if the original one is too general?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. I'm mostly thinking about code readability/logicality on the user side. I do think it would be good to specify that this is only for incomplete somehow, though incomplete_factorize_algorithm is pretty long. What if we had the same enum for both ilu and ic but with different names (ilu_algorithm and ic_algorithm)? But we can also go with the longer name. Maybe @upsj has thoughts on this?

@@ -97,6 +97,15 @@ class Lu
* incorrect results or crash.
*/
bool GKO_FACTORY_PARAMETER_SCALAR(skip_sorting, false);

/**
* The symbolic factoization should contains the fill-in information. If
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: suggested minor changes:

"The symbolic factorization contains the fill-in for the matrix. If it does not have full fill-in, as in Ilu, this parameter must be set to true in order to avoid the possibility of hanging or illegal memory accesses during the factorization process. When this is true, the symbolic factorization must still contain the non-zero locations in the original matrix, at minimum."

On a related note, how hard would it be to remove the requirement that the incomplete factorization has to contain all the locations of the original matrix? It would mostly just affect the initialization in Lu, right? The safe lookup could handle the rest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the related note, it should work if the algorithm is okay with ignoring the value from A but missing in LU

Comment on lines +267 to +277
auto out_idx = hashmap[hash];
// linear probing with invalid_index sentinel to avoid infinite loop
while (out_idx >= 0 && local_cols[out_idx] != col) {
hash++;
if (hash >= hashmap_size) {
hash = 0;
}
GKO_ASSERT(hashmap[hash] >= 0);
out_idx = hashmap[hash];
GKO_ASSERT(hashmap[hash] < row_nnz);
}
const auto out_idx = hashmap[hash];
// out_idx is either correct or invalid_index, the hashmap sentinel
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused here. With this change, lookup_hash_unsafe is the same as lookup_hash (except for one extra assertion for unsafe -- the >=0 -- not sure why?). So...lookup_hash_unsafe isn't really unsafe anymore? Why do we need any changes to unsafe at all, when in the factorization, we switch to use the safe lookup if requested? (lookup[col] rather than lookup.lookup_unsafe(col))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lookup_hash_unsafe is mostly equal to lookup_hash up to one assert as you mentioned.
in my mind. when you call lookup_unsafe, you will assume it always returns a valid value back without any checks.
when calling lookup, you will always checks whether it is valid or not.
I keep them because of the extra assert. unsafe should report the index is invalid in debug mode because it may not be checked in the code.
I want to change the unsafe lookup because the infinite loop leads debug difficult and this change does not affect the performance.
BTW, I do not think there's performance difference between safe and unsafe version, but I will keep this behavior first.
Then, we can see whether to use safe-only lookup.

Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mike, I'm happy to see how simple this was to extend to ILU. Some design and naming suggestions I'd like to see discussed, and maybe we should build a roadmap for the future ILU preconditioner interface.

@@ -31,10 +31,25 @@ GKO_REGISTER_OPERATION(initialize_l, factorization::initialize_l);

template <typename ValueType, typename IndexType>
std::unique_ptr<Factorization<ValueType, IndexType>>
Factorization<ValueType, IndexType>::unpack() const
Factorization<ValueType, IndexType>::unpack(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to improve the strategy type with 2.0 (or earlier), and the strategies have no impact on triangular solvers, so I think we should avoid adding new strategy-related functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think 2.0 happens soon
if we do not want to introduce here, I can only unpack them by original code path not the unpack them.
@nbeams it should be fine, right? the unpacking codes are the same.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean revert back to before 57c0051? It should be fine, as far as I know. I tested it before this commit, anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@MarcelKoch MarcelKoch added this to the Ginkgo 1.9.0 milestone Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-for-review This PR is ready for review mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:testing This is related to testing. type:factorization This is related to the Factorizations type:matrix-format This is related to the Matrix formats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants