SIMD in C# #9
Comments
Thanks for your offer. I am interested in your suggestions and open to integrating them here. I have not used C# in the past few years, so I am not up to date with the latest features of the ecosystem. Unfortunately, that means I cannot offer much guidance or review on your work. This should not block merging your changes, but if you know someone who could review them, in my experience that usually improves the end result significantly, even for the most skilled developers. What is important to me is that the project stays lightweight and cross-platform (Linux, Windows, ...).
Oh cool. Sure, it should stay lightweight, that's the whole point 😂 In the latest versions, C# finally got some options to use common SIMD operations cross-platform without caring about the actual hardware. I think it even falls back to a slower implementation when the chipset doesn't support the operation. It's much easier to make use of SIMD and low-level pointer hacks in C# now; otherwise I wouldn't consider such a change. (See my project Schafkopf.) I'm not looking for big changes. It's basically an intrinsic for 4x4 matrix multiplication, some instructions (add, subtract, multiply) for computing 4-8 floats per loop iteration at once, and of course some intrinsics to load/store data. Having had a look at your code, it's already structured such that only a few hotspots will be affected and most of the logic will stay the same. But yeah, I think the first step will be to add proper performance benchmarks to obtain a baseline and then expand from there. I've already done a C implementation of a matmul (just SIMD, without the Strassen algo), so I'll compare against that, evaluate the potential gains, and then figure out if it's worth it. Currently, I'm working on other things though, so the PR may be delayed by a few weeks.
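As a rough illustration of the approach described in the comment above (processing several floats per loop iteration with portable SIMD), here is a minimal sketch using `System.Numerics.Vector<float>`. It is not code from this repository or from the PR: the class name `SimdMatMul`, the method `Multiply`, and the flat row-major array layout are assumptions made for the example, and it is a plain vectorized triple loop without Strassen.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not part of the project discussed in this thread.
public static class SimdMatMul
{
    // Multiplies row-major A (n x m) by row-major B (m x p)
    // and returns a new row-major C (n x p).
    public static float[] Multiply(float[] a, float[] b, int n, int m, int p)
    {
        var c = new float[n * p];

        // Number of floats per SIMD register on the current hardware
        // (for example 4 with 128-bit or 8 with 256-bit registers).
        int width = Vector<float>.Count;

        for (int i = 0; i < n; i++)
        {
            int rowC = i * p;
            for (int k = 0; k < m; k++)
            {
                float aik = a[i * m + k];
                var aikVec = new Vector<float>(aik); // broadcast A[i,k] to all lanes
                int rowB = k * p;

                int j = 0;
                // Vectorized part: C[i, j..j+width) += A[i,k] * B[k, j..j+width)
                for (; j <= p - width; j += width)
                {
                    var cVec = new Vector<float>(c, rowC + j); // load
                    var bVec = new Vector<float>(b, rowB + j); // load
                    (cVec + aikVec * bVec).CopyTo(c, rowC + j); // multiply, add, store
                }

                // Scalar tail for the columns that do not fill a whole vector.
                for (; j < p; j++)
                {
                    c[rowC + j] += aik * b[rowB + j];
                }
            }
        }

        return c;
    }
}
```

`Vector.IsHardwareAccelerated` can be checked at runtime to see whether these operations actually map to SIMD instructions. When they do not, the same code still runs, just without the speedup, so any gains would need to be confirmed with a benchmark against the scalar baseline.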
I've sent my first PR with enhancements such as a proper cross-platform project / solution setup and some tests / benchmarks that are run continuously by a GitHub workflow. There's actually a bug when I try to multiply two random 64x64 matrices during the benchmark. This needs to be investigated.
Which matrices did you test with, btw? Your Strassen algo always crashes with a null pointer exception 😅
To be honest, I don't even remember what I used this project for nine years ago. Probably some tests at university. I guess I didn't use the Strassen algo at all, just some other functions, as the Strassen algo was directly imported from https://blog.ivank.net/lightweight-matrix-class-in-c-strassen-algorithm-lu-decomposition.html. Maybe I also broke it during some refactoring. It might make sense to compare it to the original implementation.
I have been thinking about this a little more. As you are doing all the work and I am currently quite far away from C#, maybe it makes sense that you fork the project, and I archive this one and add a note pointing users to your fork. What do you think?
Honestly, I think it doesn't make sense to continue the project via PRs, but thank you for your time anyway. I already have a little AI framework coded up in C and I just need a faster matmul for big matrices. But as your Strassen implementation doesn't work, I could start over entirely or pick another implementation that actually works. I don't really know whether I should pick C or C#. In case I decide to stay with C#, I can continue your project on my fork. That's actually a good idea. I'll let you know if I decide to maintain the fork.
Hey, this codebase looks cool!
I'd like to enhance your project with SIMD and some benchmarks such that it becomes suitable for processing small neural nets in .NET.
Somehow people prefer to call into Python code instead of implementing a fast matmul from scratch in .NET. It's really weird, so I wanna do something about that.
Are you open to such changes?