-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add interleaving to sgemm and dgemm. Disentangle trmm/symm from gemm. #5573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
|
@martin-frbg @Mousius can you please have a look? |
…emm. Co-authored-by: Chris Sidebottom <[email protected]>
…ng of copyright notices added in last commit.
8316dc1 to
0a205ee
Compare
|
Thanks for looking into this - though I'm not immediately convinced that the speedup warrants the added complexity @Mousius ? At least I'd like to put this off until after the 0.3.31 release. |
The speedup is one thing, the other is enabling GEMM kernels without the need to also implement the other kernels. This would enable us to land existing SME kernels, as previously proposed here: #5011 (comment) Putting off until after 0.3.31 makes perfect sense to me as it's a relatively high risk change. |
Hi Martin. Thanks for taking a look. I'm happy for this to go in after the release. Do you have a rough estimate for when that might be, so I can share internally? |
This change adds interleaving to sgemm and dgemm copies and kernels for ARMV8SVE.
This required a degree of disentangling symm and trmm kernels from gemm. It should now be much easier to apply further optimisations to gemm.
The addition of interleaving provides a ~1.4% speedup on c7g (V1), with negligible changes on c8g (V2).
Taken over square matrix operations with size 2->2014, stepsize = 1:
Geometric mean for interleave/c7g_dgemm.txt: 0.9859023206257058
Geometric mean for interleave/c7g_sgemm.txt: 0.9887890902680289
Geometric mean for interleave/c8g_dgemm.txt: 0.9970050554316875
Geometric mean for interleave/c8g_sgemm.txt: 0.9948135816755502
We see an increase in the sgemm speedup (~2.4%) on c7g for larger matrix sizes.
Taken over square matrix operations with size 2,000->10,000, stepsize = 1,000:
Geometric mean for 64thread_interleave/c7g_dgemm.txt: 0.9865252964543917
Geometric mean for 64thread_interleave/c7g_sgemm.txt: 0.9762227312411808
Geometric mean for 64thread_interleave/c8g_dgemm.txt: 0.9997186302044462
Geometric mean for 64thread_interleave/c8g_sgemm.txt: 0.9996022927667269