-
Notifications
You must be signed in to change notification settings - Fork 217
feat: Add global fast_math flag to CudaBuilder #331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add fast_math field to CudaBuilder struct - Enables ftz, fast_sqrt, fast_div and fma_contraction(fmad) internally - Provides convenient parity with NVCC's --use_fast_math
nnethercote
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is confused. How do the individual flags interact with the new flag? What happens if you specify fast_math and then also fast_sqrt(false)? Seems like either:
- CudaBuilder should handle all the fast_math stuff, and pass individual flags to NVCC, but not
--use_fast_math - CudaBuilder should just record fast_math and then pass
--use_fast_maththrough to NVCC
But currently it's doing a mixture of both.
Also, was this PR generated by AI? The description is very long, with subheadings and bullet points that are typical for AI.
| self.ftz = true; | ||
| self.fast_sqrt = true; | ||
| self.fast_div = true; | ||
| self.fma_contraction = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a redundant representation. You're storing fast_math and also the individual flags. It should do one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, nothing happens if fast_math is false. Is that intended?
|
|
||
| if builder.fast_math { | ||
| llvm_args.push("--use_fast_math".to_string()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be plumbed through to NVCC, or should it be handled by CudaBuilder which then specifies the individual flags?
|
Thank you for your review. Do you think delegating to NVCC would be the better approach? I need your advices. |
|
How does this jive with https://simonbyrne.github.io/notes/fastmath/ ? |
Hi
I try to implemente a
fast_math()method to solve this issue. It provides a convenient way to enable fast math approximations globally, equivalent to NVCC's--use_fast_mathoption.Description
Implement a global
fast_mathflag for CudaBuilder as a convenience method equivalent to NVCC's--use_fast_mathoption.According to the NVCC official documentation, the
--use_fast_mathflag enables fast approximations for floating-point operations by internally settingftz,prec-sqrt, andprec-div.Implementation
I implemented a
fast_math()method that sets these three options internally:Usage
Users can now enable fast math globally with a single method call:
Instead of manually setting each flag:
Implementation Details
fast_math()method is a convenience wrapper that internally enablesftz,fast_sqrt, andfast_divinvoke_rustc()false)Relates to #262