koreaygj commented Dec 1, 2025

Hi,
I tried to implement a fast_math() method to resolve this issue. It provides a convenient way to enable fast math approximations globally, equivalent to NVCC's --use_fast_math option.

Description

Implement a global fast_math flag for CudaBuilder as a convenience method equivalent to NVCC's --use_fast_math option.

According to the official NVCC documentation, the --use_fast_math flag enables fast approximations for floating-point operations by implying --ftz=true, --prec-div=false, --prec-sqrt=false, and --fmad=true.

Implementation

I implemented a fast_math() method that sets the corresponding builder options internally:

pub fn fast_math(mut self, fast_math: bool) -> Self {
    if fast_math {
        self.ftz = true;
        self.fast_sqrt = true;
        self.fast_div = true;
        self.fma_contraction = true;
    }
    self
}

Usage

Users can now enable fast math globally with a single method call:

CudaBuilder::new("kernel")
    .fast_math(true)
    .build()?

Instead of manually setting each flag:

CudaBuilder::new("kernel")
    .ftz(true)
    .fast_sqrt(true)
    .fast_div(true)
    .build()?

Implementation Details

  • The fast_math() method is a convenience wrapper that internally enables ftz, fast_sqrt, fast_div, and fma_contraction
  • Leverages existing LLVM argument generation in invoke_rustc()
  • No changes to the rustc_codegen_nvvm backend required
  • Maintains backward compatibility (default is false)

Relates to #262

- Add fast_math field to CudaBuilder struct
- Enables ftz, fast_sqrt, fast_div, and fma_contraction (fmad) internally
- Provides convenient parity with NVCC's --use_fast_math
Collaborator

nnethercote left a comment

The implementation is confused. How do the individual flags interact with the new flag? What happens if you specify fast_math and then also fast_sqrt(false)? Seems like either:

  • CudaBuilder should handle all the fast_math stuff, and pass individual flags to NVCC, but not --use_fast_math
  • CudaBuilder should just record fast_math and then pass --use_fast_math through to NVCC

But currently it's doing a mixture of both.
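The interaction asked about above can be made concrete with a small stand-in. This is only a sketch: FlagsSketch is a hypothetical struct, not the real CudaBuilder, and it assumes the individual setters simply overwrite their field. The fast_math() body is copied from the PR.

```rust
// Hypothetical stand-in for CudaBuilder, holding only the fields the PR touches.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct FlagsSketch {
    ftz: bool,
    fast_sqrt: bool,
    fast_div: bool,
    fma_contraction: bool,
}

impl FlagsSketch {
    // Same body as the fast_math() method in this PR.
    fn fast_math(mut self, fast_math: bool) -> Self {
        if fast_math {
            self.ftz = true;
            self.fast_sqrt = true;
            self.fast_div = true;
            self.fma_contraction = true;
        }
        self
    }

    // Assumed shape of an individual setter: it overwrites the field.
    fn fast_sqrt(mut self, fast_sqrt: bool) -> Self {
        self.fast_sqrt = fast_sqrt;
        self
    }
}
```

Under these assumptions the result depends on call order: fast_math(true) followed by fast_sqrt(false) leaves fast_sqrt off, while fast_sqrt(false) followed by fast_math(true) turns it back on. Calling fast_math(false) after fast_math(true) changes nothing.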

Also, was this PR generated by AI? The description is very long, with subheadings and bullet points that are typical for AI.

self.ftz = true;
self.fast_sqrt = true;
self.fast_div = true;
self.fma_contraction = true;
Collaborator

This is a redundant representation. You're storing fast_math and also the individual flags. It should do one or the other.

Collaborator

Also, nothing happens if fast_math is false. Is that intended?
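One possible way to avoid the redundant representation, and to give fast_math(false) a defined meaning, is to store explicit per-flag choices as Option<bool> and resolve them once at build time. The sketch below is an illustration only: MathFlags, Resolved, and resolve() are hypothetical names, and it assumes every flag falls back to the fast_math value when not set explicitly, which may not match the crate's real defaults.

```rust
/// Hypothetical sketch, not the crate's API: explicit per-flag settings win,
/// and fast_math only supplies the fallback value.
#[derive(Debug, Default, Clone, Copy)]
struct MathFlags {
    fast_math: bool,
    ftz: Option<bool>,
    fast_sqrt: Option<bool>,
    fast_div: Option<bool>,
    fma_contraction: Option<bool>,
}

/// The concrete values that would be used to generate compiler arguments.
#[derive(Debug, PartialEq, Eq)]
struct Resolved {
    ftz: bool,
    fast_sqrt: bool,
    fast_div: bool,
    fma_contraction: bool,
}

impl MathFlags {
    fn fast_math(mut self, on: bool) -> Self {
        self.fast_math = on;
        self
    }

    fn fast_sqrt(mut self, on: bool) -> Self {
        self.fast_sqrt = Some(on);
        self
    }

    /// Resolve each flag: an explicit setting takes precedence,
    /// otherwise the flag follows fast_math.
    fn resolve(&self) -> Resolved {
        Resolved {
            ftz: self.ftz.unwrap_or(self.fast_math),
            fast_sqrt: self.fast_sqrt.unwrap_or(self.fast_math),
            fast_div: self.fast_div.unwrap_or(self.fast_math),
            fma_contraction: self.fma_contraction.unwrap_or(self.fast_math),
        }
    }
}
```

With this shape, the order of builder calls no longer matters: fast_sqrt(false) stays off whether it is called before or after fast_math(true), and fast_math(false) resets the fallback for every flag that was not set explicitly.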


if builder.fast_math {
    llvm_args.push("--use_fast_math".to_string());
}
Collaborator

Should this be plumbed through to NVCC, or should it be handled by CudaBuilder which then specifies the individual flags?
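The two alternatives in this question can be sketched side by side. Both functions below are hypothetical helpers, not code from the crate. The flag strings follow NVCC's documented spelling of --use_fast_math and the options it implies; whether the backend used here accepts those exact strings is an assumption this thread does not settle.

```rust
/// Alternative 1 (hypothetical): the builder expands fast_math itself and
/// emits only the individual options that NVCC documents as implied.
fn expand_fast_math(fast_math: bool) -> Vec<String> {
    if !fast_math {
        return Vec::new();
    }
    ["--ftz=true", "--prec-div=false", "--prec-sqrt=false", "--fmad=true"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Alternative 2 (hypothetical): the builder records fast_math and forwards
/// a single flag, leaving the expansion to the compiler.
fn passthrough_fast_math(fast_math: bool) -> Vec<String> {
    if fast_math {
        vec!["--use_fast_math".to_string()]
    } else {
        Vec::new()
    }
}
```

Either function alone gives a single source of truth. Setting the individual fields and also pushing --use_fast_math, as the PR currently does, applies both at once.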

Author

koreaygj commented Dec 2, 2025

Thank you for your review.
I used AI to generate the PR description; next time I will write it myself.

Do you think delegating to NVCC would be the better approach? I would appreciate your advice.
Instead of having CudaBuilder manage the individual flags, we could pass --use_fast_math directly to NVCC and let it handle the optimizations.

Contributor

LegNeato commented Dec 5, 2025

How does this jibe with https://simonbyrne.github.io/notes/fastmath/ ?
