feat: Add global fast_math flag to CudaBuilder #331

koreaygj · 2025-12-01T10:05:34Z

Hi,
I tried to implement a fast_math() method to solve this issue. It provides a convenient way to enable fast math approximations globally, equivalent to NVCC's --use_fast_math option.

Description

Implement a global fast_math flag for CudaBuilder as a convenience method equivalent to NVCC's --use_fast_math option.

According to the official NVCC documentation, the --use_fast_math flag enables fast approximations for floating-point operations by implying --ftz=true, --prec-sqrt=false, --prec-div=false, and --fmad=true.

Implementation

I implemented a fast_math() method that sets the corresponding builder options internally:

pub fn fast_math(mut self, fast_math: bool) -> Self {
    if fast_math {
        self.ftz = true;
        self.fast_sqrt = true;
        self.fast_div = true;
        self.fma_contraction = true;
    }
    self
}

Usage

Users can now enable fast math globally with a single method call:

CudaBuilder::new("kernel")
    .fast_math(true)
    .build()?

Instead of manually setting each flag:

CudaBuilder::new("kernel")
    .ftz(true)
    .fast_sqrt(true)
    .fast_div(true)
    .build()?

Implementation Details

- The fast_math() method is a convenience wrapper that internally enables ftz, fast_sqrt, fast_div, and fma_contraction.
- Leverages the existing LLVM argument generation in invoke_rustc().
- Requires no changes to the rustc_codegen_nvvm backend.
- Maintains backward compatibility (the default is false).

Relates to #262

- Add fast_math field to CudaBuilder struct
- Enables ftz, fast_sqrt, fast_div and fma_contraction(fmad) internally
- Provides convenient parity with NVCC's --use_fast_math

nnethercote

The implementation is confused. How do the individual flags interact with the new flag? What happens if you specify fast_math and then also fast_sqrt(false)? Seems like either:

- CudaBuilder should handle all the fast_math stuff, and pass individual flags to NVCC, but not --use_fast_math
- CudaBuilder should just record fast_math and then pass --use_fast_math through to NVCC

But currently it's doing a mixture of both.
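
To make the first option concrete, here is a minimal sketch in which fast_math() is only a convenience setter over the individual flags and no separate fast_math field is stored. The MathFlags type is a hypothetical stand-in for the relevant CudaBuilder fields, and the defaults shown (precise math, FMA contraction on) are an assumption chosen to mirror NVCC's defaults. With this shape the interaction question has a simple answer: the last call wins.

```rust
/// Hypothetical stand-in for the math-related builder fields.
/// This is not the real CudaBuilder; names and defaults are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct MathFlags {
    pub ftz: bool,
    pub fast_sqrt: bool,
    pub fast_div: bool,
    pub fma_contraction: bool,
}

impl Default for MathFlags {
    fn default() -> Self {
        // Assumed defaults: precise math, FMA contraction enabled.
        Self { ftz: false, fast_sqrt: false, fast_div: false, fma_contraction: true }
    }
}

impl MathFlags {
    pub fn ftz(mut self, on: bool) -> Self { self.ftz = on; self }
    pub fn fast_sqrt(mut self, on: bool) -> Self { self.fast_sqrt = on; self }
    pub fn fast_div(mut self, on: bool) -> Self { self.fast_div = on; self }
    pub fn fma_contraction(mut self, on: bool) -> Self { self.fma_contraction = on; self }

    /// Convenience setter: writes the individual flags and stores nothing else,
    /// so there is a single source of truth.
    pub fn fast_math(mut self, on: bool) -> Self {
        self.ftz = on;
        self.fast_sqrt = on;
        self.fast_div = on;
        // Design choice in this sketch: turning fast math off leaves FMA
        // contraction untouched, since it is assumed to be on by default.
        if on {
            self.fma_contraction = true;
        }
        self
    }
}
```

Under this design, fast_math(true) followed by fast_sqrt(false) yields fast sqrt disabled with the other approximations still enabled, and fast_math(false) actually resets the three approximation flags instead of being a no-op.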

Also, was this PR generated by AI? The description is very long, with subheadings and bullet points that are typical for AI.

nnethercote · 2025-12-02T00:25:55Z

crates/cuda_builder/src/lib.rs

+            self.ftz = true;
+            self.fast_sqrt = true;
+            self.fast_div = true;
+            self.fma_contraction = true;


This is a redundant representation. You're storing fast_math and also the individual flags. It should do one or the other.

Also, nothing happens if fast_math is false. Is that intended?

nnethercote · 2025-12-02T00:27:24Z

crates/cuda_builder/src/lib.rs


+    if builder.fast_math {
+        llvm_args.push("--use_fast_math".to_string());
+    }


Should this be plumbed through to NVCC, or should it be handled by CudaBuilder which then specifies the individual flags?
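
If CudaBuilder handles it, the argument generation could derive everything from the individual flags, with no --use_fast_math string at all. The sketch below assumes libNVVM-style option spellings (-ftz=1, -prec-sqrt=0, -prec-div=0, -fma=0); the MathFlags struct and the math_llvm_args function are hypothetical names for illustration, and the strings the project actually emits in invoke_rustc() may differ.

```rust
/// Hypothetical subset of builder state; not the real CudaBuilder.
pub struct MathFlags {
    pub ftz: bool,
    pub fast_sqrt: bool,
    pub fast_div: bool,
    pub fma_contraction: bool,
}

/// Translate the individual flags into backend arguments.
/// The option strings are assumed libNVVM-style spellings.
pub fn math_llvm_args(flags: &MathFlags) -> Vec<String> {
    let mut args = Vec::new();
    if flags.ftz {
        // Flush denormals to zero.
        args.push("-ftz=1".to_string());
    }
    if flags.fast_sqrt {
        // Allow a faster, less precise square root.
        args.push("-prec-sqrt=0".to_string());
    }
    if flags.fast_div {
        // Allow a faster, less precise division.
        args.push("-prec-div=0".to_string());
    }
    if !flags.fma_contraction {
        // Contraction is assumed on by default, so only the opt-out is emitted.
        args.push("-fma=0".to_string());
    }
    args
}
```

With this shape, a fast_math() convenience method never needs its own branch here: it only changes the flags that this function already reads.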

koreaygj · 2025-12-02T01:51:54Z

Thank you for your review.
I used AI to generate the PR description; next time I will write it myself.

Do you think delegating to NVCC would be the better approach? I would appreciate your advice.
Instead of CudaBuilder managing the individual flags, we could just pass --use_fast_math directly to NVCC and let it handle the optimizations.

LegNeato · 2025-12-05T03:04:26Z

How does this jibe with https://simonbyrne.github.io/notes/fastmath/ ?

feat: Add global fast_math flag to CudaBuilder

888373b


nnethercote requested changes Dec 2, 2025
