[Bug] Windows CUDA build only contains sm_90 kernel #1061

@CarlGao4

Description

Git commit

2f0bd31

Operating System & Version

Windows

GGML backends

CUDA

Command-line arguments used

Not relevant

Steps to reproduce

I ran cuobjdump --list-elf stable-diffusion.dll and got the following output:

Only sm_90 kernel
ELF file    1: stable-diffusion.1.sm_90.cubin
ELF file    2: stable-diffusion.2.sm_90.cubin
ELF file    3: stable-diffusion.3.sm_90.cubin
ELF file    4: stable-diffusion.4.sm_90.cubin
ELF file    5: stable-diffusion.5.sm_90.cubin
ELF file    6: stable-diffusion.6.sm_90.cubin
ELF file    7: stable-diffusion.7.sm_90.cubin
ELF file    8: stable-diffusion.8.sm_90.cubin
ELF file    9: stable-diffusion.9.sm_90.cubin
ELF file   10: stable-diffusion.10.sm_90.cubin
ELF file   11: stable-diffusion.11.sm_90.cubin
ELF file   12: stable-diffusion.12.sm_90.cubin
ELF file   13: stable-diffusion.13.sm_90.cubin
ELF file   14: stable-diffusion.14.sm_90.cubin
ELF file   15: stable-diffusion.15.sm_90.cubin
ELF file   16: stable-diffusion.16.sm_90.cubin
ELF file   17: stable-diffusion.17.sm_90.cubin
ELF file   18: stable-diffusion.18.sm_90.cubin
ELF file   19: stable-diffusion.19.sm_90.cubin
ELF file   20: stable-diffusion.20.sm_90.cubin
ELF file   21: stable-diffusion.21.sm_90.cubin
ELF file   22: stable-diffusion.22.sm_90.cubin
ELF file   23: stable-diffusion.23.sm_90.cubin
ELF file   24: stable-diffusion.24.sm_90.cubin
ELF file   25: stable-diffusion.25.sm_90.cubin
ELF file   26: stable-diffusion.26.sm_90.cubin
ELF file   27: stable-diffusion.27.sm_90.cubin
ELF file   28: stable-diffusion.28.sm_90.cubin
ELF file   29: stable-diffusion.29.sm_90.cubin
ELF file   30: stable-diffusion.30.sm_90.cubin
ELF file   31: stable-diffusion.31.sm_90.cubin
ELF file   32: stable-diffusion.32.sm_90.cubin
ELF file   33: stable-diffusion.33.sm_90.cubin
ELF file   34: stable-diffusion.34.sm_90.cubin
ELF file   35: stable-diffusion.35.sm_90.cubin
ELF file   36: stable-diffusion.36.sm_90.cubin
ELF file   37: stable-diffusion.37.sm_90.cubin
ELF file   38: stable-diffusion.38.sm_90.cubin
ELF file   39: stable-diffusion.39.sm_90.cubin
ELF file   40: stable-diffusion.40.sm_90.cubin
ELF file   41: stable-diffusion.41.sm_90.cubin
ELF file   42: stable-diffusion.42.sm_90.cubin
ELF file   43: stable-diffusion.43.sm_90.cubin
ELF file   44: stable-diffusion.44.sm_90.cubin
ELF file   45: stable-diffusion.45.sm_90.cubin
ELF file   46: stable-diffusion.46.sm_90.cubin
ELF file   47: stable-diffusion.47.sm_90.cubin
ELF file   48: stable-diffusion.48.sm_90.cubin
ELF file   49: stable-diffusion.49.sm_90.cubin
ELF file   50: stable-diffusion.50.sm_90.cubin
ELF file   51: stable-diffusion.51.sm_90.cubin
ELF file   52: stable-diffusion.52.sm_90.cubin
ELF file   53: stable-diffusion.53.sm_90.cubin
ELF file   54: stable-diffusion.54.sm_90.cubin
ELF file   55: stable-diffusion.55.sm_90.cubin
ELF file   56: stable-diffusion.56.sm_90.cubin
ELF file   57: stable-diffusion.57.sm_90.cubin
ELF file   58: stable-diffusion.58.sm_90.cubin
ELF file   59: stable-diffusion.59.sm_90.cubin
ELF file   60: stable-diffusion.60.sm_90.cubin
ELF file   61: stable-diffusion.61.sm_90.cubin
ELF file   62: stable-diffusion.62.sm_90.cubin
ELF file   63: stable-diffusion.63.sm_90.cubin
ELF file   64: stable-diffusion.64.sm_90.cubin
ELF file   65: stable-diffusion.65.sm_90.cubin
ELF file   66: stable-diffusion.66.sm_90.cubin
ELF file   67: stable-diffusion.67.sm_90.cubin
ELF file   68: stable-diffusion.68.sm_90.cubin
ELF file   69: stable-diffusion.69.sm_90.cubin
ELF file   70: stable-diffusion.70.sm_90.cubin
ELF file   71: stable-diffusion.71.sm_90.cubin
ELF file   72: stable-diffusion.72.sm_90.cubin
ELF file   73: stable-diffusion.73.sm_90.cubin
ELF file   74: stable-diffusion.74.sm_90.cubin
ELF file   75: stable-diffusion.75.sm_90.cubin
ELF file   76: stable-diffusion.76.sm_90.cubin
ELF file   77: stable-diffusion.77.sm_90.cubin
ELF file   78: stable-diffusion.78.sm_90.cubin
ELF file   79: stable-diffusion.79.sm_90.cubin
ELF file   80: stable-diffusion.80.sm_90.cubin
ELF file   81: stable-diffusion.81.sm_90.cubin
ELF file   82: stable-diffusion.82.sm_90.cubin
ELF file   83: stable-diffusion.83.sm_90.cubin
ELF file   84: stable-diffusion.84.sm_90.cubin
ELF file   85: stable-diffusion.85.sm_90.cubin
ELF file   86: stable-diffusion.86.sm_90.cubin
ELF file   87: stable-diffusion.87.sm_90.cubin
ELF file   88: stable-diffusion.88.sm_90.cubin
ELF file   89: stable-diffusion.89.sm_90.cubin
ELF file   90: stable-diffusion.90.sm_90.cubin
ELF file   91: stable-diffusion.91.sm_90.cubin
ELF file   92: stable-diffusion.92.sm_90.cubin
ELF file   93: stable-diffusion.93.sm_90.cubin
ELF file   94: stable-diffusion.94.sm_90.cubin
ELF file   95: stable-diffusion.95.sm_90.cubin
ELF file   96: stable-diffusion.96.sm_90.cubin
ELF file   97: stable-diffusion.97.sm_90.cubin
ELF file   98: stable-diffusion.98.sm_90.cubin
ELF file   99: stable-diffusion.99.sm_90.cubin
ELF file  100: stable-diffusion.100.sm_90.cubin
ELF file  101: stable-diffusion.101.sm_90.cubin
ELF file  102: stable-diffusion.102.sm_90.cubin
ELF file  103: stable-diffusion.103.sm_90.cubin
ELF file  104: stable-diffusion.104.sm_90.cubin
ELF file  105: stable-diffusion.105.sm_90.cubin
ELF file  106: stable-diffusion.106.sm_90.cubin
ELF file  107: stable-diffusion.107.sm_90.cubin
ELF file  108: stable-diffusion.108.sm_90.cubin
ELF file  109: stable-diffusion.109.sm_90.cubin
ELF file  110: stable-diffusion.110.sm_90.cubin
ELF file  111: stable-diffusion.111.sm_90.cubin
ELF file  112: stable-diffusion.112.sm_90.cubin
ELF file  113: stable-diffusion.113.sm_90.cubin
ELF file  114: stable-diffusion.114.sm_90.cubin
ELF file  115: stable-diffusion.115.sm_90.cubin
ELF file  116: stable-diffusion.116.sm_90.cubin
ELF file  117: stable-diffusion.117.sm_90.cubin
ELF file  118: stable-diffusion.118.sm_90.cubin

So only GPUs with compute capability 9.0 (such as H100 and H200) can use this GPU build.
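For anyone who wants to check their own build, here is a small sketch that summarizes the architectures found in the cuobjdump --list-elf output. It assumes the listing has been saved as text and that each cubin name ends in .sm_XX.cubin, as in the output above; the function name count_architectures is only illustrative and not part of any tool.

```python
import re
from collections import Counter

# Matches lines such as "ELF file    1: stable-diffusion.1.sm_90.cubin"
# (assumption: every embedded cubin name ends in ".sm_XX.cubin").
_CUBIN_RE = re.compile(r"\.sm_(\d+[a-z]?)\.cubin\s*$")


def count_architectures(listing: str) -> Counter:
    """Count embedded cubins per sm_XX architecture in a --list-elf listing."""
    counts = Counter()
    for line in listing.splitlines():
        match = _CUBIN_RE.search(line)
        if match:
            counts["sm_" + match.group(1)] += 1
    return counts


# Shortened sample taken from the listing above.
sample = """\
ELF file    1: stable-diffusion.1.sm_90.cubin
ELF file    2: stable-diffusion.2.sm_90.cubin
ELF file    3: stable-diffusion.3.sm_90.cubin
"""
print(dict(count_architectures(sample)))  # {'sm_90': 3}
```

A build that really contains all five targets would be expected to show sm_75, sm_80, sm_86 and sm_89 entries as well as sm_90.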

What you expected to happen

All NVIDIA GPUs covered by the configured build targets (90, 89, 86, 80, 75) should be supported:

- build: "cuda12"
  defines: "-DSD_CUDA=ON -DSD_BUILD_SHARED_LIBS=ON -DCMAKE_CUDA_ARCHITECTURES=90;89;86;80;75"

What actually happened

Running on an RTX 40 series GPU (compute capability 8.9) produces:

[ERROR] ggml_extend.hpp:75   - ggml_cuda_compute_forward: GET_ROWS failed
[ERROR] ggml_extend.hpp:75   - CUDA error: no kernel image is available for execution on the device
[ERROR] ggml_extend.hpp:75   -   current device: 0, in function ggml_cuda_compute_forward at D:\a\stable-diffusion.cpp\stable-diffusion.cpp\ggml\src\ggml-cuda\ggml-cuda.cu:2540
[ERROR] ggml_extend.hpp:75   -   err
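The error is consistent with how cubin compatibility works: a cubin built for sm_XY runs only on devices with the same major version and an equal or higher minor version. The sketch below illustrates that rule. It is a simplification that assumes the binary embeds only cubins and no PTX for JIT fallback, and has_native_kernel is a hypothetical helper, not a CUDA or ggml API.

```python
def has_native_kernel(device_cc, cubin_archs):
    """Return True if any embedded cubin can run on the given device.

    device_cc   -- (major, minor) compute capability, e.g. (8, 9) for RTX 40 series
    cubin_archs -- architectures as integers, e.g. [90, 89, 86, 80, 75]

    Simplified rule: a cubin for sm_XY runs on devices with the same major
    version X and a minor version >= Y. PTX JIT fallback is ignored here.
    """
    major, minor = device_cc
    for arch in cubin_archs:
        arch_major, arch_minor = divmod(arch, 10)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# What the released DLL contains, per the listing above: only sm_90.
print(has_native_kernel((8, 9), [90]))                  # False
# What the build configuration requests.
print(has_native_kernel((8, 9), [90, 89, 86, 80, 75]))  # True
```

With only sm_90 cubins present, every device below compute capability 9.0 fails this check, which matches the "no kernel image is available" error.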

Logs / error messages / stack trace

No response

Additional context / environment details

This issue is also related to:
#1060
#1020 (comment)
#851 (comment)
#554
