Open
Labels: bug (Something isn't working)
Description
Describe the bug
When using _sage_qk_int8_pv_fp8_cuda_sm90 as the attention backend on Wan2.2 I2V, I notice that the output is broken.
It works fine with _flash_3_hub and _sage_qk_int8_pv_fp16_cuda.
Does it matter whether we use the original fp16 model? Do we need the fp8 model or should this just work out of the box?
Can somebody explain the type of artifacts we are seeing? Do they point to a specific issue?
Reproduction
import torch
import numpy as np
from diffusers import WanPipeline, AutoencoderKLWan
from diffusers.utils import export_to_video, load_image
dtype = torch.bfloat16
device = "cuda"
vae = AutoencoderKLWan.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained("Wan-AI/Wan2.2-T2V-A14B-Diffusers", vae=vae, torch_dtype=dtype)
pipe.transformer.set_attention_backend("_sage_qk_int8_pv_fp8_cuda_sm90")
pipe.to(device)
height = 720
width = 1280
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
negative_prompt = "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走"
output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=height,
    width=width,
    num_frames=81,
    guidance_scale=4.0,
    guidance_scale_2=3.0,
    num_inference_steps=40,
).frames[0]
export_to_video(output, "t2v_out.mp4", fps=16)
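To narrow this down, it may help to render the same prompt with the same seed under each of the three backends named above, so that the attention backend is the only variable. The following is a minimal sketch of such a harness, not a tested part of the reproduction: `output_name`, `run_all`, and the `render` callback are hypothetical names introduced here, and the caller is assumed to supply `render` themselves.

```python
from typing import Callable, Dict, Iterable

# Backend names exactly as they appear in this report.
BACKENDS = (
    "_flash_3_hub",
    "_sage_qk_int8_pv_fp16_cuda",
    "_sage_qk_int8_pv_fp8_cuda_sm90",
)


def output_name(backend: str, prefix: str = "t2v_out") -> str:
    """Hypothetical naming scheme: one video file per backend."""
    return f"{prefix}{backend}.mp4"


def run_all(
    render: Callable[[str, int], object],
    backends: Iterable[str] = BACKENDS,
    seed: int = 0,
) -> Dict[str, object]:
    """Call render(backend, seed) once per backend, reusing the same seed.

    `render` is supplied by the caller. In the script above it would
    presumably call pipe.transformer.set_attention_backend(backend),
    run pipe(...) with a generator seeded from `seed`, and return the frames.
    """
    return {name: render(name, seed) for name in backends}
```

In the reproduction script, `render` could wrap the existing `pipe(...)` call and pass `generator=torch.Generator(device).manual_seed(seed)`, then save each result with `export_to_video(frames, output_name(backend), fps=16)`. If only the fp8 run differs visibly from the other two, that would point at the fp8 PV path rather than at the model weights or the prompt.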
Logs
System Info
diffusers = 0.36.0.dev0
python = 3.11
cuda = 12.8
nvidia driver = 570.195.03
Using H100 80GB HBM3
Who can help?