Feature: z-image Turbo Control Net #8679

Pfannkuchensack · 2025-12-14T09:25:11Z

Summary

Add support for Z-Image ControlNet V2.0 alongside the existing V1 support.

Key changes:

Auto-detect control_in_dim from adapter weights (16 for V1, 33 for V2.0)
Auto-detect n_refiner_layers from state dict
Add zero-padding for V2.0's additional control channels (diffusers approach)
Use accelerate.init_empty_weights() for more efficient model creation
Add ControlNet_Checkpoint_ZImage_Config to frontend schema
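
A minimal sketch of how the two auto-detection steps above could work from tensor shapes alone. The key names (`control_x_embedder.weight`, `control_noise_refiner.N.`), the `patch_area` factor, and both function names are hypothetical and chosen only for illustration; the real adapter checkpoint layout may differ.

```python
import re

# Hypothetical key names: the real adapter checkpoint may use different ones.
EMBEDDER_KEY = "control_x_embedder.weight"
REFINER_PATTERN = re.compile(r"^control_noise_refiner\.(\d+)\.")


def detect_control_in_dim(shapes, patch_area=4):
    """Infer control_in_dim from the embedder's input width.

    Assumes the embedder is a linear layer whose weight has shape
    (out_features, control_in_dim * patch_area).
    """
    if EMBEDDER_KEY not in shapes:
        raise ValueError(f"missing {EMBEDDER_KEY!r}; not a recognised control adapter")
    in_features = shapes[EMBEDDER_KEY][1]
    if in_features % patch_area != 0:
        raise ValueError(f"input width {in_features} is not a multiple of {patch_area}")
    return in_features // patch_area


def detect_n_refiner_layers(shapes):
    """Count refiner layers by the highest layer index found in the keys."""
    indices = {int(m.group(1)) for key in shapes if (m := REFINER_PATTERN.match(key))}
    return max(indices) + 1 if indices else 0


# Shapes only; no tensor data is needed for detection.
v2_shapes = {
    "control_x_embedder.weight": (3840, 132),
    "control_noise_refiner.0.attention.weight": (3840, 3840),
    "control_noise_refiner.1.attention.weight": (3840, 3840),
}
control_in_dim = detect_control_in_dim(v2_shapes)
n_refiner_layers = detect_n_refiner_layers(v2_shapes)
```

Reading shapes instead of relying on a version flag means a V1 and a V2.0 file can go through the same loader path.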

Related Issues / Discussions

Part of Z-Image feature implementation.

QA Instructions

Load a Z-Image ControlNet V1 model (control_in_dim=16) and verify it works
Load a Z-Image ControlNet V2.0 model (control_in_dim=33) and verify it works
Test with different control types: Canny, Depth, Pose
Recommended control_context_scale: 0.65-0.80

Merge Plan

Can be merged after review. No special considerations needed.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

Add comprehensive support for Z-Image-Turbo (S3-DiT) models including:

Backend:
- New BaseModelType.ZImage in taxonomy
- Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig)
- Model loader for Z-Image transformer and Qwen3 text encoder
- Z-Image conditioning data structures
- Step callback support for Z-Image with FLUX latent RGB factors

Invocations:
- z_image_model_loader: Load Z-Image transformer and Qwen3 encoder
- z_image_text_encoder: Encode prompts using Qwen3 with chat template
- z_image_denoise: Flow matching denoising with time-shifted sigmas
- z_image_image_to_latents: Encode images to 16-channel latents
- z_image_latents_to_image: Decode latents using FLUX VAE

Frontend:
- Z-Image graph builder for text-to-image generation
- Model picker and validation updates for z-image base type
- CFG scale now allows 0 (required for Z-Image-Turbo)
- Clip skip disabled for Z-Image (uses Qwen3, not CLIP)
- Optimal dimension settings for Z-Image (1024x1024)

Technical details:
- Uses Qwen3 text encoder (not CLIP/T5)
- 16 latent channels with FLUX-compatible VAE
- Flow matching scheduler with dynamic time shift
- 8 inference steps recommended for Turbo variant
- bfloat16 inference dtype
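
To illustrate what "time-shifted sigmas" means for a flow-matching schedule, here is a small sketch using the exponential shift form common in flow-matching schedulers. It is not taken from this PR: the function names, the linear base schedule, and the `mu` value are assumptions. In a real pipeline `mu` would typically be derived from the image size rather than passed in by hand.

```python
import math


def time_shift(mu: float, sigma: float, t: float) -> float:
    """Exponential time shift; requires 0 < t <= 1."""
    return math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0) ** sigma)


def shifted_sigmas(num_steps: int, mu: float) -> list[float]:
    """Build a descending sigma schedule from 1.0 down to 0.0.

    The base schedule is linear; each value is then shifted so that
    mu > 0 spends more of the schedule at high noise levels.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    base = [1.0 - i / num_steps for i in range(num_steps)]
    # The terminal 0.0 is appended unshifted because time_shift is undefined at t = 0.
    return [time_shift(mu, 1.0, t) for t in base] + [0.0]


# 8 steps as recommended for the Turbo variant; mu = 1.0 is an arbitrary example value.
sigmas = shifted_sigmas(8, mu=1.0)
```

With `mu = 0` the shift is the identity, so the schedule falls back to the plain linear one.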

Add comprehensive LoRA support for Z-Image models including:

Backend:
- New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config)
- Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder
- LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX)
- LoRA detection logic to distinguish Z-Image from Flux models
- Layer patcher improvements for proper dtype conversion and parameter

…ntification

Move Flux layer structure check before metadata check to prevent misidentifying Z-Image LoRAs (which use `diffusion_model.layers.X`) as Flux AI Toolkit format. Flux models use `double_blocks` and `single_blocks` patterns which are now checked first regardless of metadata presence.
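
The ordering described in this commit can be sketched as follows. This is an illustration, not the PR's detection code: `classify_lora` and the `metadata_says_flux` flag are hypothetical, and the regular expressions only encode the key patterns named in the commit message.

```python
import re

# Key patterns named in the commit message.
FLUX_BLOCK = re.compile(r"(^|\.)(double_blocks|single_blocks)\.\d+\.")
Z_IMAGE_LAYER = re.compile(r"^diffusion_model\.layers\.\d+\.")


def classify_lora(keys, metadata_says_flux=False):
    """Classify a LoRA by its key structure, consulting metadata last.

    metadata_says_flux is a hypothetical stand-in for whatever the
    file's metadata claims about its format.
    """
    keys = list(keys)
    # 1. Structural check first: Flux block names are unambiguous.
    if any(FLUX_BLOCK.search(k) for k in keys):
        return "flux"
    # 2. Z-Image layers share the diffusion_model. prefix but not the block names.
    if any(Z_IMAGE_LAYER.match(k) for k in keys):
        return "z-image"
    # 3. Only now trust the metadata hint.
    return "flux" if metadata_says_flux else "unknown"


# A Z-Image LoRA whose metadata would otherwise suggest a Flux format.
result = classify_lora(
    ["diffusion_model.layers.0.attention.to_q.lora_A.weight"],
    metadata_says_flux=True,
)
```

Checking structure before metadata is what keeps the Z-Image file in this example from being labelled as Flux.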

…ibility

Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility:

Backend:
- New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers
- Z-Image key detection (_has_z_image_keys) to identify S3-DiT models
- GGUF quantization detection and sidecar LoRA patching for quantized models
- Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models

Model Loader:
- Split Z-Image model

feat: Add Z-Image ControlNet support with spatial conditioning

Add comprehensive ControlNet support for Z-Image models including:

Backend:
- New ControlNet_Checkpoint_ZImage_Config for Z-Image control adapter models
- Z-Image control key detection (_has_z_image_control_keys) to identify control layers
- ZImageControlAdapter loader for standalone control models
- ZImageControlTransformer2DModel combining base transformer with control layers
- Memory-efficient model loading by building combined state dict

…kuchensack/InvokeAI into feat/z-image-turbo-support

Add support for loading Z-Image transformer and Qwen3 encoder models from single-file safetensors format (in addition to existing diffusers directory format).

Changes:
- Add Main_Checkpoint_ZImage_Config and Main_GGUF_ZImage_Config for single-file Z-Image transformer models
- Add Qwen3Encoder_Checkpoint_Config for single-file Qwen3 text encoder
- Add ZImageCheckpointModel and ZImageGGUFCheckpointModel loaders with automatic key conversion from original to diffusers format
- Add Qwen3EncoderCheckpointLoader using Qwen3ForCausalLM with fast loading via init_empty_weights and proper weight tying for lm_head
- Update z_image_denoise to accept Checkpoint format models

Add support for saving and recalling Z-Image component models (VAE and Qwen3 Encoder) in image metadata.

Backend:
- Add qwen3_encoder field to CoreMetadataInvocation (version 2.1.0)

Frontend:
- Add vae and qwen3_encoder to Z-Image graph metadata
- Add Qwen3EncoderModel metadata handler for recall
- Add ZImageVAEModel metadata handler (uses zImageVaeModelSelected instead of vaeSelected to set Z-Image-specific VAE state)
- Add qwen3Encoder translation key

This enables "Recall Parameters" / "Remix Image" to restore the VAE and Qwen3 Encoder settings used for Z-Image generations.

Add robust device capability detection for bfloat16, replacing the hardcoded dtype with runtime checks that fall back to float16/float32 on unsupported hardware. This prevents runtime failures on GPUs and CPUs without bfloat16 support.

Key changes:
- Add TorchDevice.choose_bfloat16_safe_dtype() helper for safe dtype selection
- Fix LoRA device mismatch in layer_patcher.py (add device= to .to() call)
- Replace all assert statements with descriptive exceptions (TypeError/ValueError)
- Add hidden_states bounds check and apply_chat_template fallback in text encoder
- Add GGUF QKV tensor validation (divisible by 3 check)
- Fix CPU noise generation to use float32 for compatibility
- Remove verbose debug logging from LoRA conversion utils
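
As an illustration of the "divisible by 3" validation and of raising a descriptive exception instead of using `assert`, here is a dependency-free sketch. The function name is hypothetical, and it assumes Q, K and V are stacked in that order along the first axis of the fused tensor, which the commit message does not state.

```python
def split_fused_qkv(rows):
    """Split a fused QKV weight (a list of rows) into equal Q, K and V parts.

    Assumes Q, K and V are stacked in that order along the first axis.
    Raises ValueError rather than asserting, so the check also runs
    when Python is started with -O.
    """
    n = len(rows)
    if n == 0 or n % 3 != 0:
        raise ValueError(
            f"fused QKV tensor has {n} rows; expected a positive multiple of 3"
        )
    third = n // 3
    return rows[:third], rows[third : 2 * third], rows[2 * third :]


# Six rows split into three parts of two rows each.
q, k, v = split_fused_qkv([[1], [2], [3], [4], [5], [6]])
```

A malformed file then fails at load time with a message naming the actual row count, instead of producing a shape error deeper in the model.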

…inModelConfig

The FLUX Dev license warning in model pickers used isCheckpointMainModelConfig incorrectly:

```
isCheckpointMainModelConfig(config) && config.variant === 'dev'
```

This caused a TypeScript error because CheckpointModelConfig type doesn't include the 'variant' property (it's extracted as `{ type: 'main'; format: 'checkpoint' }` which doesn't narrow to include variant).

Changes:
- Add isFluxDevMainModelConfig type guard that properly checks base='flux' AND variant='dev', returning MainModelConfig
- Update MainModelPicker and InitialStateMainModelPicker to use new guard
- Remove isCheckpointMainModelConfig as it had no other usages

The function was removed because:
1. It was only used for detecting FLUX Dev models (incorrect use case)
2. No other code needs a generic "is checkpoint format" check
3. The pattern in this codebase is specific type guards per model variant (isFluxFillMainModelModelConfig, isRefinerMainModelModelConfig, etc.)

…ters

- Add Qwen3EncoderGGUFLoader for llama.cpp GGUF quantized text encoders
- Convert llama.cpp key format (blk.X., token_embd) to PyTorch format
- Handle tied embeddings (lm_head.weight ↔ embed_tokens.weight)
- Dequantize embed_tokens for embedding lookups (GGMLTensor limitation)
- Add QK normalization key mappings (q_norm, k_norm) for Qwen3
- Set Z-Image defaults: steps=9, cfg_scale=0.0, width/height=1024
- Allow cfg_scale >= 0 (was >= 1) for Z-Image Turbo compatibility
- Add GGUF format detection for Qwen3 model probing
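
A sketch of the key conversion and tied-embedding handling listed above. The mapping table reflects the usual llama.cpp and Hugging Face naming conventions as I understand them; it is not copied from the PR, and the function names are hypothetical.

```python
import re

# Assumed correspondence between llama.cpp GGUF names and Hugging Face names.
_TOP_LEVEL = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}
_PER_LAYER = {
    "attn_q": "self_attn.q_proj",
    "attn_k": "self_attn.k_proj",
    "attn_v": "self_attn.v_proj",
    "attn_output": "self_attn.o_proj",
    "attn_q_norm": "self_attn.q_norm",  # QK normalization used by Qwen3
    "attn_k_norm": "self_attn.k_norm",
    "attn_norm": "input_layernorm",
    "ffn_norm": "post_attention_layernorm",
    "ffn_gate": "mlp.gate_proj",
    "ffn_up": "mlp.up_proj",
    "ffn_down": "mlp.down_proj",
}
_BLOCK = re.compile(r"^blk\.(\d+)\.([a-z_]+)\.(weight|bias)$")


def convert_key(key):
    """Translate one llama.cpp-style key into PyTorch/Hugging Face style."""
    if key in _TOP_LEVEL:
        return _TOP_LEVEL[key]
    m = _BLOCK.match(key)
    if m and m.group(2) in _PER_LAYER:
        return f"model.layers.{m.group(1)}.{_PER_LAYER[m.group(2)]}.{m.group(3)}"
    raise KeyError(f"unrecognised GGUF key: {key}")


def convert_state_dict(state_dict):
    """Convert all keys; reuse the embedding for lm_head when the file omits it."""
    out = {convert_key(k): v for k, v in state_dict.items()}
    if "lm_head.weight" not in out and "model.embed_tokens.weight" in out:
        out["lm_head.weight"] = out["model.embed_tokens.weight"]  # tied weights
    return out


converted = convert_state_dict(
    {"token_embd.weight": "E", "blk.0.attn_q_norm.weight": "N"}
)
```

Raising on unknown keys, rather than passing them through, makes an unsupported architecture visible immediately.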

…rNorm

- Add CustomDiffusersRMSNorm for diffusers.models.normalization.RMSNorm
- Add CustomLayerNorm for torch.nn.LayerNorm
- Register both in AUTOCAST_MODULE_TYPE_MAPPING

Enables partial loading (enable_partial_loading: true) for Z-Image models by wrapping their normalization layers with device autocast support

…dont.

Fixed the DEFAULT_TOKENIZER_SOURCE to Qwen/Qwen3-4B

VRAM usage is high.

- Auto-detect control_in_dim from adapter weights (16 for V1, 33 for V2.0)
- Auto-detect n_refiner_layers from state dict
- Add zero-padding for V2.0's additional channels
- Use accelerate.init_empty_weights() for efficient model creation
- Add ControlNet_Checkpoint_ZImage_Config to frontend schema
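
The zero-padding step can be pictured with the sketch below. It uses nested lists in `[C][H][W]` layout to stay dependency-free; in practice this would be a tensor operation such as `torch.nn.functional.pad`. The function name is hypothetical, and appending the zero channels after the existing ones is an assumption.

```python
def pad_control_channels(control, control_in_dim):
    """Zero-pad a [C][H][W] nested list along the channel axis.

    Existing channels are kept as-is; all-zero channels are appended
    until there are control_in_dim channels in total.
    """
    channels = len(control)
    if channels == 0:
        raise ValueError("control input has no channels")
    if channels > control_in_dim:
        raise ValueError(
            f"control input has {channels} channels, more than control_in_dim={control_in_dim}"
        )
    height = len(control[0])
    width = len(control[0][0])
    zeros = [
        [[0.0] * width for _ in range(height)]
        for _ in range(control_in_dim - channels)
    ]
    return control + zeros


# A 16-channel control latent (1x2 spatial size) padded for a V2.0 adapter.
latent = [[[1.0, 2.0]] for _ in range(16)]
padded = pad_control_channels(latent, control_in_dim=33)
```

For a V1 adapter (`control_in_dim=16`) the same call returns the input unchanged, so one code path serves both versions.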

Pfannkuchensack added 25 commits December 1, 2025 00:22

fix windows path again.

13ac16e

Fix windows path again again

eaf4742

Fix windows path again again again...

66729ea

Merge branch 'main' into feat/z-image-turbo-support

9f6d04c

fix for the typegen-checks

4a1710b

Merge branch 'feat/z-image-turbo-support' of https://github.com/Pfann…

b28d58b

…kuchensack/InvokeAI into feat/z-image-turbo-support

Patch from @lstein for the update of diffusers

2e0cd4d

fix typegen wrong

3e862ce

fix typegen

8551ff8

z-image-turbo-fp8-e5m2 works. the z-image-turbo_fp8_scaled_e4m3fn_KJ …

f9605e1

…dont.

Remove the ParamScheduler for z-images

3ee24cb

Fixed the DEFAULT_TOKENIZER_SOURCE to Qwen/Qwen3-4B

Remove unneeded Logging

1e72feb

Merge branch 'feat/z-image-turbo-support' into feature/z-image-control

e211ac9

github-actions bot added the api, python, Root, invocations, and backend labels on Dec 14, 2025

github-actions bot added the frontend and python-deps labels on Dec 14, 2025
