-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Feature: z-image Turbo Control Net #8679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
Pfannkuchensack
wants to merge
25
commits into
invoke-ai:main
Choose a base branch
from
Pfannkuchensack:feature/z-image-control
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Feature: z-image Turbo Control Net #8679
Pfannkuchensack
wants to merge
25
commits into
invoke-ai:main
from
Pfannkuchensack:feature/z-image-control
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add comprehensive support for Z-Image-Turbo (S3-DiT) models including: Backend: - New BaseModelType.ZImage in taxonomy - Z-Image model config classes (ZImageTransformerConfig, Qwen3TextEncoderConfig) - Model loader for Z-Image transformer and Qwen3 text encoder - Z-Image conditioning data structures - Step callback support for Z-Image with FLUX latent RGB factors Invocations: - z_image_model_loader: Load Z-Image transformer and Qwen3 encoder - z_image_text_encoder: Encode prompts using Qwen3 with chat template - z_image_denoise: Flow matching denoising with time-shifted sigmas - z_image_image_to_latents: Encode images to 16-channel latents - z_image_latents_to_image: Decode latents using FLUX VAE Frontend: - Z-Image graph builder for text-to-image generation - Model picker and validation updates for z-image base type - CFG scale now allows 0 (required for Z-Image-Turbo) - Clip skip disabled for Z-Image (uses Qwen3, not CLIP) - Optimal dimension settings for Z-Image (1024x1024) Technical details: - Uses Qwen3 text encoder (not CLIP/T5) - 16 latent channels with FLUX-compatible VAE - Flow matching scheduler with dynamic time shift - 8 inference steps recommended for Turbo variant - bfloat16 inference dtype
Add comprehensive LoRA support for Z-Image models including: Backend: - New Z-Image LoRA config classes (LoRA_LyCORIS_ZImage_Config, LoRA_Diffusers_ZImage_Config) - Z-Image LoRA conversion utilities with key mapping for transformer and Qwen3 encoder - LoRA prefix constants (Z_IMAGE_LORA_TRANSFORMER_PREFIX, Z_IMAGE_LORA_QWEN3_PREFIX) - LoRA detection logic to distinguish Z-Image from Flux models - Layer patcher improvements for proper dtype conversion and parameter
…ntification Move Flux layer structure check before metadata check to prevent misidentifying Z-Image LoRAs (which use `diffusion_model.layers.X`) as Flux AI Toolkit format. Flux models use `double_blocks` and `single_blocks` patterns which are now checked first regardless of metadata presence.
…ibility Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility: Backend: - New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers - Z-Image key detection (_has_z_image_keys) to identify S3-DiT models - GGUF quantization detection and sidecar LoRA patching for quantized models - Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models Model Loader: - Split Z-Image model
feat: Add Z-Image ControlNet support with spatial conditioning Add comprehensive ControlNet support for Z-Image models including: Backend: - New ControlNet_Checkpoint_ZImage_Config for Z-Image control adapter models - Z-Image control key detection (_has_z_image_control_keys) to identify control layers - ZImageControlAdapter loader for standalone control models - ZImageControlTransformer2DModel combining base transformer with control layers - Memory-efficient model loading by building combined state dict
…kuchensack/InvokeAI into feat/z-image-turbo-support
Add support for loading Z-Image transformer and Qwen3 encoder models from single-file safetensors format (in addition to existing diffusers directory format). Changes: - Add Main_Checkpoint_ZImage_Config and Main_GGUF_ZImage_Config for single-file Z-Image transformer models - Add Qwen3Encoder_Checkpoint_Config for single-file Qwen3 text encoder - Add ZImageCheckpointModel and ZImageGGUFCheckpointModel loaders with automatic key conversion from original to diffusers format - Add Qwen3EncoderCheckpointLoader using Qwen3ForCausalLM with fast loading via init_empty_weights and proper weight tying for lm_head - Update z_image_denoise to accept Checkpoint format models
Add support for saving and recalling Z-Image component models (VAE and Qwen3 Encoder) in image metadata. Backend: - Add qwen3_encoder field to CoreMetadataInvocation (version 2.1.0) Frontend: - Add vae and qwen3_encoder to Z-Image graph metadata - Add Qwen3EncoderModel metadata handler for recall - Add ZImageVAEModel metadata handler (uses zImageVaeModelSelected instead of vaeSelected to set Z-Image-specific VAE state) - Add qwen3Encoder translation key This enables "Recall Parameters" / "Remix Image" to restore the VAE and Qwen3 Encoder settings used for Z-Image generations.
Add robust device capability detection for bfloat16, replacing hardcoded dtype with runtime checks that fallback to float16/float32 on unsupported hardware. This prevents runtime failures on GPUs and CPUs without bfloat16. Key changes: - Add TorchDevice.choose_bfloat16_safe_dtype() helper for safe dtype selection - Fix LoRA device mismatch in layer_patcher.py (add device= to .to() call) - Replace all assert statements with descriptive exceptions (TypeError/ValueError) - Add hidden_states bounds check and apply_chat_template fallback in text encoder - Add GGUF QKV tensor validation (divisible by 3 check) - Fix CPU noise generation to use float32 for compatibility - Remove verbose debug logging from LoRA conversion utils
…inModelConfig
The FLUX Dev license warning in model pickers used isCheckpointMainModelConfig
incorrectly:
```
isCheckpointMainModelConfig(config) && config.variant === 'dev'
```
This caused a TypeScript error because CheckpointModelConfig type doesn't
include the 'variant' property (it's extracted as `{ type: 'main'; format:
'checkpoint' }` which doesn't narrow to include variant).
Changes:
- Add isFluxDevMainModelConfig type guard that properly checks
base='flux' AND variant='dev', returning MainModelConfig
- Update MainModelPicker and InitialStateMainModelPicker to use new guard
- Remove isCheckpointMainModelConfig as it had no other usages
The function was removed because:
1. It was only used for detecting FLUX Dev models (incorrect use case)
2. No other code needs a generic "is checkpoint format" check
3. The pattern in this codebase is specific type guards per model variant
(isFluxFillMainModelModelConfig, isRefinerMainModelModelConfig, etc.)
…ters - Add Qwen3EncoderGGUFLoader for llama.cpp GGUF quantized text encoders - Convert llama.cpp key format (blk.X., token_embd) to PyTorch format - Handle tied embeddings (lm_head.weight ↔ embed_tokens.weight) - Dequantize embed_tokens for embedding lookups (GGMLTensor limitation) - Add QK normalization key mappings (q_norm, k_norm) for Qwen3 - Set Z-Image defaults: steps=9, cfg_scale=0.0, width/height=1024 - Allow cfg_scale >= 0 (was >= 1) for Z-Image Turbo compatibility - Add GGUF format detection for Qwen3 model probing
…rNorm - Add CustomDiffusersRMSNorm for diffusers.models.normalization.RMSNorm - Add CustomLayerNorm for torch.nn.LayerNorm - Register both in AUTOCAST_MODULE_TYPE_MAPPING Enables partial loading (enable_partial_loading: true) for Z-Image models by wrapping their normalization layers with device autocast support
Fixed the DEFAULT_TOKENIZER_SOURCE to Qwen/Qwen3-4B
VRAM usage is high. - Auto-detect control_in_dim from adapter weights (16 for V1, 33 for V2.0) - Auto-detect n_refiner_layers from state dict - Add zero-padding for V2.0's additional channels - Use accelerate.init_empty_weights() for efficient model creation - Add ControlNet_Checkpoint_ZImage_Config to frontend schema
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
api
backend
PRs that change backend files
frontend
PRs that change frontend files
invocations
PRs that change invocations
python
PRs that change python files
python-deps
PRs that change python dependencies
Root
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Add support for Z-Image ControlNet V2.0 alongside the existing V1 support.
Key changes:
control_in_dimfrom adapter weights (16 for V1, 33 for V2.0)n_refiner_layersfrom state dictaccelerate.init_empty_weights()for more efficient model creationControlNet_Checkpoint_ZImage_Configto frontend schemaRelated Issues / Discussions
Part of Z-Image feature implementation.
QA Instructions
control_context_scale: 0.65-0.80Merge Plan
Can be merged after review. No special considerations needed.
Checklist
What's Newcopy (if doing a release after this PR)