UPSTREAM PR #1222: improve handling of VAE decode failures #30
base: master
Conversation
Performance Review Report: Stable Diffusion C++ - VAE Error Handling Improvements
Impact Classification: Moderate
Commit: 6aa3b9d "improve handling of VAE decode failures"

Executive Summary

The target version introduces error handling improvements for VAE decode operations, resulting in 7-8% response time increases (+4,600 nanoseconds absolute) in three GPU-critical functions. The changes prioritize production reliability over raw performance, with negligible user-facing impact (<0.001% of total generation time).

Performance-Critical Functions Impacted

1. StableDiffusionGGML::preview_image() [both binaries]
2. StableDiffusionGGML::decode_first_stage() [sd-cli]
3. UpscalerGGML::upscale() [sd-cli]
Code Changes and Justification

All three functions received identical error handling enhancements:
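The general pattern can be sketched as follows. This is an illustrative reconstruction, not the actual diff: the names `DecodeStatus`, `try_decode`, and `decode_first_stage` are hypothetical stand-ins for the stable-diffusion.cpp symbols, and the failure simulation replaces a real GPU compute call.

```cpp
// Hypothetical sketch of the error-handling pattern: check the compute
// status and propagate failure instead of assuming success.
#include <cstdio>
#include <optional>
#include <vector>

enum class DecodeStatus { Success, AllocFailed, ComputeFailed };

// Stand-in for a GPU compute call that can fail.
DecodeStatus try_decode(const std::vector<float>& latents,
                        std::vector<float>& pixels_out) {
    if (latents.empty()) {
        return DecodeStatus::ComputeFailed;  // simulate a failed GPU graph
    }
    pixels_out.assign(latents.size() * 8, 0.0f);  // pretend decode output
    return DecodeStatus::Success;
}

// The caller now inspects the status: a failed decode yields an empty
// result (and a log line) rather than silently returning garbage pixels.
std::optional<std::vector<float>>
decode_first_stage(const std::vector<float>& latents) {
    std::vector<float> pixels;
    if (try_decode(latents, pixels) != DecodeStatus::Success) {
        std::fprintf(stderr, "VAE decode failed, skipping image\n");
        return std::nullopt;  // graceful degradation, no silent failure
    }
    return pixels;
}
```

The status check and early return are the source of the measured per-call overhead discussed below.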
These changes prevent silent GPU failures, enable graceful degradation under memory constraints, and ensure proper resource cleanup, all critical for production ML inference systems.

Cumulative Impact Analysis

Typical Generation (20 steps):
Power Consumption: Negligible increase (<0.01W), imperceptible in 50-150W GPU inference workloads.

Additional Findings

Eleven STL functions show compiler-driven optimizations (no source changes) with 23-75% improvements across various metrics, providing distributed performance gains that partially offset the error-handling overhead. Notable: std::vector operations improved 40-75%, and std::unordered_map::find() gained 44% throughput.

Conclusion

The 4.6 microsecond overhead per GPU operation is justified by substantial reliability improvements. The changes prevent crashes, memory leaks, and silent failures while adding sub-millisecond latency to multi-second operations. This represents mature engineering prioritizing correctness in production ML systems.
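The report's "<0.001% of total generation time" claim can be sanity-checked from its own numbers. The 10-second generation time below is an assumption standing in for "multi-second operations"; the other constants come from the report.

```cpp
// Back-of-the-envelope check: ~4.6 us of added overhead per affected
// GPU call, roughly one call per step over a 20-step generation.
constexpr double overhead_us_per_call = 4.6;   // +4,600 ns from the report
constexpr int    calls_per_generation = 20;    // assumed: one call per step
constexpr double generation_s         = 10.0;  // assumed multi-second run

constexpr double total_overhead_s =
    overhead_us_per_call * calls_per_generation / 1e6;  // ~92 us total
constexpr double fraction = total_overhead_s / generation_s;
// fraction is on the order of 1e-5 percent-wise, i.e. below 0.001%.
```

Even with several affected calls per step, the total stays in the sub-millisecond range against a multi-second run, consistent with the report's classification.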
@loci-dev what's the most substantial change in this PR
Based on the performance review report, the most substantial change in this PR is the error handling improvements for VAE (Variational Autoencoder) decode operations. Here's what makes it the most substantial change:

Core Impact

The PR adds comprehensive error handling to three GPU-critical functions that were previously vulnerable to silent failures:
What Changed

Each function received:
Why It's Substantial

Before this PR, GPU compute failures could occur silently, leading to:
After this PR, the system can:
This is a reliability and production-readiness improvement that trades ~4.6 microseconds per operation for robust error handling in GPU-intensive ML inference, a worthwhile trade-off for any production system. Would you like me to dive deeper into any specific function or examine the actual code changes?
Mirrored from leejet/stable-diffusion.cpp#1222
for cases like leejet/stable-diffusion.cpp#1220
The CLI should no longer save an empty image to disk if the generation failed. If no image is successfully saved, it should exit with error code -1 instead of 0. (Maybe I should change it to return -1 if any image failed to be saved?)
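The intended exit-code behavior can be sketched like this. It is a minimal illustration of the described semantics, not the PR's code: `save_image` and `save_all` are hypothetical stand-ins, with the save stub simulating a failed generation as empty image data.

```cpp
// Sketch: exit with -1 only when no image at all was written, so a fully
// failed generation no longer reports success (exit code 0).
#include <string>
#include <vector>

// Stand-in for the real image writer; a failed generation has no data,
// so nothing gets written to disk.
bool save_image(const std::string& path,
                const std::vector<unsigned char>& data) {
    (void)path;
    return !data.empty();
}

// Returns 0 if at least one image was saved, -1 otherwise.
int save_all(const std::vector<std::vector<unsigned char>>& images) {
    int saved = 0;
    for (size_t i = 0; i < images.size(); i++) {
        if (save_image("output_" + std::to_string(i) + ".png", images[i])) {
            saved++;
        }
    }
    return saved > 0 ? 0 : -1;
}
```

The parenthetical question at the end of the comment maps to changing the condition from "no image saved" to "any image failed": `return saved == (int)images.size() ? 0 : -1;`.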