We had a network that structed as a convnet pre-encoder -> DiT blocks -> final block for last sampling, it worked well with torch format and onnx format, but when we tried to convert it into tensorrt fp16 format, the inference will get value overflow. we had seen the data differene [between onnx and trt fp16, with polygraphy.] get larger and larger following those DiT blocks. My question is, how to make the whole model design more friendly to mix-precision inference? to let the DiT blocks less sensitive to value precision. Should I make the convnet pre-encoder and final blocks more complex, or more simple? Thanks