How to design a network with DiT blocks that is friendly to TensorRT FP16 conversion?

We have a network structured as `convnet pre-encoder -> DiT blocks -> final block for last sampling`. It works well in PyTorch and ONNX format, but when we convert it to TensorRT FP16, inference produces value overflow. Comparing ONNX against TensorRT FP16 with Polygraphy, we saw the difference in the data grow larger and larger through the DiT blocks. My question is: how can I make the overall model design friendlier to mixed-precision inference, so that the DiT blocks are less sensitive to numeric precision? Should I make the convnet pre-encoder and final block more complex, or simpler? Thanks.
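To make the symptom concrete, here is a minimal pure-Python sketch of the per-block comparison described above. It is not our model and does not call Polygraphy or TensorRT: `toy_block`, `compare_per_block`, and `first_overflow` are hypothetical names, and the block is assumed to be a plain residual add that scales the activation. It only illustrates how a residual stream that grows block by block eventually exceeds the FP16 range (largest finite value 65504) while the full-precision reference stays finite.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x):
    """Round a Python float to the nearest FP16 value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses to pack finite values outside the FP16 range
        return float("inf") if x > 0 else float("-inf")


def toy_block(x, gain):
    """Hypothetical stand-in for one DiT block: a residual add that grows x."""
    return [v + gain * v for v in x]


def compare_per_block(x0, num_blocks, gain):
    """Run a full-precision path and an FP16-rounded path side by side.

    Returns one tuple per block:
    (block index, max |reference|, max |reference - fp16|, reference exceeds FP16 range)
    """
    ref = list(x0)
    half = [to_fp16(v) for v in x0]
    report = []
    for i in range(num_blocks):
        ref = toy_block(ref, gain)
        # Round after every block to mimic FP16 storage of the block output
        half = [to_fp16(v) for v in toy_block(half, gain)]
        max_abs_ref = max(abs(v) for v in ref)
        max_diff = max(abs(a - b) for a, b in zip(ref, half))
        report.append((i, max_abs_ref, max_diff, max_abs_ref > FP16_MAX))
    return report


def first_overflow(report):
    """Index of the first block whose reference output no longer fits in FP16."""
    for i, _, _, exceeds in report:
        if exceeds:
            return i
    return None


if __name__ == "__main__":
    for row in compare_per_block([1.0, -2.0, 0.5], num_blocks=16, gain=1.0):
        print(row)
```

In this toy setup the activation magnitude, not the rounding error, is what breaks FP16: the difference stays small until the reference value leaves the representable range, then jumps to infinity.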

How to design a network with DiT blocks that is friendly to TensorRT FP16 conversion? #12638
