How to design a network with DiT blocks that is friendly to TensorRT FP16 conversion? #12638

Description

@JohnHerry

We have a network structured as a convnet pre-encoder -> DiT blocks -> a final block for the last sampling step. It works well in both the Torch and ONNX formats, but when we convert it to TensorRT FP16, inference hits value overflow. Comparing the ONNX and TRT FP16 outputs with polygraphy, we saw the difference grow larger and larger across the DiT blocks (a sketch of the comparison setup follows below). My question is: how can the whole model design be made more friendly to mixed-precision inference, so that the DiT blocks are less sensitive to value precision? Should I make the convnet pre-encoder and final block more complex, or simpler? Thanks
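For reference, the comparison mentioned above can be reproduced roughly like this with polygraphy's Python API. This is a minimal sketch, not our exact script; `model.onnx` stands in for our exported network:

```python
# Sketch: compare ONNX-Runtime (FP32 reference) against a TensorRT FP16 engine.
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import (
    CreateConfig,
    EngineFromNetwork,
    NetworkFromOnnxPath,
    TrtRunner,
)
from polygraphy.comparator import Comparator

# Build a TensorRT engine with FP16 enabled, and an ONNX-Runtime session
# as the full-precision reference.
build_engine = EngineFromNetwork(
    NetworkFromOnnxPath("model.onnx"), config=CreateConfig(fp16=True)
)
runners = [OnnxrtRunner(SessionFromOnnx("model.onnx")), TrtRunner(build_engine)]

# Run both backends on the same (randomly generated) inputs and compare outputs.
run_results = Comparator.run(runners)
success = bool(Comparator.compare_accuracy(run_results))
print("outputs match within default tolerances:", success)
```

The CLI equivalent is `polygraphy run model.onnx --onnxrt --trt --fp16`; adding `--onnx-outputs mark all --trt-outputs mark all` compares every intermediate tensor, which is how the divergence growing block by block can be localized.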

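To illustrate the kind of mixed-precision fallback the question is about: below is a hedged sketch, not our actual build script, that enables FP16 globally but pins softmax and normalization layers (the usual overflow points inside attention blocks) to FP32. Layer-type availability and the network-creation flag depend on the TensorRT version, so those checks are assumptions:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)

# Explicit batch is the default on recent TensorRT; the flag is only needed
# (and only present) on older 8.x versions.
flags = 0
if hasattr(trt.NetworkDefinitionCreationFlag, "EXPLICIT_BATCH"):
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)

parser = trt.OnnxParser(network, TRT_LOGGER)
with open("model.onnx", "rb") as f:  # placeholder path
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Without this, TensorRT may treat the per-layer precisions below as hints.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Layer types that typically overflow in FP16 transformer blocks.
sensitive = {trt.LayerType.SOFTMAX}
if hasattr(trt.LayerType, "NORMALIZATION"):  # TensorRT >= 8.6
    sensitive.add(trt.LayerType.NORMALIZATION)

for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type in sensitive:
        layer.precision = trt.float32
        for j in range(layer.num_outputs):
            layer.set_output_type(j, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
```

`OBEY_PRECISION_CONSTRAINTS` makes the builder honor the requested per-layer precisions instead of treating them as optimization hints, at some cost in performance.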