Closed
Description
Hi authors, thank you for the great work on OneVision-Encoder.
I have a question about the “codec-style patch selection” mentioned in the paper/website.
From the codebase, I see that the model forward supports sparse inputs via patch_positions,
but I could not find an explicit implementation of the codec-style patch selection pipeline
(e.g., how patches are selected from raw videos).
Could you please clarify:
- Is the codec-style patch selection implemented anywhere in this repo, or are users expected to provide patch_positions externally?
- If it is external, do you plan to release the preprocessing/selection code (or a reference implementation)?
- What signals are used to estimate patch importance during training (e.g., motion, residuals, saliency)?
I am trying to better understand and possibly reproduce the codec-style pipeline.
Thanks for any clarification!
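For concreteness, here is a minimal sketch of what I imagine such a selector might look like: rank patches by inter-frame residual energy (a rough proxy for the motion/residual signals a codec exposes) and keep the top fraction as sparse patch positions. All function names, the keep_ratio parameter, and the (t, row, col) output convention are my own assumptions, not taken from this repo:

```python
import numpy as np

def select_patches_by_residual(video, patch_size=16, keep_ratio=0.25):
    """Hypothetical codec-style selector: keep patches with the largest
    inter-frame residual energy.

    video: (T, H, W, C) array; H and W must be divisible by patch_size.
    Returns an (N, 3) int array of (t, row, col) patch indices.
    """
    T, H, W, C = video.shape
    ph, pw = H // patch_size, W // patch_size
    v = video.astype(np.float32)
    # Residual against the previous frame; frame 0 is compared to itself,
    # so its residual is zero (an I-frame would be handled differently).
    prev = np.concatenate([v[:1], v[:-1]], axis=0)
    resid = np.abs(v - prev)                              # (T, H, W, C)
    # Sum residual energy inside each non-overlapping patch.
    patches = resid.reshape(T, ph, patch_size, pw, patch_size, C)
    energy = patches.sum(axis=(2, 4, 5))                  # (T, ph, pw)
    flat = energy.reshape(-1)
    k = max(1, int(flat.size * keep_ratio))
    top = np.argsort(flat)[-k:]                           # highest-energy patches
    t, r, c = np.unravel_index(top, (T, ph, pw))
    return np.stack([t, r, c], axis=1)

# Example: 4 frames of 32x32 RGB -> 16 patches total, keep 25% = 4 patches.
positions = select_patches_by_residual(np.random.rand(4, 32, 32, 3))
```

Is something along these lines (perhaps reading residuals/motion vectors directly from the compressed bitstream instead of recomputing them) roughly what the paper means, or is the actual signal different?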